Thursday 18 January 2007

Stealing from the (Steel)Vine

First some background: The Asus P5W-DH Deluxe includes a Silicon Image SiI4723 hardware RAID controller. Yes, that's hardware RAID - not fakeraid. And since the controller presents the OS with just one logical disk (as it should), the OS cannot see the state of the individual disks, so Silicon Image provide a utility called SteelVine Configuration Manager (SVCM) that can be used to view the current state of the controller and attached disks / arrays.

Now, since we use an Asus P5W-DH Deluxe for a server at work, it would be good to be able to get email (or SMS) notifications if the array becomes degraded. The obvious place to start is the SVCM.

The bad news: SVCM is a graphical application - and scraping status information from the SVCM GUI is not that easy :(

The good news: SVCM has been implemented with a server / client architecture, using TCP to communicate between the two. So, it should be possible to create our own clone of the SVCM client. It turns out that SVCM uses a simple XML protocol, and that the server is stateless (so far as I can tell), so that should make getting the info we want pretty easy. However, there was one little hurdle that needed to be overcome first.

The SVCM client / server protocol seems to be:
1. Client connects to the server on port 51115.
2. Client sends XML request.
3. Server sends XML response.
4. Client goes back to step 2, or closes the connection.

To get the info we need, we only need to send the "SVConfigCmd" request. And since my language of choice for such a simple task is BASH, I tried to use netcat to submit the request for me:

$ echo "<SVConfigCmd/>" | nc localhost 51115

This gets the correct response, just as I wanted, but it crashes the SVCM server :(

$ ./start_server.sh
Starting SteelVine daemon
FOUND A RIGEL DEVICE!!
(performed request via netcat here)
Mutex destroy failure: Device or resource busy
pure virtual method called
./start_server.sh: line 12: 12304 Aborted SteelVine -e

Some quick playing revealed that the problem only occurs when the client disconnects immediately after it reads the response. If there is any noticeable delay at all, then the server will not die. Presumably, the official SVCM client would crash the server too if you managed to start it and exit it very quickly.

So anyway, the solution is simply to add a small delay before disconnecting... but how to do that? It would be easy if we were using a programming language with TCP/IP socket support, such as Perl or C/C++, but this is BASH...

Well, the solution was pretty obvious - I just needed to create something that would output the XML request to stdout and then pause for some amount of time before closing stdout. This would, of course, cause netcat to do something very similar, which is exactly what we want. I suppose this could be done in a number of ways, but I chose to do it like this:

$ bash -c 'echo "<SVConfigCmd/>" ; sleep 1' | nc -w3 localhost 51115

There! A simple 1 second sleep, and the SVCM server now keeps running :)
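The one-liner above can be wrapped in a pair of small functions so the delay and target are easy to change. This is a minimal sketch, not the script I actually use: the function names `svcm_request_stream` and `svcm_query` and the `SVCM_*` variables are hypothetical, and it assumes a netcat that accepts `-w` (as in the command above). It also swaps the `bash -c '...'` subshell for a plain function, which is one of the "number of ways" this could be done.

```shell
#!/bin/bash
# Hypothetical helpers wrapping the delayed-close netcat trick.

SVCM_HOST="${SVCM_HOST:-localhost}"   # assumption: server runs on this machine
SVCM_PORT="${SVCM_PORT:-51115}"       # SVCM server port
SVCM_DELAY="${SVCM_DELAY:-1}"         # seconds to keep stdout open after the request

# Print one XML request, then sleep while stdout stays open, so that
# whatever reads this stream (netcat) does not disconnect immediately.
svcm_request_stream() {
    local request="$1"
    local delay="${2:-$SVCM_DELAY}"
    printf '%s\n' "$request"
    sleep "$delay"
}

# Send a request to the SVCM server and print the XML response on stdout.
svcm_query() {
    local request="${1:-<SVConfigCmd/>}"
    svcm_request_stream "$request" | nc -w3 "$SVCM_HOST" "$SVCM_PORT"
}
```

Usage would be something like `svcm_query > status.xml`, or `SVCM_DELAY=2 svcm_query` if one second ever turns out to be too short.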

From there, it was trivial to write a BASH script that parses the XML result, and fires off an email if the RAID array and/or disks are reported as degraded in any way.
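The parsing script itself isn't shown here, and the exact layout of the SVConfigCmd response isn't covered in this post, so the following is only a crude sketch of the idea. It assumes (hypothetically) that a problem shows up as the word "degraded", in any case, somewhere in the XML; a real check should look at the specific status elements in the response instead. The function names, the `ALERT_TO` address, and the use of `mail -s` (from mailx) for delivery are also assumptions.

```shell
#!/bin/bash
ALERT_TO="${ALERT_TO:-root@localhost}"   # hypothetical alert recipient

# Return 0 if the SVCM XML on stdin looks degraded.
# Assumption: the word "degraded" (any case) appears when an array
# or disk has a problem. This is a placeholder for real XML checks.
svcm_is_degraded() {
    grep -qi 'degraded'
}

# Read an SVCM response from stdin; if it looks degraded, mail the
# whole response to $ALERT_TO and return 1, otherwise return 0.
svcm_alert_if_degraded() {
    local response
    response="$(cat)"
    if printf '%s\n' "$response" | svcm_is_degraded; then
        printf '%s\n' "$response" | mail -s "RAID degraded on $(hostname)" "$ALERT_TO"
        return 1
    fi
    return 0
}
```

It can be fed straight from the pipeline above, e.g. from cron: `bash -c 'echo "<SVConfigCmd/>" ; sleep 1' | nc -w3 localhost 51115 | svcm_alert_if_degraded`.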

Paul.
