3Ware RAID rebuilding

Monday, May 14. 2007

I've had the dubious honour of seeing some RAID failures and rebuilds lately. It's the kind of thing that doesn't get written about in the manuals very well, in particular what your RAID will report when it's having trouble. So, here are a couple of examples from a 3Ware RAID controller using tw_cli software. This is what tw_cli /c4 show displays when we have a dead drive:
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       149.05    ON     -      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     149.05 GB   312581808     G2109NHG            
p1     DEGRADED         u0     149.05 GB   312581808     G20X1BWG            
So, we swap the drive, and it looks like this while rebuilding:
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    REBUILDING     89      -       -       149.05    ON     -

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     149.05 GB   312581808     G2109NHG
p1     DEGRADED         u0     149.05 GB   312581808     G209Y0HG
and after a little while...
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       149.05    ON     -

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     149.05 GB   312581808     G2109NHG
p1     OK               u0     149.05 GB   312581808     G209Y0HG
There are plenty of obvious strings to match in this output (though there are many other reports available), so it's a reasonable thing to base a monitoring script on. It's nice to see it actually work, and makes me extremely grateful that I bothered getting RAID n the first place. This would be a much unhappier post if I hadn't.

Trackbacks


Trackback specific URI for this entry
    No Trackbacks

Comments


    No comments

Add Comment

You can use [geshi lang=lang_name [,ln={y|n}]][/geshi] tags to embed source code snippets.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA

    Submitted comments will be subject to moderation before being displayed.