3Ware RAID rebuilding

I’ve had the dubious honour of seeing some RAID failures and rebuilds lately. It’s the kind of thing that doesn’t get written about in the manuals very well, in particular what your RAID will report when it’s having trouble. So, here are a couple of examples from a 3Ware RAID controller using tw_cli software. This is what tw_cli /c4 show displays when we have a dead drive:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       149.05    ON     -      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     149.05 GB   312581808     G2109NHG            
p1     DEGRADED         u0     149.05 GB   312581808     G20X1BWG            

So, we swap the drive, and it looks like this while rebuilding:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    REBUILDING     89      -       -       149.05    ON     -

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     149.05 GB   312581808     G2109NHG
p1     DEGRADED         u0     149.05 GB   312581808     G209Y0HG

and after a little while…

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       149.05    ON     -

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     149.05 GB   312581808     G2109NHG
p1     OK               u0     149.05 GB   312581808     G209Y0HG

There are plenty of obvious strings to match in this output (though there are many other reports available), so it’s a reasonable thing to base a monitoring script on.

It’s nice to see it actually work, and makes me extremely grateful that I bothered getting RAID n the first place. This would be a much unhappier post if I hadn’t.