3Ware RAID rebuilding

I’ve had the dubious honour of seeing some RAID failures and rebuilds lately. The manuals don’t cover this kind of thing very well, in particular what your RAID will report when it’s having trouble. So, here are a couple of examples from a 3Ware RAID controller using the tw_cli command-line tool. This is what tw_cli /c4 show displays when we have a dead drive:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
u0    RAID-1    DEGRADED       -       -       -       149.05    ON     -      

Port   Status           Unit   Size        Blocks        Serial
p0     OK               u0     149.05 GB   312581808     G2109NHG            
p1     DEGRADED         u0     149.05 GB   312581808     G20X1BWG            

So, we swap the drive, and it looks like this while rebuilding:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
u0    RAID-1    REBUILDING     89      -       -       149.05    ON     -

Port   Status           Unit   Size        Blocks        Serial
p0     OK               u0     149.05 GB   312581808     G2109NHG
p1     DEGRADED         u0     149.05 GB   312581808     G209Y0HG

and after a little while…

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
u0    RAID-1    OK             -       -       -       149.05    ON     -

Port   Status           Unit   Size        Blocks        Serial
p0     OK               u0     149.05 GB   312581808     G2109NHG
p1     OK               u0     149.05 GB   312581808     G209Y0HG

There are plenty of obvious strings to match in this output (though there are many other reports available), so it’s a reasonable thing to base a monitoring script on.
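
As a rough sketch of what such a monitoring script might look like, here is a small Python parser for the listings above. It assumes the column layout is exactly as shown (status in the third column of unit rows and the second column of port rows) and that status values never contain spaces; the function names are made up for illustration and nothing here comes from 3Ware.

```python
import re
import subprocess

# Sample output copied from the dead-drive listing above.
SAMPLE = """\
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
u0    RAID-1    DEGRADED       -       -       -       149.05    ON     -

Port   Status           Unit   Size        Blocks        Serial
p0     OK               u0     149.05 GB   312581808     G2109NHG
p1     DEGRADED         u0     149.05 GB   312581808     G20X1BWG
"""

# Unit rows start with uN, port rows with pN (assumed from the listings above).
NAME = re.compile(r"[up]\d+")


def parse_status(text):
    """Return {name: status} for every unit (uN) and port (pN) row."""
    statuses = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not NAME.fullmatch(fields[0]):
            continue  # header or blank line
        # Unit rows: name, type, status. Port rows: name, status.
        index = 2 if fields[0].startswith("u") else 1
        if len(fields) > index:
            statuses[fields[0]] = fields[index]
    return statuses


def problems(text):
    """Return only the units and ports whose status is not OK."""
    return {name: s for name, s in parse_status(text).items() if s != "OK"}


def fetch(controller="/c4"):
    """Run tw_cli for one controller and return its output as text."""
    result = subprocess.run(
        ["tw_cli", controller, "show"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a machine with the controller, something like problems(fetch()) run from cron could mail you whenever the result is non-empty. Note that a rebuilding array would also be reported, since REBUILDING is not OK, which is probably what you want.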

It’s nice to see a rebuild actually work, and it makes me extremely grateful that I bothered getting RAID in the first place. This would be a much unhappier post if I hadn’t.

Dell RAID firmware and lockfile on Ubuntu

Ian P. Christian ran into this problem a while ago:

On a seperate note, anyone know how to upgrade firmware using Dell’s software
on a non-RH system?

# ./RAID_FRMW_LX_R107404.BIN
which: no lockfile in
spsetup.sh: Cannot find utilities on the system to execute package.
Make sure the following utilities are in the path: sed lockfile tail rm mkdir
chmod ls basename

‘lockfile’ is missing – whatever that is!

Lockfile just doesn’t seem to exist outside of RedHat (e.g. lockfile-progs on Debian doesn’t include it). However, you can of course find it on a RedHat system, and I happen to have one handy. I just copied the binary to my Ubuntu installation, where it appeared to run just fine and allowed the firmware updater to run OK. Thought someone might like to know.
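
If you want to check for this before running the updater, a quick pre-flight test is easy to sketch in Python. The list of utilities is taken straight from the error message above; the function name is made up for illustration, and this only checks that each name resolves on the PATH, not that the tool behaves the way Dell’s script expects.

```python
import shutil

# Utilities named in the spsetup.sh error message above.
REQUIRED = ["sed", "lockfile", "tail", "rm", "mkdir", "chmod", "ls", "basename"]


def missing_utilities(names=REQUIRED):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling missing_utilities() on the Ubuntu box before copying the binary over would presumably have returned just lockfile.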

World’s worst “Managed” ISP: 123-reg

I recently fell for 123-reg’s managed, dedicated server spiel. It sounds like a great deal – reasonable CPU, RAID, Ubuntu, good connectivity, quick provisioning etc for a bargain price. But it’s mostly untrue in the ways that matter. There is a distinct difference between “managed” and “dedicated” hosting. What you might expect from a managed service:

  • System installation and config
  • Frequent system updates and security patches
  • On-request package installation
  • Any custom tweaks requiring root access
  • Security audits

The whole point is to relieve the customer of the kind of sysadmin tasks that they might otherwise do themselves. Without this kind of service, it’s just a dedicated server. The managed service may or may not sit on top of a dedicated server (e.g. it’s possible to get management of a colo box), also provided by the ISP. See RackSpace for a classic managed service. A dedicated server should supply the following as a minimum:

  • Physical server
  • Hardware replacement guarantee (on a component basis)
  • Connectivity & IP address(es)
  • Config of reverse DNS entries
  • Installed OS
  • Root access over ssh

It’s common for the customer to forgo root access in a managed service, as customer root access can make the ISP’s job impractical.

123’s approach is at an impossible halfway point. They set up various remote management services (some of which don’t work), provide the kind of locked-down service you might expect from a managed service (no root access), but then they completely fail to provide any of the managed services on top of this arrangement. So you’re left with a server that’s more or less unusable. For example, the only way to get additional packages installed is to request root access and do it yourself, but doing this means that you are then no longer eligible for any of their support services (such as they are).

The servers are set up with very fixed usage in mind. It’s all driven by domains being assigned to servers, which handle their own DNS and provide absolutely minimal web hosting services for each domain. There is a control panel thing, but it’s very restricted, nowhere near what you get with even relatively clunky control panels such as Webmin. Each defined domain gets access to a single MySQL database. They provide a MySQL management interface for the administrator, but it doesn’t work, so there is no way of getting root access to MySQL at all. If you want to access a database for a domain, you have to log into that domain and access its database from there, so there is no overview of the system as a whole. If your intended usage pattern happens to exactly match what they set up, I might concede that it could make a passable dedicated service, but there’s no way it could possibly be described as “managed”, as they don’t lift a finger.

Over a couple of weeks, I sent perhaps 15 different email support requests. I received one reply apologising for the slow provisioning of the server, and another that said that I couldn’t have MySQL root access. None of my other requests were even acknowledged.

They provide a premium-rate phone number for support, and I called this a few times. Sometimes there was no answer, then I was told that I had reached the wrong department, then I was told that they’d not replied to my email because they had a huge backlog (figures), then that they had lost my email so could I resend it. It was clear that whoever I spoke to was not of the technical variety.

Eventually I had had enough. So I visited the page that features a ‘cancel my server’ button. It didn’t work (a lovely error 500 instead). So I emailed them requesting that they cancel my server and provide a refund as their product was simply not fit for purpose. Amazingly, I did receive a reply saying that they did not provide ‘that kind of management’ (curious, since they are not providing any kind of management), but that I would receive a refund. That was 6 weeks ago. In the last 2 weeks, they have been attempting (and failing) to charge my credit card every day. I’ve reported that to my card company, but it sounds like it’s not getting as far as them, i.e. it’s yet another internal system that’s broken, and despite another 5 emails describing this problem to them, they have still failed to reply. Looks like I will have to dispute my original setup payment to them.

Since then I’ve used uk2.net, who have been absolutely excellent, and a massive contrast to 123 (and they have an astonishing buy one get one free offer on during April). Other than that, Mythic Beasts’ MacMini service provides way better service than 123, while only ever describing it as a dedicated service.

Dutch government gets it right

The Dutch government passed a law last year requiring all government sites to conform to some extremely tight specifications – W3C validation is just a small part of it. It’s great to see this kind of thing happening at such a high level. Now we just need to get the UK and EU to follow suit. There’s an English translation here and an article on 456 Berea Street.