I’ve had trouble in the past setting up Linux-HA, a.k.a. heartbeat, to create a redundant front-end for a web service. The HA documentation is thorough and detailed, but it lacks realistic examples for getting a working system up and running. This is a quick guide to creating a two-way cluster with one or more floating IP addresses and automatic failover on Ubuntu 9.04. I also include a quick config for pound, a reverse proxy / load balancer.
Firstly install the standard heartbeat package:
aptitude install heartbeat
The Ubuntu package is heartbeat 2.1.4, which is the last version to support the heartbeat 1.x configuration syntax. Heartbeat 2.x switched to a more complex XML-based config system, but that’s another story.
For this example I’ll use two servers that will act as the redundant front-end. I’ll call them www1 and www2 and give them the IP addresses 192.168.1.1 and 192.168.1.2. Then we’ll have two floating IPs, 192.168.1.3 and 192.168.1.4. One machine is designated as the primary; it holds the floating IPs by default, so all traffic for those IPs goes via that server. If it fails, the IPs are brought up on the other machine. When the primary reappears, the floating IPs revert to it, because we specify “auto_failback on” in the config.
You need to edit the files below on both machines, and they should be identical on both. First we set up /etc/ha.d/ha.cf. This describes the nodes that will host the resources we want heartbeat to manage – in this case the two floating IPs.
node www1 www2
udpport 694
ucast eth0 192.168.1.1
ucast eth0 192.168.1.2
deadtime 5
deadping 5
debug 0
debugfile /var/log/ha-debug.log
logfile /var/log/ha.log
auto_failback on
This defines the two nodes and uses unicast UDP between them to monitor their state, i.e. whether each one is still alive. The other options are well documented.
Next we need to define the resources we’re going to manage – the floating IPs. This is done in /etc/ha.d/haresources, in just one line (this is actually shorthand for a more complex syntax, but it’s all we need):
www1 192.168.1.3 192.168.1.4
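For reference, that bare-IP shorthand is equivalent to invoking heartbeat’s IPaddr resource agent explicitly. Spelled out (the /24 netmask and eth0 interface here are assumptions matching the example network), the same line would look like:

```
www1 IPaddr::192.168.1.3/24/eth0 IPaddr::192.168.1.4/24/eth0
```

The explicit form is handy if your floating IPs need to live on a different interface or netmask than heartbeat would guess.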
Before this will work, you need to enable a kernel option that allows services to bind to an IP address that is not currently active on the machine. To do this, edit /etc/sysctl.conf and add this line:
net.ipv4.ip_nonlocal_bind=1
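To apply the setting without a reboot (these need to be run as root), you can load it straight away:

```shell
# re-read /etc/sysctl.conf so the new setting takes effect immediately
sysctl -p

# or set it directly for the running kernel
sysctl -w net.ipv4.ip_nonlocal_bind=1

# verify the current value
sysctl net.ipv4.ip_nonlocal_bind
```
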
If you’re running the standard ufw firewall, you’ll need to allow the heartbeat traffic between the two servers. On www2, allow packets from www1:
ufw allow proto udp from 192.168.1.1 to any port 694
And the mirror rule on www1:
ufw allow proto udp from 192.168.1.2 to any port 694
Now you can start heartbeat:
/etc/init.d/heartbeat start
To check that it’s working, use ip:
ip addr
You should see the floating IPs up on whichever machine you designated as your primary, something like this:
1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:aa:bb:cc:dd:ee brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.3/24 brd 192.168.1.255 scope global secondary eth0:0
    inet 192.168.1.4/24 brd 192.168.1.255 scope global secondary eth0:1
    inet6 fe80::123:1234:1234:1234/64 scope link
       valid_lft forever preferred_lft forever
If you stop networking, kill the power, or do something equally unpleasant to one of the servers, the other should take over automatically – you’ll see the .3 and .4 addresses come up on the other machine.
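A less brutal way to rehearse a failover (assuming www1 currently holds the floating IPs) is simply to stop heartbeat on the primary and watch the addresses move:

```shell
# on www1: stop heartbeat; after 'deadtime' expires www2 declares it dead
/etc/init.d/heartbeat stop

# on www2: the floating IPs should appear within a few seconds
ip addr show eth0 | grep 'inet 192.168.1'

# on www1: restart heartbeat; with auto_failback on, the IPs come home
/etc/init.d/heartbeat start
```
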
So what next? Floating IPs are not the end of the story: you need something to accept connections on those IPs and distribute them to back-end nodes – which could well be the same machines heartbeat is running on. Pound is a nice, easy solution for this; install the Ubuntu package with a simple aptitude install pound. A simple config using our two front ends as web servers could be (based on the default Ubuntu package config):
## Minimal sample pound.cfg
##
## see pound(8) for details
######################################################################
## global options:
User "www-data"
Group "www-data"
LogLevel 3
## check backend every X secs:
Alive 5
# poundctl control socket
# This original line suffers from Ubuntu bug https://bugs.launchpad.net/ubuntu/+source/pound/+bug/312336
#Control "/var/run/pound/poundctl.socket"
Control "/var/run/poundctl.socket"
Client 60
TimeOut 30
ListenHTTP
    Address 192.168.1.3
    Port 80

    Service
        BackEnd
            # www1
            Address 192.168.1.1
            Port 80
            TimeOut 30
        End

        BackEnd
            # www2
            Address 192.168.1.2
            Port 80
            TimeOut 30
        End

        Session
            Type IP
            TTL 7200
        End
    End
End
This config only handles one of the floating IPs we created; you’d just copy the ListenHTTP block and specify the other floating IP as the listen address. Note the fix for the Ubuntu bug in the control socket location. You’ll notice that pound’s config has some things in common with heartbeat’s: like heartbeat, it keeps an eye on the web servers, and if one disappears, pound stops sending it traffic. This way failover at the IP and HTTP layers is kept in sync.
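For completeness, the extra listener for the second floating IP is just a copy of the first block with the address changed (back-ends and session settings as before):

```
ListenHTTP
    Address 192.168.1.4
    Port 80

    Service
        BackEnd
            Address 192.168.1.1
            Port 80
            TimeOut 30
        End

        BackEnd
            Address 192.168.1.2
            Port 80
            TimeOut 30
        End

        Session
            Type IP
            TTL 7200
        End
    End
End
```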
The key thing to bear in mind when setting up a redundant system is that you can’t have a ‘master’ server: the configs must be identical on both machines, because the setup has to cope with either machine dying at any time. While we designate www1 as the default holder of the IPs, that choice is purely arbitrary.
I hope you found this guide useful – it would have saved me some time if I’d found something like it!
Hi; thanks for the article, but it does not work for me. Since the secondary box boots with only its physical IP address, when pound starts it fails because it cannot bind to the virtual address. The virtual address is only assigned to the secondary box when failover occurs, but pound has no way of knowing it needs to restart.
Any thoughts???
Thanks
This is what the sysctl change addresses – it allows services to bind to IP addresses the machine does not currently own. Obviously a service will never receive any traffic on those addresses until they arrive, but it won’t fail to start.
There are two other things. First, this article is now pretty old; heartbeat should no longer be used, especially for new installations – you should use Pacemaker with the CRM now.
Secondly, the approach I took here is to leave pound running all the time, even on the inactive node – this makes the failover mechanism simpler, as no services need to be started or stopped: heartbeat just brings up the IP and everything is ready and waiting for it.
You can still work that way, but the ‘official’ way would be to get Pacemaker to bring up the service at the same time as it moves the IP. That approach needs more CRM config, but means you avoid the non-local binding issue altogether.
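As a rough sketch of that approach (the resource and group names here are made up; check crm ra info ocf:heartbeat:IPaddr2 for the real parameter list), the CRM configuration might look something like:

```
# a floating IP managed by the IPaddr2 resource agent
crm configure primitive float_ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.3 cidr_netmask=24 nic=eth0 \
    op monitor interval=10s

# pound driven via its init script
crm configure primitive pound_svc lsb:pound \
    op monitor interval=30s

# group them so pound is started on whichever node holds the IP
crm configure group web_front float_ip pound_svc
```

Resources in a group are started in order on the same node, so pound only runs where the IP is – no non-local bind needed.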
I’d also recommend switching from pound to haproxy. haproxy is just great to work with and can cope with enormous amounts of traffic. The reason I originally used pound is that my ISP set it up for us, but I switched to haproxy when building another cluster myself. The thing I really like about it is its monitoring and status display. Currently haproxy does not do SSL wrap/unwrap, but you can use it with stunnel, which works fine, or use the ‘beta’ version of haproxy 1.5, which has it built in. It’s labelled as a beta, but the author is so conservative (in a good way!) that it’s really reliable already – I’ve had no trouble with it.
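For anyone making that switch, a rough haproxy equivalent of the pound config above might look like this (an untested sketch; the listener name and stats URI are my own inventions):

```
# /etc/haproxy/haproxy.cfg - rough equivalent of the pound setup
global
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  30s

listen web 192.168.1.3:80
    # hash on client IP, like pound's "Type IP" session tracking
    balance source
    server www1 192.168.1.1:80 check
    server www2 192.168.1.2:80 check
    # the status display mentioned above
    stats enable
    stats uri /haproxy-status
```

The check keyword gives you the same back-end health monitoring pound’s Alive setting provides, and the stats page shows the state of every back-end at a glance.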