Web Hooks, Callbacks and Distributed Observers

Someone at AMEE pointed out to me that there’s been a flurry of activity around so-called “Web Hooks” when I referred to the concept. This is quite heartening as I thought of this a couple of years ago and implemented this in Smartmessages early last year! I call them callbacks, but the idea is the same – it’s essentially a distributed observer pattern. I couldn’t figure out why nobody seemed to understand what I was on about… When I get some interesting event (e.g. a message open, mailshot completion, clickthrough etc), I ping a user-supplied URL with the appropriate event data, pretty much the one-liner that Jeff alludes to. The reason we do it is that sync with external systems (usually CRM) is something that were always running into, and there seems to be no sensible, generic way of dealing with it other than this, so I’m surprised it has not been discussed in this context before.
There’s one downside as far as I can see – it is highly dependent on the receiver to be able to handle the event in a timely fashion. This isn’t an issue if you’re connecting say, Yahoo! to Google, but it could be a big problem if you connect Google to your WordPress blog… My experience of CRM systems is that they are simply too slow to cope with the high rates of traffic that we are likely to generate, for example, if we point a stream of ~200 events per second at a CRM system, it will probably just bog down and fail (I’m thinking of the SalesForce API here which typically takes 1-2 sec to deal with a single SOAP API call). Retrying will only make this worse. I have two solutions for this: limit events to those that don’t happen so often (kind of lame!), or alternatively, use an outbound message queue to rate-limit the sending (Amazon SQS and Memcacheq spring to mind). Queueing works, but you lose some of the real-time aspect. Ideally clients would implement their own incoming queue in order to allow them to process events at their leisure, but this is mostly beyond the vast majority of web authors (or at least those that host the CRM systems that we hear from!).
Anyway, it’s nice to know that I’m not completely barking…

Scalable irony

This article on highscalability.com is a really excellent rundown of some of the options available for scaling a site to the heights of Digg. Ironically enough, at the time of writing the highscalabilitycom web server shows this error:

Unable to connect to database server
The MySQL error was: User highscal_admin already has more than ‘max_user_connections’ active connections.

I can point them at this really good article on how to avoid problems like this… oh, wait…