Subversion 1.7 to 1.6 downgrade with MacPorts

Wednesday, November 2. 2011

MacPorts told me that there had been a subversion update (1.7.1), which I went ahead and installed. Woo! Huge speed improvements for everything I tried with the CLI client, great stuff. A short time later my IDE (PHPStorm) fell over screaming. It doesn't like 1.7 yet, and it's a bit stuck until SVNKit supports it. I should have checked really. So how to downgrade? Fortunately this post makes it very easy. So I just did:

sudo port deactivate subversion @1.7.0_1
sudo port activate subversion @1.6.17_1

But now I'm stuck with a working copy in 1.7 format with uncommitted changes, and there is no tool to convert it back to 1.6 format. This is easily worked around; check out a new working copy (using svn 1.6) and sync across the changes, ignoring the .svn folders, like this:

rsync -av --update --exclude=".svn/***" ~/Sites/myproject1.7/ ~/Sites/myproject1.6

All happy now.

PHP Base-62 encoding

Wednesday, August 10. 2011

There's a really horrible bug (they won't call it that, but I can't think of any use case for the default broken behaviour!) in Apache's mod_rewrite: urlencoded inputs to a rewrite get unescaped when they are substituted into the output pattern. The underlying 'bug' remains unfixed even in 2.3. A workaround in the form of the 'B' flag first appeared in Apache 2.2.7, but it was broken until 2.2.12 (which wasn't all that long ago). Put it like this: if you're not using the B flag in your mod_rewrite rules, your site is probably only working due to blind luck.

With that in mind, several years ago I spent ages looking for a base-62 encoder/decoder for PHP to replace mod_rewrite's broken urlencoding handling. Nobody seemed to have the slightest interest in writing one. Base-62 is interesting as it can be made safe for use in URLs, DNS, email addresses and pathnames, unlike any available encoding of base-64, as it only includes [0-9A-Za-z]. As a workaround for the above bug, I was interested in base-62 encoding URLs for embedding in redirects. At the time I wrote something using bc_math, but it was very slow (and weirdly got ripped off by some dickhead and passed off as his own, despite the fact that I said it was crap!). I eventually gave up on that and switched to base-64, which led to occasional URL corruption. If you include hashes in URLs, keeping them in the default hex representation is quite wasteful, and can contribute to issues with line length in email. Having hashes in base-62 is a nice way of reducing their size.

There are a few posts on base-62 in PHP, notably this one and this one, but they assume that you're talking about a numeric value, and while a hash is a numeric value, it's way too big for PHP to handle as an integer. Others take the multiprecision arithmetic route, which treats the input binary as a single very large number and calculates its representation in another base; that works, but it's horribly slow.

Since then, the gmp and bc_math extensions were improved in PHP 5.3.2, and now they handle (usefully) up to base-62. So here's a simple function for getting a hash in base-62:

function base62hash($source) {
        return gmp_strval(gmp_init(md5($source), 16), 62);
}

and for converting to and from base-16 hashes:

function hash16to62($hash) {
        return gmp_strval(gmp_init($hash, 16), 62);
}

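// Note: gmp_strval() does not emit leading zeros, so left-pad the result
// (e.g. to 32 characters for MD5) if you need a fixed-width hex hash.
function hash62to16($hash) {
        return gmp_strval(gmp_init($hash, 62), 16);
}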

I could still use a proper base-62 encoder for longer arbitrary strings, but it should be simpler to write something iterative now that these extensions have (ahem) their bases covered.

Update: I've written a sufficiently usable PHP base-62 encoder for arbitrary-length binary strings that's not too slow. You can find it on GitHub in this gist. Let me know if you find it useful.

Incidentally I discovered that the gmp functions use digits followed by lower-case letters ([0-9a-z]) for bases up to 36, but [0-9A-Za-z] (i.e. upper case first) for bases 37 to 62. This differs from most of the base-62 implementations I've found, which tend to use lower case first.
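Because the two base-62 alphabets differ only in which case comes first, a string in one convention can be converted to the other by swapping the case of every letter. The following is a minimal sketch of that idea; the function name swap62case is hypothetical and is not part of gmp or any other library.

```php
<?php
// Hypothetical helper (not part of gmp): converts a base-62 string between
// the upper-case-first alphabet [0-9A-Za-z] and the lower-case-first
// alphabet [0-9a-zA-Z] by swapping the case of each letter.
function swap62case(string $value): string
{
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $lower = 'abcdefghijklmnopqrstuvwxyz';
    // strtr() maps each character in the second argument to the character
    // at the same position in the third; digits are not listed, so they
    // pass through unchanged.
    return strtr($value, $upper . $lower, $lower . $upper);
}
```

The swap is its own inverse, so the same call converts in either direction.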

This is all slightly academic now, as the Apache B-flag workaround works: standard urlencoding behaves properly and I don't need a different encoding any more. However, there were so many examples of slow encoders that I thought the world could do with a usable one.

Update: Something else worth mentioning is that if you use the Apache B flag, you most likely need to turn the AllowEncodedSlashes directive on too, as otherwise you'll get mysterious 404s. I posted a bug report against the Apache docs to make this clearer.

Update: Apache used my rewrite of the B-flag docs, yay!

MySQL backups with Percona's XtraBackup

Friday, September 11. 2009

MySQL backup is sometimes very hard to do effectively. MySQL provides various options for backup, but many of them are simply unsuitable for large systems, particularly if they need to remain active during backups. Percona's XtraBackup is an open-source clone of InnoBase's InnoDB Hot Backup utility. So what makes XtraBackup a better solution, and how does it work?

Update: on December 10th 2009, Percona released XtraBackup 1.0.


Continue reading "MySQL backups with Percona's XtraBackup"

Google's charting API has been around for quite a while now, but I've only just needed to actually look at it. It became immediately obvious that I needed a PHP encoding function, so off to Google I went. Though I found several implementations, they were all incomplete or deficient in one way or another (and it didn't help that there was an error in Google's extended encoding docs), so I've written my own based on several different ones. Both simple and extended encoders support automatic scaling, inflated maximum and lower-bound truncation, so you can pretty much stuff whatever data you like in, with no particular regard for pre-scaling, and you'll get a usable result out. They have an identical interface, so you can use either encoding interchangeably according to the output resolution you need (contrary to popular belief, the encoding to use has very little to do with the range of values you need to graph). By default, the full range of possible values is used as it just seems silly not to. I deliberately omit the 's:' and 'e:' prefixes so that you can call these functions for multiple data series, and I include a function that does just that. You still need to generate your own URLs and other formatting, but that's a different problem. Read on for the code...

Continue reading "Google Charts API Simple and Extended Encoders in PHP"

I've just had a slightly tricky time upgrading a subversion repository on SourceForge. They have recently added support for subversion 1.5 at the server end. 1.5 brings major new features for merging, but as it's not backward compatible with older subversion clients, the upgrade is not done automatically. SF have also done a major rearrangement of their documentation while transferring everything to Trac, and it's not always easy to get the right info. Normally to upgrade a subversion repo you just run 'svnadmin upgrade /path/to/repo'; however, it's not quite so simple on SourceForge as you don't have direct access to the repo, and the instructions they give are slightly wrong at the time of writing. You're likely to get an error like this (it's not obvious that this is a fatal error) when you reload a dump file:

svnadmin: File already exists: filesystem '/svnroot/projectname/db', transaction '443-0', path 'tags'
* adding path : tags ...

This is because load is intended to add files to an existing repo, not to replace those that are already there, so you need to wipe the repo and start from scratch. So, here is a working command sequence that needs to be run from a project login shell on SourceForge (it applies to the project you're logged in through; substitute your project's name for projectname):

adminrepo --checkout svn
svnadmin dump /svnroot/projectname > svn.dump
rm -rf /svnroot/projectname/*
svnadmin create /svnroot/projectname
svnadmin load /svnroot/projectname < svn.dump
adminrepo --save svn

Yes, you do need to delete the whole thing and re-import it, but it's quick and easy, and you have a backup in the dump file you take at the start. After the upgrade, make sure you get a new checkout of your project to ensure that you're using 1.5 all the way through. Now you'll find that commands like 'svn merge --reintegrate' work.

When you're developing web stuff, working with projects in path names (i.e. not at the top level of a domain) can be difficult (it gets in the way of absolute links, rewrite rules etc), so you often need to set up a local Apache virtual host, stick an entry in DNS and create an SSL certificate before you can get on with the serious business of doing some real work. This can get to be a drag when you do it a lot, but there is an extremely elegant solution that means you'll never have to do it again...

Continue reading "The web developer's holy vhost trinity"

Someone at AMEE pointed out to me that there's been a flurry of activity around so-called "Web Hooks" when I referred to the concept. This is quite heartening, as I thought of this a couple of years ago and implemented it in Smartmessages early last year! I call them callbacks, but the idea is the same: it's essentially a distributed observer pattern. I couldn't figure out why nobody seemed to understand what I was on about... When I get some interesting event (e.g. a message open, mailshot completion, clickthrough etc), I ping a user-supplied URL with the appropriate event data, pretty much the one-liner that Jeff alludes to. The reason we do it is that sync with external systems (usually CRM) is something that we're always running into, and there seems to be no sensible, generic way of dealing with it other than this, so I'm surprised it has not been discussed in this context before.

There's one downside as far as I can see: it is highly dependent on the receiver being able to handle the event in a timely fashion. This isn't an issue if you're connecting, say, Yahoo! to Google, but it could be a big problem if you connect Google to your WordPress blog... My experience of CRM systems is that they are simply too slow to cope with the high rates of traffic that we are likely to generate. For example, if we point a stream of ~200 events per second at a CRM system, it will probably just bog down and fail (I'm thinking of the Salesforce API here, which typically takes 1-2 sec to deal with a single SOAP API call). Retrying will only make this worse. I have two solutions for this: limit events to those that don't happen so often (kind of lame!), or alternatively, use an outbound message queue to rate-limit the sending (Amazon SQS and Memcacheq spring to mind). Queueing works, but you lose some of the real-time aspect.

Ideally clients would implement their own incoming queue in order to allow them to process events at their leisure, but this is mostly beyond the vast majority of web authors (or at least those that host the CRM systems that we hear from!). Anyway, it's nice to know that I'm not completely barking...
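The outbound-queue idea can be sketched in a few lines of PHP. This is only an illustration under stated assumptions, not the Smartmessages implementation: the class name CallbackQueue, its methods and the $send callable are all hypothetical, and an in-memory SplQueue stands in for a real queue service such as Amazon SQS or Memcacheq.

```php
<?php
// Hypothetical sketch of a rate-limited outbound callback queue.
// An in-memory SplQueue stands in for a real message queue service.
class CallbackQueue
{
    private $pending;
    private $maxPerBatch;

    public function __construct(int $maxPerBatch)
    {
        $this->pending = new SplQueue();
        $this->maxPerBatch = $maxPerBatch;
    }

    // Record an event for later delivery instead of sending it immediately.
    public function push(string $url, array $event): void
    {
        $this->pending->enqueue([$url, $event]);
    }

    // Deliver at most $maxPerBatch queued events, oldest first, through the
    // supplied sender, and return how many were sent in this batch.
    public function drainBatch(callable $send): int
    {
        $sent = 0;
        while ($sent < $this->maxPerBatch && !$this->pending->isEmpty()) {
            [$url, $event] = $this->pending->dequeue();
            $send($url, $event);
            $sent++;
        }
        return $sent;
    }

    // Number of events still waiting to be delivered.
    public function count(): int
    {
        return $this->pending->count();
    }
}
```

A worker process would call drainBatch() once per interval (for example, followed by sleep(1)), passing a $send callable that performs the actual HTTP request to the user-supplied URL. The batch size then caps the rate the receiver sees, at the cost of the delay described above.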

PHP London Conference 2009

Monday, March 2. 2009

From this truly excellent conference, I took away some good memories, some new ideas and a nasty bout of conference flu. There's nothing quite like being in close proximity to a few hundred people to really spread things around...

The highlight for me was Aral Balkan's keynote. It's always nice to see someone showing plain enthusiasm, and I couldn't agree more with him about the "lost magic" of computing. Had a chat with him afterwards about AMEE and other things. He also seems to have put together some odd but dull things that I had noticed a need for - EU VAT codings and ISO language references as web services!

I didn't really enjoy David Soria Parra's talk on sharding. It all came across as very negative and many of the ways of doing it and coping with the fallout were not really discussed. No mention of MySQL 5.1's partitioning (which is limited, but is at least a start), or more radical approaches like Sequoia. David Axmark's talk on Drizzle was more interesting than I expected; nice to see effort being put into this direction.

Microsoft really does seem to be trying a bit harder these days - their CSS test suite for IE8 is very welcome, and the effort they are putting into PHP, Apache and other projects benefits many people. It has to be said that while it's not a mainstream product, Surface is really pretty cool to play with. Chris Shiflett's talk was excellent too; his demos and examples were particularly good, and entertaining.

The post-conference social was great fun; I met lots of nice new people. After our move to France I suspect it will be harder to get to events like this, so I should make the most of them while I can! I've had several ideas for talks that I'd like to do (I get sick of email sometimes!), so I guess I need to get a bit more proactive on actually submitting them to a call for papers.