When you’re developing web stuff, working on projects that live under a path (i.e. not at the top level of a domain) can be awkward, because it gets in the way of absolute links, rewrite rules and so on. So you often need to set up a local Apache virtual host, stick an entry in DNS and create an SSL certificate before you can get on with the serious business of doing some real work. This gets to be a drag when you do it a lot, but there is an extremely elegant solution that means you’ll never have to do it again…
Apache mod_rewrite bug still lurks
There’s an enormous Apache mod_rewrite bug that I ran into back in 2005, and to my dismay it’s still there. Long-standing bugs are usually small edge cases that don’t affect many people, but this one is a monster that I suspect affects pretty much everyone using mod_rewrite; they’ve just been lucky enough to avoid it so far. The basic issue is this: if you match parameters containing characters that must be URL-encoded to be safe, mod_rewrite will not rewrite the back-reference (that’s $1 below) correctly. Take this very simple redirect:
RewriteRule ^(.*)$ index.php?show=$1 [R]
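So if you request http://www.example.com/a%2Fb, mod_rewrite neatly rewrites it as http://www.example.com/index.php?show=a/b when it should be http://www.example.com/index.php?show=a%2Fb! Notice that it has URL-decoded the matched parameter and put it into the replacement string without re-encoding it. A slash happens to survive in a query string, but an encoded &, = or + (%26, %3D, %2B) in the original URL ends up changing the meaning of the query.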
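To see why the missing re-encoding matters, here is a small Python sketch that models the behaviour described above. It is a model, not mod_rewrite itself: `rewrite_buggy`, `rewrite_escaped` and `show_param` are hypothetical helpers, the sketch assumes the captured path is URL-decoded exactly once before substitution, and Python's `parse_qs` stands in for whatever parses the query string at the other end (PHP's `$_GET`, say).

```python
from urllib.parse import parse_qs, quote, unquote


def rewrite_buggy(path: str) -> str:
    """Model of the reported behaviour: the captured path is URL-decoded
    and pasted into the query string without being re-encoded."""
    captured = unquote(path.lstrip("/"))
    return "index.php?show=" + captured


def rewrite_escaped(path: str) -> str:
    """What a correct back-reference would look like: decode, then re-encode
    every reserved character (safe="" means even "/" gets escaped)."""
    captured = unquote(path.lstrip("/"))
    return "index.php?show=" + quote(captured, safe="")


def show_param(target: str) -> list[str]:
    """Parse the query string of a rewritten URL and return the 'show' values."""
    query = target.split("?", 1)[1]
    return parse_qs(query).get("show", [])


if __name__ == "__main__":
    for path in ("/a%2Fb", "/a%26b", "/a%2Bb"):
        for rewrite in (rewrite_buggy, rewrite_escaped):
            target = rewrite(path)
            print(f"{rewrite.__name__:16} {target:28} show={show_param(target)}")
```

With `/a%26b` the buggy version produces `show=a&b`, so `show` comes out as just `a`; with `/a%2Bb` the `+` comes back as a space. The escaped version round-trips all three paths.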
Apache 2.2 introduced a new B flag to deal with this, but it apparently suffers from the same problem! There are two workarounds I’ve used, and both are horrible: double-encode the source string so that it survives the spurious URL-decode (this only works if you control both the start and end points of the URLs), or base64-encode it (JavaScript flavour) and do the decoding yourself. I did warn you they were horrible.
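Here is a rough Python sketch of both workarounds, assuming you control both the code that builds the URLs and the code that reads them back. The helper names are hypothetical, and the base64 part uses Python's URL-safe alphabet (RFC 4648, with `-` and `_` instead of `+` and `/`, padding stripped), which may not match the "JavaScript flavour" exactly. `spurious_decode` models the single unwanted URL-decode described above.

```python
import base64
from urllib.parse import parse_qs, quote, unquote


def double_encode(value: str) -> str:
    """Workaround 1: URL-encode twice, so one spurious decode still leaves
    a properly encoded string behind."""
    return quote(quote(value, safe=""), safe="")


def b64_encode(value: str) -> str:
    """Workaround 2: base64 with the URL-safe alphabet and '=' padding stripped,
    so the result contains only letters, digits, '-' and '_'."""
    raw = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    return raw.rstrip("=")


def b64_decode(token: str) -> str:
    """Reverse b64_encode, restoring the padding that was stripped."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def spurious_decode(path_segment: str) -> str:
    """Model of the rewrite step: one URL-decode, then pasted into show=..."""
    return "show=" + unquote(path_segment)


if __name__ == "__main__":
    original = "a/b&c=d+e"

    # Double-encoding: the query string that comes out is still correctly encoded.
    query = spurious_decode(double_encode(original))
    print(query, parse_qs(query)["show"][0])

    # Base64: nothing in the token needs encoding, so the decode is a no-op.
    query = spurious_decode(b64_encode(original))
    print(query, b64_decode(parse_qs(query)["show"][0]))
```

Double-encoding keeps the URL vaguely readable but depends on exactly one decode happening along the way; base64 doesn't care how many decodes happen, at the cost of opaque URLs.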
I’ll bet there are a zillion mod_rewrite rules out there that suffer from this fundamental problem and whose owners haven’t even noticed. If a few people voted for this bug to be fixed, it would probably go away…
Does no-one use mod_rewrite?
I can’t quite believe this bug really exists. It’s so fundamental to so many applications, principally redirects, which are used just about everywhere.
Beware of the MultiView
I’ve been seeing some very confusing behaviour involving mod_rewrite and PHP.
I have this rewrite rule in a .htaccess file (it just allows me to see what the incoming URL looked like for test purposes):
RewriteCond %{REQUEST_URI} !/x\.php$
RewriteRule (.*) x.php?x=$1 [R,L]
If I feed it a URL like:
http://www.example.com/thing/123
it matches the whole thing/123 part and maps it to my desired URL:
http://www.example.com/x.php?x=thing/123
That’s all fine. Now the weird bit: if I happen to have a script with the same base name as the first path segment, like “thing.php”, it gets magically picked up and inserted into the URL, so I end up with:
http://www.example.com/x.php?x=thing.php/123
Huh?! How did the ‘.php’ get in there?
Because this rule is in a .htaccess file, it’s handled near the end of the chain of things Apache might do, and there are no other rewrite rules, so something further upstream must be mapping ‘thing’ to ‘thing.php’. My first idea was that Apache might be looking inside thing.php (using mod_mime_magic) and assigning its type, and file extension, according to the AddType directive that enables PHP. However, turning off mime magic doesn’t stop it happening, so it’s not that.
It DOES stop doing it if I disable PHP, but why should PHP be involved in this at all? The final URL will eventually hit PHP, but because I’m using [R] (which forces an external redirect) in the rewrite rule, PHP won’t see anything until the browser makes the new request.
Is there some automatic aliasing thing that guesses that a request for a name without an extension might really be for some other file? Well, that sounds like a fair description of what Apache’s MultiViews do. Surprise, surprise: I turn MultiViews off and it all starts acting normally. That would also explain the PHP connection: by default MultiViews only considers files whose extensions mod_mime recognises, and it’s PHP’s AddType directive that makes .php one of them. As yet I’ve not figured out how to stop it a little more selectively (MultiViews are a nice feature otherwise), but I can live without them for now.
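A toy Python model of what seems to be happening may make the sequence clearer. It is a deliberate simplification, not Apache's actual algorithm: the hypothetical `multiviews` function only looks at the first path segment, picks the first matching file alphabetically instead of doing real content negotiation, and uses `known_exts` as a stand-in for the extensions mod_mime recognises.

```python
def multiviews(url_path: str, files: set[str], known_exts: set[str]) -> str:
    """Toy model: if the first path segment isn't an existing file, look for
    'segment.<ext>' with a recognised extension and keep the rest as path info."""
    segment, sep, rest = url_path.lstrip("/").partition("/")
    if segment in files:
        return url_path
    candidates = sorted(
        name for name in files
        if name.startswith(segment + ".") and name[len(segment) + 1:] in known_exts
    )
    if not candidates:
        return url_path
    return "/" + candidates[0] + sep + rest


def rewrite_target(url_path: str) -> str:
    """The test rule from above: capture the whole path into x.php?x=..."""
    return "/x.php?x=" + url_path.lstrip("/")


if __name__ == "__main__":
    docroot = {"thing.php", "x.php"}
    with_php = {"html", "php"}  # AddType for .php in effect
    without_php = {"html"}      # PHP disabled, so .php isn't recognised

    print(rewrite_target(multiviews("/thing/123", docroot, with_php)))
    print(rewrite_target(multiviews("/thing/123", docroot, without_php)))
```

Taking `php` out of `known_exts` has the same effect in the model as disabling PHP did in practice: the request falls through untouched and the rule sees `thing/123`.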