Apache mod_rewrite bug still lurks

There’s an enormous Apache mod_rewrite bug that I ran into back in 2005, and to my dismay, it’s still there. Long-standing bugs are usually small edge cases that don’t affect many people, but this one is a monster that I suspect affects pretty much everyone who’s using mod_rewrite, and they’ve just been lucky in avoiding it. The basic issue is this: if you match params that require URL encoding to be safe, mod_rewrite will not rewrite the back-reference (that’s $1 below) correctly. So take this very simple redirect:

RewriteRule ^(.*)$ index.php?show=$1 [R]

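So you hit http://www.example.com/a%2Fb and mod_rewrite neatly rewrites it as http://www.example.com/index.php?show=a/b! Notice that the matched parameter has been URL-decoded before being dropped into the replacement string. A stray slash is fairly harmless, but a decoded & or = would quietly change the meaning of the query string.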
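To make the effect concrete, here is a minimal Python sketch that models the behaviour described above. It is not Apache’s actual code: the function name `naive_rewrite` is made up for illustration, and the model simply assumes that the path is percent-decoded before the pattern is matched and that the captured text is then pasted into the substitution unchanged.

```python
import re
from urllib.parse import parse_qs, unquote, urlsplit


def naive_rewrite(request_path: str) -> str:
    """Hypothetical model of the rule `^(.*)$ -> index.php?show=$1`.

    Assumption: the path is percent-decoded before matching, and the
    back-reference is substituted without being re-encoded.
    """
    decoded = unquote(request_path.lstrip("/"))
    match = re.match(r"^(.*)$", decoded)
    return "/index.php?show=" + match.group(1)


# The encoded slash comes out decoded in the rewritten URL.
print(naive_rewrite("/a%2Fb"))

# An encoded "&" and "=" split the single value into two parameters.
rewritten = naive_rewrite("/a%26b%3Dc")
print(rewritten)
print(parse_qs(urlsplit(rewritten).query))
```

In this model the second request ends up with two query parameters, `show` and `b`, instead of the single value `a&b=c`, which is the kind of breakage that is easy to miss until somebody sends the wrong character.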

Apache 2.2 introduced a new B flag to deal with this, but that apparently suffers from the same problem! There are two workarounds I’ve used, and both are horrible: double-encode the source string (if you control both the start and end points of the URLs) so that it survives the spurious urldecode, or base64-encode it (JavaScript flavour) and do the decoding yourself. I did warn you they were horrible.
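Both workarounds can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names are hypothetical, and the base64 variant assumes the URL-safe alphabet with padding stripped (so the token needs no percent-encoding at all), which may or may not match the flavour used originally.

```python
import base64
from urllib.parse import quote, unquote


def double_encode(value: str) -> str:
    """Percent-encode twice so one spurious decode still leaves an encoded value."""
    return quote(quote(value, safe=""), safe="")


def b64_encode(value: str) -> str:
    """Encode with the URL-safe base64 alphabet and drop the '=' padding."""
    raw = base64.urlsafe_b64encode(value.encode("utf-8"))
    return raw.decode("ascii").rstrip("=")


def b64_decode(token: str) -> str:
    """Restore the padding and decode; this is what the target script would do."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


value = "a/b"

# Workaround 1: after one unwanted decode the value is still safely encoded.
sent = double_encode(value)
after_spurious_decode = unquote(sent)
print(sent, "->", after_spurious_decode)

# Workaround 2: the token has nothing for a decode to mangle.
token = b64_encode(value)
print(token, "->", b64_decode(token))
```

Either way the receiving script has to know which scheme is in use and undo it itself, which is exactly why both options feel like hacks rather than fixes.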

I’ll bet that there are a zillion mod_rewrite rules out there that suffer from this fundamental problem without anyone having noticed. If a few people voted for this to be fixed, it would probably go away…