Domain name validation

I was revisiting the validation of domain names and realised that most of the regexes posted around the web have faults.

Many refer to Shaun Inman’s 2006 post, which does a fair job but is prone to break as new TLDs are introduced. This answer on Stack Overflow is about the best I’ve found so far: it enforces label and overall lengths; allowing multiple dashes means it works with punycoded domains; and it’s generally permissive, so it won’t break as TLDs change. There’s one case it doesn’t handle, though. RFC 2181 says that the DNS itself restricts only the length of a label, so labels that are not used as hostnames (i.e. which do not map to an IP, for example in TXT or SRV records) may contain any printable ASCII character, so `_,;:'"!@~$` and friends are all up for inclusion. This is most commonly found in DomainKeys, which uses the `_domainkey` label. There’s a good article on the use of underscores in DNS.
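To make those rules concrete, here is a minimal Python sketch of a permissive validator along those lines. It is not the Stack Overflow answer's regex; the names `is_valid_domain` and `allow_non_host_labels` are made up for illustration, and the "any printable ASCII except the dot" rule for non-hostname labels is my simplifying assumption.

```python
import re

# Hostname-style label: letters, digits and hyphens, 1-63 characters,
# no leading or trailing hyphen. Runs of hyphens inside the label are
# allowed, so punycode labels such as "xn--bcher-kva" pass.
HOST_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# Non-hostname label (assumption for this sketch): 1-63 printable ASCII
# characters, excluding space and the dot that separates labels.
ANY_LABEL = re.compile(r"[\x21-\x2d\x2f-\x7e]{1,63}")


def is_valid_domain(name: str, allow_non_host_labels: bool = False) -> bool:
    """Check label lengths, overall length and the permitted characters."""
    # Tolerate a single trailing dot (the fully qualified form).
    if name.endswith("."):
        name = name[:-1]
    # 253 characters is the usual limit for the textual form of a name.
    if not name or len(name) > 253:
        return False
    pattern = ANY_LABEL if allow_non_host_labels else HOST_LABEL
    # An empty label (as in "a..b") matches neither pattern, so it is rejected.
    return all(pattern.fullmatch(label) for label in name.split("."))
```

Called as `is_valid_domain("_domainkey.example.com")` this rejects the name, while passing `allow_non_host_labels=True` accepts it. The caller has to know which behaviour it wants, and that is exactly the hard part.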

This does relate to the validation of email addresses (which often contain domains), and the best page on that subject is this one. However, you can’t simply extract the domain part from that, as domain names in general are a superset of what’s used in email.

It’s difficult to do this right because you can’t tell whether a label is a hostname or not, or where a hostname stops and a domain begins, and validity varies according to context: `_domainkey.example.com` is invalid in an A record, but valid in a TXT record. I can foresee a parameter to allow you to specify usage context to deal with this. It might be better to process the name backwards so that you have more context available as you encounter each label. For example, if you processed `www.example.com` as `com.example.www`, you would stand a better chance of knowing whether `www` is a hostname or a domain name.
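As a rough illustration of the context parameter and the right-to-left walk, here is a Python sketch. Everything in it is hypothetical: the `Usage` enum, the `validate_in_context` function and, above all, the `domain_labels=2` guess that the rightmost two labels form the domain. That guess is wrong for names under suffixes like `co.uk`, so treat this as thinking in code rather than a solution.

```python
import re
from enum import Enum

# Hostname-style label: letters, digits, hyphens; no leading/trailing hyphen.
HOST_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
# Assumed rule for other labels: printable ASCII except space and the dot.
ANY_LABEL = re.compile(r"[\x21-\x2d\x2f-\x7e]{1,63}")


class Usage(Enum):
    HOST = "host"          # e.g. the owner name of an A record
    NON_HOST = "non-host"  # e.g. the owner name of a TXT or SRV record


def validate_in_context(name: str, usage: Usage, domain_labels: int = 2) -> bool:
    """Validate a name label by label, starting from the TLD."""
    if name.endswith("."):
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    # Walk right to left: "www.example.com" is visited as com, example, www.
    for position, label in enumerate(reversed(name.split("."))):
        # Assumption: the rightmost `domain_labels` labels are the domain
        # itself and must always look like hostname labels.
        in_domain_part = position < domain_labels
        if in_domain_part or usage is Usage.HOST:
            pattern = HOST_LABEL
        else:
            pattern = ANY_LABEL
        if not pattern.fullmatch(label):
            return False
    return True
```

With this, `validate_in_context("_domainkey.example.com", Usage.HOST)` fails while the same name with `Usage.NON_HOST` passes, and `_bad.com` fails in both contexts because the underscore falls inside the assumed domain part.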

I’m mainly thinking out loud here; I don’t have a solution as yet!

Microsoft finally gets it

I somehow missed Microsoft’s announcement that (in a complete U-turn from previous announcements) IE8 will support web standards mode by default, and thus any broken sites will have to enable IE7 mode by a meta tag. So finally, IE will cease to be the albatross around the neck of the internet, and developers the world over will at last be able to write standards-compliant sites that work in all major browsers.

I had real trouble believing that MS had convinced so many prominent web standards advocates (here and here) that the previous option was in some way a good thing, when it essentially meant that MS expected 99% of the web to change in order to support the 1% (almost entirely intranets and thus of no public interest) that are so badly written that they couldn’t survive a browser update.

I’m very happy to see this change of heart, which was a really unexpected thing to see from MS. They don’t normally give a stuff about such things, so they fully deserve the adulation that their announcement is getting in the comments. It also vindicates the slagging I gave the authors of those articles promoting the evil meta tag!

So, thank you, Microsoft! I look forward to not having to do anything special for IE – you probably just doubled the world’s web development productivity rate! Who knows – one day IE might be as good as Firefox or Safari…

The Email Standards Project

The Email Standards Project is a worthy effort to try to get email clients to handle HTML email in a consistent way. Many already do pretty well, but there are some big exceptions: Outlook 2007 (with its ancient Word rendering engine), Gmail, .Mac, Hotmail and others. Many people are opposed to the whole idea of HTML email, but often their resentment is based on the fact that historically email client support has been so bad that they’ve had very poor experiences. Worse is that some senders (not us!) send HTML-only messages, which is certainly something that will drive a Mutt user potty. Smartmessages supports sending in plain, HTML and mixed formats (settable by each individual subscriber), and we ensure that our users get a clean, reliable platform for delivering their creations, so we try to work around the deficiencies of things like MS Exchange.
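For anyone unsure what a message with both plain and HTML versions looks like in practice, here is a small sketch using Python's standard `email` package. It is not how Smartmessages builds its messages; `build_message` and the addresses are invented for the example, and I'm assuming that "mixed" here means a `multipart/alternative` message carrying a plain-text part and an HTML part.

```python
from email.message import EmailMessage
from typing import Optional


def build_message(sender: str, recipient: str, subject: str,
                  text_body: str, html_body: Optional[str] = None) -> EmailMessage:
    """Build a plain-text message, adding an HTML alternative if one is given."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # The plain-text part always goes in first, so text-only clients
    # (Mutt, say) have something readable.
    msg.set_content(text_body)
    if html_body is not None:
        # Converts the message to multipart/alternative: clients that
        # can render HTML pick this part, others fall back to the text.
        msg.add_alternative(html_body, subtype="html")
    return msg


# Hypothetical addresses, purely for illustration.
plain_only = build_message("news@example.com", "reader@example.org",
                           "Hello", "Hello in plain text.")
both = build_message("news@example.com", "reader@example.org",
                     "Hello", "Hello in plain text.",
                     "<p>Hello in <b>HTML</b>.</p>")
```

The important design choice is that the plain-text part is never optional, which is what avoids the HTML-only problem described above.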

Generally the poor support in big-name clients has led to a need to develop HTML for email for very much the lowest common denominator, which for the most part means no CSS (unless you’re prepared to tiptoe through the minefields of using it), no images, no scripts, no forms, no attachments. Too many designers think of email as being just like the web, but it’s not – the vast majority of web pages will simply not work as email. These days the only effective way of designing for email is to start out with classic HTML 4.0 with no CSS or images and make your message look good using only type, white space and colour, because this is probably all that 90% (yes, really that much) of your recipients are going to see. You can then sprinkle a few images in for enhancement, but you should have no text in images that is not shown as text. With the advent of Outlook 2007’s big step backwards, it’s no longer possible to use background images, so you can’t have text over images at all. You also can’t rely on alt attributes as image fallbacks, as some big clients don’t display the alt text if images are being suppressed as an anti-spam measure.

Many designers get very uppity about this kind of thing as it means that their palette of options is severely constrained. However, it should really be regarded as a challenge. It’s not too hard to make stuff look good with heavy use of images (see CSS Zen Garden for gorgeous examples), but producing stuff that looks good with no images or CSS (or, more to the point, that still looks good when those parts have been ripped out) takes a great deal of skill, experience and appreciation of the medium.

Any effort to try to raise the bar gets our support, so props to the Email Standards Project and to Freshview for starting it.