Regular expression for email addresses (replacing the Visual Studio 2008 default).
Phil Gilmore
I recently had a user report that they couldn't register on my client's site. The registration page reported that their email address was invalid. The address was indeed strange, but valid nonetheless. We were using a regular expression validator control to check the email address on registration, and the regular expression that the Visual Studio designer had put there for email addresses didn't accept it.
Here is the regular expression that the Visual Studio 2008 designer conveniently put into the page on our behalf.
\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
This has the following problems:
- The pattern is not anchored, so the rules only have to hold at a single point in the input. Any characters can precede or follow a valid email address, including spaces, punctuation characters, periods, @ symbols, etc. For example, the input ~!@#$%^&*()_+=-[]{}';":/.,?>< nobody@nowhere.com~! @#$%^&*()_+=-[]{}';":/.,?>< would pass this validation.
- User name may contain multiple consecutive periods. For example, the input nobody..lives........here@nowhere.com would pass this validation.
- Domain name may contain multiple consecutive periods after the first single period. For example, the input nobody@nowhere.co....uk would pass this validation.
- Many valid email addresses will not pass this validation. For example, the input nobody-@nowhere.com would fail.
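The four problems above can be reproduced with a short sketch. It uses Python's re module as a stand-in for the .NET engine and applies the pattern as an unanchored search; the helper name passes_unanchored and the shortened first sample are my own, and how a particular validator control applies the pattern is not modelled here.

```python
import re

# The Visual Studio 2008 designer's default pattern, copied verbatim.
DEFAULT = re.compile(r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*")


def passes_unanchored(text: str) -> bool:
    """True if the pattern matches anywhere in the text (no ^ or $)."""
    return DEFAULT.search(text) is not None


samples = [
    "~!# nobody@nowhere.com ~!#",             # junk before and after
    "nobody..lives........here@nowhere.com",  # consecutive periods in user name
    "nobody@nowhere.co....uk",                # consecutive periods in domain
    "nobody-@nowhere.com",                    # the address that gets rejected
]

for sample in samples:
    print(f"{sample!r}: {passes_unanchored(sample)}")
```

The first three inputs pass because the search only needs to find one well-formed stretch such as here@nowhere.com or nobody@nowhere.co somewhere inside them. The last one is rejected because the pattern insists on a word character immediately before the @.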
Obviously, there are some flaws here. If you are familiar with regular expressions, you can see immediately that it's missing the ^ and $ restrictions, for example. Rather than try to massage this one into compliance, I started with a new one. There are probably a million of these on the web and no doubt some are better than mine. But I thought I'd blog it anyway since it's done and working. Here is what I came up with.
^([\w-_]+\.)*[\w-_]+@([\w-_]+\.)*[\w-_]+\.[\w-_]+$
This regular expression has the following attributes:
- User name must be one character or more.
- User name may contain one or more periods.
- User name must not begin or end with a period.
- Double contiguous periods are not allowed in the user name.
- User name must only contain characters a-z, A-Z, 0-9, hyphens, underscores and single periods.
- Domain name must be three characters or more.
- Domain name must contain one or more periods.
- Domain name may not begin or end with a period.
- Double contiguous periods are not allowed in the domain name.
- Domain name must only contain characters a-z, A-Z, 0-9, hyphens, underscores and single periods.
- A single @ symbol is required between the user name and the domain name.
- Allowed special characters (hyphen and underscore) are permitted at any frequency anywhere in the user name or domain name.
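To check these attributes outside the validator control, here is a minimal sketch in Python. Two things differ from the pattern as posted, and both are my assumptions rather than part of the original: the class [\w-_] is written as [\w-], because \w already covers the underscore and Python's re module rejects a hyphen placed between \w and another character, and re.ASCII is set so that \w means only a-z, A-Z, 0-9 and underscore, as the list describes. The function name is_valid_email is hypothetical.

```python
import re

# Same structure as the pattern in the post, with [\w-_] spelled [\w-].
EMAIL = re.compile(r"^([\w-]+\.)*[\w-]+@([\w-]+\.)*[\w-]+\.[\w-]+$", re.ASCII)


def is_valid_email(text: str) -> bool:
    # fullmatch also rejects a trailing newline, which "$" alone would
    # tolerate in Python.
    return EMAIL.fullmatch(text) is not None


checks = [
    "nobody@nowhere.com",
    "nobody-@nowhere.com",
    "first.last@mail.nowhere.co.uk",
    "a@b.c",
    "nobody..lives@nowhere.com",
    ".nobody@nowhere.com",
    "nobody.@nowhere.com",
    "nobody@nowhere",
    "nobody@nowhere.co....uk",
    " nobody@nowhere.com ",
]

for address in checks:
    print(f"{address!r}: {is_valid_email(address)}")
```

The first four addresses should be accepted and the remaining six rejected, each for one of the rules in the list: doubled periods, a leading or trailing period, a domain without a period, or stray characters around the address.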
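Of course, you may choose to allow more special characters. Although I've never seen one in an email address, you may need to allow the plus (+) character, for example. This is easy, with one catch: add the + inside each bracketed class and keep the hyphen last, so that [\w-_] becomes [\w+-]. Simply changing \w to \w\+ would produce [\w\+-_], in which +-_ is read as a range running from + to _ that also lets through periods, @ symbols and other punctuation.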
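Where the plus sign lands inside the character class matters, and a quick sketch shows why. This is Python again, standing in for the .NET engine; EMAIL_PLUS, SAFE_CLASS and RANGE_CLASS are names I made up for the illustration.

```python
import re

# Plus sign added with the hyphen kept last, so the hyphen stays a literal.
# Inside a character class "+" needs no escaping.
EMAIL_PLUS = re.compile(
    r"([\w+-]+\.)*[\w+-]+@([\w+-]+\.)*[\w+-]+\.[\w+-]+", re.ASCII
)

# One class on its own, written both ways, to compare what each accepts.
SAFE_CLASS = re.compile(r"[\w+-]", re.ASCII)
# Here "+-_" is parsed as a range from "+" (0x2B) to "_" (0x5F).
RANGE_CLASS = re.compile(r"[\w\+-_]", re.ASCII)

print(EMAIL_PLUS.fullmatch("nobody+tag@nowhere.com") is not None)

for ch in "+-.@/":
    print(
        repr(ch),
        "safe:", SAFE_CLASS.fullmatch(ch) is not None,
        "range:", RANGE_CLASS.fullmatch(ch) is not None,
    )
```

With the hyphen last, the class accepts only word characters, + and -. In the range form it also accepts the period, @, / and everything else between + and _ in ASCII, which quietly undoes the rules about periods and the single @.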
I tested this regular expression against all the email addresses in the client's user database (about 5500 addresses). Some of those addresses had been imported from another process and never validated against a regular expression. Every address that failed was either blank or obviously invalid: only 9 non-blank addresses failed, and all 9 were invalid.
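A bulk check like this can be sketched in a few lines. The sample list below is made up for illustration and is not the client's data, failing_addresses is a hypothetical helper, and the pattern is the Python spelling of the one above ([\w-] in place of [\w-_], ASCII-only \w).

```python
import re

EMAIL = re.compile(r"([\w-]+\.)*[\w-]+@([\w-]+\.)*[\w-]+\.[\w-]+", re.ASCII)


def failing_addresses(addresses):
    """Return (number of blank entries, list of non-blank entries that fail)."""
    blank = 0
    failed = []
    for address in addresses:
        if address is None or not address.strip():
            blank += 1          # blank rows are counted separately
        elif EMAIL.fullmatch(address) is None:
            failed.append(address)
    return blank, failed


# Made-up rows standing in for a column read from the user table.
sample = [
    "nobody@nowhere.com",
    "",
    None,
    "nobody@nowhere",
    "first.last@nowhere.co.uk",
    "no body@nowhere.com",
]

blank_count, failed = failing_addresses(sample)
print("blank:", blank_count)
for address in failed:
    print("failed:", address)
```

Reviewing the failed list by eye is the important part: it is how you find out whether the pattern is rejecting addresses that are actually valid.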
Phil Gilmore (www.interactiveasp.net)