On 9 Feb 2012, at 8:47, Viktor Dukhovni wrote:

On Thu, Feb 09, 2012 at 01:15:52PM +0530, Ram wrote:

I am trying to validate the email addresses of subscribers coming to my site.
Is there a standard regular expression for email address syntax that
conforms to RFC 822?

I want to avoid junk entries from entering my database.

Postfix already checks this syntax in RCPT TO, but is this regex
already available?

Often it is a mistake to attempt to parse complex grammars with mere
regular expressions; some constructs cannot be handled by regexps in
full generality.
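One concrete example of such a construct: RFC 822 allows comments in parentheses, and its grammar (`comment = "(" *(ctext / quoted-pair / comment) ")"`) lets them nest to any depth. The Python sketch below illustrates this under simplifying assumptions. `ONE_LEVEL` and `is_comment` are hypothetical names, `ctext` is approximated as any character other than `(`, `)` and `\`, and CR and line-folding rules are ignored. A pattern written for one level of nesting rejects a doubly nested comment, while a plain depth counter accepts any depth.

```python
import re

# Matches a comment nested at most one level deep. Supporting each further
# level would require pasting the inner group into itself again.
ONE_LEVEL = re.compile(r"\((?:[^()\\]|\\.|\((?:[^()\\]|\\.)*\))*\)")


def is_comment(text: str) -> bool:
    """Return True if text is exactly one RFC 822-style comment.

    Simplified sketch: ctext is any character other than '(', ')' and '\\';
    CR handling is ignored. A depth counter makes nesting depth unbounded.
    """
    if not text.startswith("("):
        return False
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # quoted-pair: the backslash escapes the next character
            if i + 1 >= len(text):
                return False
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # the outermost comment must close at the very end
                return i == len(text) - 1
        i += 1
    return False  # ran out of input with an unclosed parenthesis


for sample in ["(a)", "(a (b) c)", "(a (b (c)) d)", "(a (b)"]:
    print(sample, bool(ONE_LEVEL.fullmatch(sample)), is_comment(sample))
```

Perl-compatible engines add recursive patterns for exactly this kind of input, but a pattern that recurses is no longer a regular expression in the formal sense.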

That's a bit more cautious and circumspect than I believe is necessary. It has been conventional wisdom (how's that for a dodge...) for at least 20 years that there is no practical way to represent the RFC 822 address grammar perfectly as a POSIX-compliant regex. I have no proof of that, but it is illuminating that the very whiny page at http://www.regular-expressions.info/email.html offers two things: a short and very sloppy regex, very close to POSIX syntax, accompanied by a slew of justifications for its inaccuracies; and a supposedly precise expression of RFC 822 syntax in 428 characters, written in extended (Perl-like) syntax that can't be functionally translated into a standard regex.

Shorter: Perfectly matching address syntax in a standard regex is hopeless.
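The sloppiness cuts both ways, as a deliberately simple pattern shows. `SIMPLE` below is a hypothetical example of the common "good enough" style, not the pattern published on that page. Checked against a few addresses whose RFC 822 status is clear, it rejects a quoted-string local part and a domain literal, both valid, and accepts empty words between dots, which are not.

```python
import re

# Illustrative "good enough" pattern; not copied from any particular page.
SIMPLE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# address -> whether RFC 822 permits it
samples = {
    '"john doe"@example.com': True,   # quoted-string local part
    "user@[192.0.2.1]": True,         # domain literal
    "a..b@example.com": False,        # empty word between dots in local part
    "user@example..com": False,       # empty label in the domain
}

for address, rfc_valid in samples.items():
    matched = SIMPLE.fullmatch(address) is not None
    verdict = "ok" if matched == rfc_valid else "WRONG"
    print(f"{address:28} regex={matched!s:5} rfc822={rfc_valid!s:5} {verdict}")
```

All four samples come out WRONG: two false rejections and two false acceptances.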

To solve the problem of people entering bad addresses into a web form, a better goal is simply to catch common errors and to reject addresses you don't want to deal with even if they are technically valid, such as those using quoted strings, apostrophes, or backticks. Web input sanitization schemes and mail software (e.g. Mailman) fairly commonly break on 'strange' address syntax, so it is not entirely wrong to reject some formally valid but practically unusable addresses.
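A policy filter along those lines might look like the following Python sketch. The function name `check_signup_address`, the allowed character set, and the two-label minimum for domains are assumptions made for illustration, not any standard. The rules describe what the site is willing to handle rather than what RFC 822 permits: apostrophes and backticks are legal in an RFC 822 atom, and quoted strings are legal local parts, yet all three are refused here.

```python
import re
from typing import Optional

# Hypothetical site policy: dot-separated runs of a conservative character set.
LOCAL_RE = re.compile(r"[A-Za-z0-9_%+-]+(?:\.[A-Za-z0-9_%+-]+)*")
# DNS-style label: 1 to 63 characters, no leading or trailing hyphen.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check_signup_address(address: str) -> Optional[str]:
    """Return None if the address passes the policy, else a reason to show the user."""
    address = address.strip()
    if address.count("@") != 1:
        return "address must contain exactly one @"
    local, domain = address.split("@")
    # Formally valid in places, but refused by this policy.
    if any(ch in address for ch in "\"'`"):
        return "quotes, apostrophes and backticks are not accepted"
    if not LOCAL_RE.fullmatch(local):
        return "unsupported characters or misplaced dots before the @"
    labels = domain.split(".")
    if len(labels) < 2 or not all(LABEL_RE.fullmatch(label) for label in labels):
        return "domain should look like name.tld"
    return None


for candidate in ["jane.doe@example.com", "o'brien@example.com",
                  "bob@example", "a..b@example.com"]:
    print(candidate, check_signup_address(candidate))
```

Returning a reason string instead of a bare boolean lets the form tell the user what to fix.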
