On 9 Feb 2012, at 8:47, Viktor Dukhovni wrote:

On Thu, Feb 09, 2012 at 01:15:52PM +0530, Ram wrote:
I am trying to validate email ids of subscribers coming to my site.
Is there a standard regular expression for email id syntax that
conforms to RFC 822?

I want to avoid junk entries from entering my database.

Postfix already checks this syntax in RCPT TO, but is this regex
available already?

Often it is a mistake to attempt to parse complex grammars with mere
regular expressions; some constructs cannot be handled by regexps in
full generality.

That's a bit more cautious and circumspect than I believe is necessary.
It has been conventional wisdom (how's that for a dodge...) for at least
20 years that there is no practical way to represent RFC 822 address
syntax perfectly as a POSIX-compliant regex. I have no proof of that, but
it is illuminating that the very whiny page at
http://www.regular-expressions.info/email.html offers two things: a
short and very sloppy regex that is very nearly a POSIX regex, with a
slew of justifications for its inaccuracies, and a supposedly precise
expression of RFC 822 syntax in 428 characters that uses extended
(Perl-like) syntax and can't be functionally translated to a standard
regex.

Shorter: Perfectly matching address syntax in a standard regex is 
hopeless.
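A minimal Python sketch of why nesting is the sticking point: RFC 822 defines a comment recursively (a parenthesized comment may itself contain comments), so matching one in full generality needs a depth counter rather than a fixed pattern. The function `strip_comments` and the pattern `FLAT_COMMENT` are hypothetical illustrations; the sketch also assumes no quoted strings or backslash quoted-pairs, which a real parser would have to handle.

```python
import re

# Matches one comment level only: "(" then non-paren characters then ")".
FLAT_COMMENT = re.compile(r"\([^()]*\)")


def strip_comments(addr: str) -> str:
    """Remove parenthesized comments, which may nest to any depth.

    Simplifying assumption: no quoted strings or backslash quoted-pairs,
    so every "(" and ")" is treated as a comment delimiter.
    """
    out = []
    depth = 0  # current comment nesting level
    for ch in addr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(ch)  # keep only text outside every comment
    if depth != 0:
        raise ValueError("unbalanced '('")
    return "".join(out)


nested = "jdoe(a (nested) comment)@example.com"
print(strip_comments(nested))         # jdoe@example.com
print(FLAT_COMMENT.sub("", nested))   # jdoe(a  comment)@example.com
```

Each pass of `FLAT_COMMENT` removes only the innermost level, so any fixed pattern built this way covers a bounded nesting depth. Unbounded depth needs a counter, a loop, or a recursive extension like those found in Perl-compatible engines (Python's standard `re` has none).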

To solve the problem of people putting bad addresses into a web form, a
better goal is to catch common errors and to reject addresses you don't
want to deal with even if they are technically valid, such as those
using quoted strings, apostrophes, or backticks. Web input sanitization
schemes and mail software (e.g. Mailman) fairly commonly break on
'strange' address syntax, so it is not entirely wrong to reject some
formally valid but practically unusable addresses.
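One way to apply that advice, sketched in Python under stated assumptions: the function name `acceptable_address` and the exact character set are a hypothetical site policy, not anything taken from the RFCs or from Postfix. It accepts only dot-separated unquoted atoms in the local part, omits the apostrophe and backtick from the allowed characters, and requires a dotted, hostname-style domain, so it deliberately rejects some formally valid addresses.

```python
import re

# Hypothetical site policy, not an RFC grammar.
# Local part: atoms joined by single dots; no quotes, apostrophes or backticks.
_LOCAL_ATOM = r"[A-Za-z0-9!#$%&*+/=?^_{|}~-]+"
_LOCAL = rf"{_LOCAL_ATOM}(?:\.{_LOCAL_ATOM})*"
# Domain: at least two hostname labels of 1-63 chars, no leading/trailing hyphen.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_DOMAIN = rf"{_LABEL}(?:\.{_LABEL})+"
_ADDRESS = re.compile(rf"{_LOCAL}@{_DOMAIN}")


def acceptable_address(addr: str) -> bool:
    """Return True if addr passes a deliberately strict, practical check."""
    return _ADDRESS.fullmatch(addr) is not None


for candidate in ["jdoe@example.com", "o'brien@example.com", '"john doe"@example.com']:
    print(candidate, acceptable_address(candidate))
```

Rejecting `o'brien@example.com` and quoted local parts is intentional here, matching the trade-off described above; a site that must accept them would need a real parser rather than a looser regex.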

