On 9 Feb 2012, at 8:47, Viktor Dukhovni wrote:
On Thu, Feb 09, 2012 at 01:15:52PM +0530, Ram wrote:
I am trying to validate email ids of subscribers coming to my site.
Is there a standard regular expression for email id syntax that
conforms to RFC 822?
I want to keep junk entries out of my database.
Postfix already checks this syntax in RCPT TO, but is this regex
available already?
Often it is a mistake to attempt to parse complex grammars with mere
regular expressions; some constructs are not handled by regexps in
full generality.
That's a bit more cautious and circumspect than I believe is necessary.
It has been conventional wisdom (how's that for a dodge...) for at least
20 years that there is no practical way to perfectly represent the
RFC822 address as a POSIX-compliant regex. I have no proof for that, but
it is illuminating that the very whiny page at
http://www.regular-expressions.info/email.html offers two things: a short
and very sloppy regex that is very close to a POSIX regex, with a slew of
justifications for its inaccuracies, and a supposedly precise expression
of RFC822 syntax in 428 characters that uses extended (Perl-like) syntax
and can't be functionally translated to standard regex.
Shorter: Perfectly matching address syntax in a standard regex is
hopeless.
To solve the problem of people putting bad addresses into a web form, it
is a better goal to simply catch common errors and reject addresses you
don't want to have to deal with even if they are formally valid, such
as those using quoted strings, apostrophes, or backticks. It is fairly
common for web input sanitization schemes and mail software (e.g.
Mailman) to break on 'strange' address syntax, so it is not
entirely wrong to reject some formally valid but practically unusable
addresses.
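
For illustration only, here is a rough Python sketch of that kind of
deliberately strict form check. It is not an RFC822 parser and makes no
attempt to be one. The function name check_address, the allowed
character set, and the rule that the domain must contain a dot are all
my own assumptions about what a site might want to accept; adjust them
to taste.

```python
import re

# Hypothetical site policy, NOT RFC 822: a conservative subset of
# addresses that a web form is willing to accept.
LOCAL_RE = re.compile(r"[A-Za-z0-9._%+-]+")
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def check_address(address):
    """Return None if the address is accepted, else a short reason string."""
    # Catches a missing @ and doubled @ typos in one test.
    if address.count("@") != 1:
        return "need exactly one @"
    local, domain = address.split("@")

    # Quoted strings, apostrophes, backticks, and whitespace are all
    # outside the allowed set, so they are rejected here even though
    # some of them are formally valid.
    if not LOCAL_RE.fullmatch(local):
        return "empty or unsupported local part"
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return "misplaced dot in local part"

    # Assumption: subscribers use fully qualified domains, so a bare
    # hostname such as "localhost" is treated as a typo.
    labels = domain.split(".")
    if len(labels) < 2:
        return "domain needs at least one dot"
    if not all(LABEL_RE.fullmatch(label) for label in labels):
        return "bad domain label"
    return None
```

With these rules, check_address("ram@example.com") returns None, while
o'brien@example.com and "john doe"@example.com are both refused. That is
intentional: the check trades formal correctness for addresses that the
rest of the toolchain can be expected to handle.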