On Mon, Dec 18, 2017 at 5:43 PM, Random832 <random...@fastmail.com> wrote:
> On Sun, Dec 17, 2017, at 10:46, Chris Angelico wrote:
>> But if you're trying to *validate* an email address - for instance, if
>> you receive a form submission and want to know if there was an email
>> address included - then my recommendation is simply DON'T. You can't
>> get all the edge cases right; it is actually impossible for a regex to
>> perfectly match every valid email address and no invalid addresses.
>
> That's not actually true (the thing that notoriously can't be matched in
> a regex, RFC822 "address", is basically most of the syntax of the To:
> header - the part that is *the address* as we speak of it normally is
> "addr-spec" and is in fact a regular language, though a regex to match
> it goes on for a few hundred characters.
Hmm, is that true? I was under the impression that the quoting rules
were impossible to match with a regex. Or maybe it's just that they're
impossible to match with a *standard* regex, but the extended
implementations (including Python's, possibly) are able to match them?

Anyhow, it is FAR from simple; and also, for the purpose of "detect
email addresses in text documents", not desirable. Same as with URL
detection - it's better to have a handful of weird cases that don't
autolink correctly than to mis-detect any address that's at the end of
a sentence, for instance. For that purpose, it's better to ignore the
RFC and just craft a regex that matches *common* email address formats.

ChrisA
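
For illustration only, here is a rough sketch of the "match common formats, ignore the RFC" approach for picking addresses out of running text. The pattern, the name COMMON_EMAIL, and the helper find_emails are all made up for this example, not taken from any library or from the RFC. It assumes plain ASCII local parts and ordinary dotted hostnames, and it deliberately skips quoted local parts, comments, and IP-literal domains.

```python
import re

# Deliberately loose: covers common address shapes, NOT the RFC grammar.
# (Hypothetical pattern for illustration; tune it to your own data.)
COMMON_EMAIL = re.compile(
    r"""
    (?<![\w.+-])                    # don't start in the middle of a longer token
    [A-Za-z0-9][A-Za-z0-9._%+-]*    # local part: starts alphanumeric, common punctuation after
    @
    (?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+   # one or more dot-terminated host labels
    [A-Za-z]{2,}                    # final label: letters only, so a sentence-ending '.' is left out
    """,
    re.VERBOSE,
)


def find_emails(text):
    """Return every substring of text that looks like a common email address."""
    return COMMON_EMAIL.findall(text)


if __name__ == "__main__":
    sample = "Write to alice@example.com. Or bob.smith+list@mail.example.org, thanks."
    print(find_emails(sample))
```

Because the match has to end on a run of letters, the full stop or comma after an address at the end of a sentence is not swallowed. The price is that something like "odd local"@example.com is simply not detected, which is the trade-off described above.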