On Mon, Dec 19, 2011 at 7:20 PM, Wim Lewis <[email protected]> wrote:
>
> On 19 Dec 2011, at 4:50 PM, Nick Fitzsimons wrote:
>> Strictly speaking that regex cannot determine that an email address is
>> well-formed per the RFC as the grammar defining the form of email addresses
>> is a Type 2 Chomsky Grammar and regular expressions are limited to Type 3
>> Chomsky Grammars.
>
> That's true of the whole address line, but is it true of what people usually
> want to validate in an email-address field (the "addr-spec" production from
> rfc2822, without any comments)? I think that language is regular, unless
> there's a recursive rule hidden in the obsolete forms part of the grammar.
> The comment syntax is type-2, of course, because it requires balanced
> parentheses.
>
> (And of course, regexes aren't even regular expressions aka type-3 grammars
> any more--- but not enough more to parse balanced strings, I think.)

I think that it's regular, as there isn't any nesting in the "addr-spec"
rules. In fact, to write an FSM just for the local part I think you would
only need four states: start-of-string, normal, quoted-string, and
start-of-quoted-pair. The domain part is similar. The complexity mostly
lies in what characters are accepted in each state.

Cheers,
Ian
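
For illustration, here is a rough Python sketch of the local-part FSM described above. It is not taken from Django or from the thread, and it rests on several assumptions: the names (State, ATEXT, is_valid_local_part) are made up for the example; comments and folding whitespace are ignored; inside quotes it accepts only space, tab, and printable ASCII; and it follows the looser obsolete form in which atoms and quoted strings can be mixed between dots. It also adds a fifth state, AFTER_QUOTE, beyond the four named above, so that input such as "a"b is rejected. That extra state is this sketch's own choice.

```python
import string
from enum import Enum, auto

# atext as listed in RFC 2822: letters, digits, and these symbols.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


class State(Enum):
    START = auto()        # start of string, or just after a "."
    NORMAL = auto()       # inside an unquoted atom
    QUOTED = auto()       # inside a quoted-string
    QUOTED_PAIR = auto()  # just after a backslash inside a quoted-string
    AFTER_QUOTE = auto()  # just after a closing quote (this sketch's addition)


def is_valid_local_part(text):
    """Return True if text is accepted by the simplified local-part FSM."""
    state = State.START
    for ch in text:
        if state is State.START:
            if ch in ATEXT:
                state = State.NORMAL
            elif ch == '"':
                state = State.QUOTED
            else:
                return False
        elif state is State.NORMAL:
            if ch == ".":
                state = State.START
            elif ch not in ATEXT:
                return False
        elif state is State.QUOTED:
            if ch == "\\":
                state = State.QUOTED_PAIR
            elif ch == '"':
                state = State.AFTER_QUOTE
            elif not (ch in " \t" or 33 <= ord(ch) <= 126):
                return False
        elif state is State.QUOTED_PAIR:
            # Simplification: any ASCII character except NUL, CR, and LF.
            if not (1 <= ord(ch) <= 127) or ch in "\r\n":
                return False
            state = State.QUOTED
        else:  # State.AFTER_QUOTE: only a "." may follow a closing quote
            if ch != ".":
                return False
            state = State.START
    # Accept only if the input ended inside an atom or right after a quote.
    return state in (State.NORMAL, State.AFTER_QUOTE)
```

As noted above, most of the work is in the per-state character sets; the transitions themselves are only a handful of branches. For example, is_valid_local_part('john.doe') and is_valid_local_part('"john doe"') are accepted, while is_valid_local_part('john..doe') is rejected.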
