On Mon, Dec 19, 2011 at 7:20 PM, Wim Lewis <[email protected]> wrote:
>
> On 19 Dec 2011, at 4:50 PM, Nick Fitzsimons wrote:
>> Strictly speaking that regex cannot determine that an email address is 
>> well-formed per the RFC as the grammar defining the form of email addresses 
>> is a Type 2 Chomsky Grammar and regular expressions are limited to Type 3 
>> Chomsky Grammars.
>
> That's true of the whole address line, but is it true of what people usually 
> want to validate in an email-address field (the "addr-spec" production from 
> RFC 2822, without any comments)? I think that language is regular, unless
> there's a recursive rule hidden in the obsolete forms part of the grammar. 
> The comment syntax is type-2, of course, because it requires balanced 
> parentheses.
>
> (And of course, regexes aren't even regular expressions, i.e. type-3 grammars,
> any more, but not enough more to parse balanced strings, I think.)
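To show why nested comments push the grammar past type 3, here is a minimal Python sketch of a recognizer for a single RFC 2822-style comment, `( ... )`, which may contain nested comments and backslash-escaped quoted-pairs. The character sets are simplified assumptions (printable ASCII only, with a plain space standing in for folding whitespace), and `is_comment` is a hypothetical helper name. The key point is the `depth` counter: it can grow without bound, so no fixed, finite set of states is enough to track it.

```python
# Simplified ctext: printable ASCII (space standing in for folding
# whitespace) except the parentheses and the backslash.
CTEXT = frozenset(chr(c) for c in range(32, 127)) - {"(", ")", "\\"}
# Simplified quoted-pair payload: any printable ASCII character.
QPAIR = frozenset(chr(c) for c in range(32, 127))


def is_comment(s):
    """Return True if s is exactly one (possibly nested) comment."""
    if not s.startswith("("):
        return False
    depth = 0        # unbounded nesting depth: this is what a DFA lacks
    escaped = False  # True right after a backslash
    for i, ch in enumerate(s):
        if escaped:
            # The character after a backslash is taken literally.
            if ch not in QPAIR:
                return False
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # The outermost comment closed; it must end the input.
                return i == len(s) - 1
        elif ch not in CTEXT:
            return False
    # Ran out of input with unbalanced parentheses.
    return False
```

For example, `is_comment("(a (b) c)")` is true while `is_comment("(a (b c)")` is false; telling those apart requires remembering how many parentheses are still open.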

I think it's regular, since there isn't any nesting in the
"addr-spec" rules.  In fact, a finite-state machine (FSM) for just the
local part would, I think, need only four states: start-of-string,
normal, quoted-string, and start-of-quoted-pair.  The domain part is
similar.  The complexity lies mostly in which characters are accepted
in each state.
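A minimal Python sketch of that finite-state idea for the local part, under stated assumptions: it ignores comments and folding whitespace (CFWS), uses simplified printable-ASCII character classes, and accepts dot-separated words where each word is an atom or a quoted string (the shape of RFC 2822's obsolete local-part, which also covers dot-atom). The state names and `is_local_part` are hypothetical. The states map onto the four above (start-of-string is `EXPECT_WORD`, normal is `IN_ATOM`, quoted-string is `IN_QUOTED`, start-of-quoted-pair is `IN_QPAIR`), plus one extra state, `AFTER_QUOTED`, so that a closing quote must be followed by a dot or the end of input.

```python
import string

# atext per RFC 2822: letters, digits, and these symbols.
ATEXT = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")
# Simplified qtext: printable ASCII (space included) except '"' and '\'.
QTEXT = frozenset(chr(c) for c in range(32, 127)) - {'"', "\\"}
# Simplified quoted-pair payload: any printable ASCII character.
QPAIR = frozenset(chr(c) for c in range(32, 127))

EXPECT_WORD = "expect-word"    # start of string, or just after a dot
IN_ATOM = "in-atom"            # inside an unquoted run of atext
IN_QUOTED = "in-quoted"        # inside "..."
IN_QPAIR = "in-qpair"          # just saw a backslash inside "..."
AFTER_QUOTED = "after-quoted"  # just closed a quoted string

ACCEPTING = {IN_ATOM, AFTER_QUOTED}


def step(state, ch):
    """Return the next state, or None if ch is not allowed here."""
    if state == EXPECT_WORD:
        if ch in ATEXT:
            return IN_ATOM
        return IN_QUOTED if ch == '"' else None
    if state == IN_ATOM:
        if ch in ATEXT:
            return IN_ATOM
        return EXPECT_WORD if ch == "." else None
    if state == IN_QUOTED:
        if ch == "\\":
            return IN_QPAIR
        if ch == '"':
            return AFTER_QUOTED
        return IN_QUOTED if ch in QTEXT else None
    if state == IN_QPAIR:
        return IN_QUOTED if ch in QPAIR else None
    if state == AFTER_QUOTED:
        return EXPECT_WORD if ch == "." else None
    return None


def is_local_part(s):
    """Run the FSM over s; only the current state is remembered."""
    state = EXPECT_WORD
    for ch in s:
        state = step(state, ch)
        if state is None:
            return False
    return state in ACCEPTING
```

Because nothing but the current state is carried between characters, whatever this recognizer accepts is a regular language, and an equivalent regular expression could in principle be derived from it mechanically.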

Cheers,
Ian

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.
