On Aug 7, 2009, at 8:13 AM, Carl wrote:

>
> This is an excellent article on the traps to beware of when regex'ing
> email address formats
>
> http://www.regular-expressions.info/email.html
>
> This may ignite a debate though :)

A discussion, maybe. In the abstract, I like the idea of validating
against the RFC verbatim, but we *should* be clear on what we're trying
to do. Guard against typos? Prevent some kind of attack? How much do we
care about false positives?

The article objects (to RFC-style checking) that j...@aol.com.nospam,
for example, will validate. I'm not too concerned about that, since
there are lots of ways that a user can enter a wrong but
(syntactically) valid address. We deal with that through active
validation, not a syntax check.
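A minimal sketch of what active validation could look like, assuming we mail the user a token and ask them to echo it back (the mailing step itself is not shown). The names `SECRET_KEY`, `confirmation_token` and `confirm` are hypothetical, not part of web2py. A syntactically fine but wrong address like the .nospam one simply never gets confirmed, because nobody receives the token.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real app would load this from config.
SECRET_KEY = b"replace-with-a-random-secret"


def confirmation_token(address: str) -> str:
    """Derive a token to mail to `address`; only whoever reads that mailbox sees it."""
    return hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256).hexdigest()


def confirm(address: str, token: str) -> bool:
    """Accept the address only if the user echoes back the token we mailed."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(confirmation_token(address), token)
```

As written the token never expires; a real flow would probably store a random per-signup token with a timestamp instead.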

Might there be a security concern? The quoted-string alternative of the
RFC checker is very permissive:

        "([^"\r\\]|\\["\r\\])*"

Could that open the door to some kind of injection attack? Presumably  
we sanitize it for display; how about when we actually use it to send  
mail? Any consumer that doesn't understand quoted names could end up  
very confused.
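To make the worry concrete: the character class `[^"\r\\]` excludes CR but not LF, so a quoted local part can carry a bare newline. The sketch below checks that with Python's `re`, and also shows one possible tightening (a hypothetical variant, not something proposed in this thread) that rejects all control characters and only allows `\"` and `\\` as escapes.

```python
import re

# The quoted-string alternative from the RFC checker, verbatim.
PERMISSIVE = re.compile(r'"([^"\r\\]|\\["\r\\])*"')

# Hypothetical stricter variant: no control characters at all (0x00-0x1f, 0x7f),
# and only \" or \\ allowed as escapes.
STRICT = re.compile(r'"(?:[^"\\\x00-\x1f\x7f]|\\["\\])*"')

# A local part that smuggles a header line in via a bare LF.
payload = '"x\nBcc: victim@example.com"'

print(bool(PERMISSIVE.fullmatch(payload)))     # True: LF is not excluded
print(bool(STRICT.fullmatch(payload)))         # False
print(bool(STRICT.fullmatch('"john smith"')))  # True
```

Whether a bare LF in the local part actually reaches a mail header depends on how the address is later used, which is exactly the open question above.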

I take false positives as a very bad thing: if a user enters a real and
valid address, I do not want to reject it. So I don't much like the
explicit list of TLDs (below), on the grounds that the list is bound to
grow, and at some point it'll break. From the Wikipedia TLD article:

> During the 32nd International Public ICANN Meeting in Paris in 2008,
> ICANN started a new process of TLD naming policy to take a
> "significant step forward on the introduction of new generic
> top-level domains." This program envisions the availability of many
> new or already proposed domains, as well as a new application and
> implementation process. Observers believed that the new rules could
> result in hundreds of new gTLDs to be registered. Proposed TLDs
> include music, berlin and nyc.
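To check that concern against Carl's pattern (quoted further down), here it is transcribed into Python. Two assumptions: the line breaks in the quoted copy are mail-client wrapping, and the pattern is compiled case-insensitively (without that, `[A-Z]{2}` never matches a lowercase ccTLD such as .de). Under those assumptions, all three proposed gTLDs from the quote above are rejected.

```python
import re

# Carl's TLD-filtered pattern, rejoined across the quoted message's line wraps.
TLD_FILTERED = re.compile(
    r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@"
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
    r"(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b",
    re.IGNORECASE,  # assumed; without it [A-Z]{2} rejects a lowercase ".de"
)


def accepts(address: str) -> bool:
    # fullmatch anchors both ends; the pattern itself has no ^ or $.
    return TLD_FILTERED.fullmatch(address) is not None


for addr in ["fan@example.com", "fan@example.de",
             "fan@example.music", "fan@example.berlin", "fan@example.nyc"]:
    print(addr, accepts(addr))
```

The .com and .de addresses print True; the last three print False. Note also that the pattern ends in `\b` rather than `$`: with `re.search` instead of `fullmatch`, fan@example.com.nospam is accepted on the strength of its fan@example.com prefix.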

I think I'd favor the RFC-style pattern without the quoted-name  
alternation.
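For comparison, a rough Python approximation of an RFC-style pattern without the quoted-name alternation. This is not the haacked.com pattern quoted below (the archived copy of it is truncated); it is a hypothetical dot-atom local part plus a generic domain with no TLD list, so new TLDs pass.

```python
import re

# Characters allowed in an unquoted (dot-atom) local part.
ATOM = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+"
# One DNS label: alphanumerics and inner hyphens, no leading/trailing hyphen.
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# Dot-atom local part (no leading, trailing, or doubled dots), then at least
# one label plus dot, then a TLD that must start with a letter.
DOT_ATOM_EMAIL = re.compile(
    rf"{ATOM}(?:\.{ATOM})*@(?:{LABEL}\.)+[a-z](?:[a-z0-9-]*[a-z0-9])?",
    re.IGNORECASE,
)


def is_email(address: str) -> bool:
    return DOT_ATOM_EMAIL.fullmatch(address) is not None
```

With this, a.b@example.com and fan@example.berlin pass, while .a@example.com, a..b@example.com and "john smith"@example.com are rejected. Like any syntax check, it still accepts the .nospam address.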

One thing we could do is to give the developer an option:  
IS_EMAIL(something or other) that lets them select one of a small  
number of regexes. And of course the developer can always use IS_MATCH  
if they don't like our choice of email filters.

If we permitted a choice, I'd suggest:

        1. default to the RFC regex, but without quoted names
        2. RFC including quoted names
        3. something like the pattern below, including the TLD filter (maybe)
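A sketch of how those three options might be exposed, assuming a hypothetical `email_checker(mode)` factory; the function and the mode names are made up and are not an existing web2py signature. An IS_EMAIL option could presumably wrap something like this, and a developer who wants something else can still reach for IS_MATCH.

```python
import re

ATOM = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"                  # unquoted local part
QUOTED = r'"(?:[^"\r\\]|\\["\r\\])*"'              # the quoted-name alternative
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"         # one DNS label
ANY_TLD = r"[a-z](?:[a-z0-9-]*[a-z0-9])?"          # no TLD list
LISTED_TLD = r"(?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)"

_PATTERNS = {
    "rfc": rf"{DOT_ATOM}@(?:{LABEL}\.)+{ANY_TLD}",                      # option 1
    "rfc_quoted": rf"(?:{DOT_ATOM}|{QUOTED})@(?:{LABEL}\.)+{ANY_TLD}",  # option 2
    "tld": rf"{DOT_ATOM}@(?:{LABEL}\.)+{LISTED_TLD}",                   # option 3
}


def email_checker(mode="rfc"):
    """Return a predicate for the chosen (hypothetical) validation mode."""
    if mode not in _PATTERNS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(_PATTERNS)}")
    regex = re.compile(_PATTERNS[mode], re.IGNORECASE)

    def check(address):
        # fullmatch: the whole string must be an address, no trailing junk.
        return regex.fullmatch(address) is not None

    return check
```

Defaulting to "rfc" keeps false positives down: new TLDs pass, and only the quoted-name form needs an explicit opt-in.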


>
> I favour this variation...
> [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
>
> C
>
>
> On Aug 7, 8:25 am, Jonathan Lundell <jlund...@pobox.com> wrote:
>> On Aug 7, 2009, at 12:22 AM, mdipierro wrote:
>>
>>> I will take a patch for this.
>>
>> If nobody else gets to it first, I'll work up a patch over the  
>> weekend.
>>
>>> Massimo
>>
>>> On Aug 7, 1:33 am, Jonathan Lundell <jlund...@pobox.com> wrote:
>>>> On Aug 6, 2009, at 9:32 PM, DenesL wrote:
>>
>>>>> IS_EMAIL does not follow the RFC specs for valid email addresses
>>>>> (see http://en.wikipedia.org/wiki/E-mail_address)
>>
>>>>> even a simple a...@b.com fails
>>
>>>>> it is kinda late to work on the regex now, maybe tomorrow.
>>
>>>> The RFC is fairly hard to validate. If that's what we really want,
>>>> I found this one on the web that looks about right:
>>
>>>> ^(?!\.)("([^"\r\\]|\\["\r\\])*"|([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?...@[a-
>>>> z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$
>>
>>>> It assumes the case-insensitive flag.
>>
>>>> http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-
>>>> email...
>>
>>>> Overkill? Or, what the heck?



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To post to this group, send email to web2py@googlegroups.com
To unsubscribe from this group, send email to 
web2py+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---
