On 04/02/2010 06:25 PM, Jared Williams wrote:
>  
> 
>> -----Original Message-----
>> From: Rasmus Lerdorf [mailto:ras...@lerdorf.com] 
>> Sent: 03 April 2010 01:20
>> To: Jared Williams
>> Cc: internals@lists.php.net
>> Subject: Re: [PHP-DEV] Re: [PHP-CVS] svn: /php/php-src/ 
>> branches/PHP_5_2/NEWS 
>> branches/PHP_5_2/ext/filter/logical_filters.c 
>> branches/PHP_5_3/NEWS 
>> branches/PHP_5_3/ext/filter/logical_filters.c 
>> trunk/ext/filter/logical_filters.c
>>
>> On 04/02/2010 04:47 PM, Jared Williams wrote:
>>> Would make sense. Especially considering HTML5's current validation
>>> rules for emails are something different again.
>>>
>>>
>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#e-mail-state
>>>
>>> Having a mismatch in validation between client & server is just a
>>> recipe for user frustration.
>>
>> I actually think this regex is really close to the HTML5 
>> specification.
>>  The main thing it drops are comments and folded whitespace, 
>> both of which are not supported in this regex either.
>> That means addresses like the following will all fail even 
>> though they are technically valid:
>>
>> test
>> b...@example.com
>>
>> (with a carriage return after test there)
>>
>> (hey)rasmus(there)@(go)php.net(woo)
>>
>> rasmus(Hey
>> guess what
>> this is a "valid"
>> email address)
>> @php.net
>>
>> rasmus."ras...@php.net"@php.net
>>
>> As far as I am concerned I am perfectly ok with rejecting 
>> addresses like these and I think we should just stick to the 
>> HTML5 definition.
>>
>> The ABNF for an HTML5 valid email field is:
>>
>>   1*( atext / "." ) "@" ldh-str 1*( "." ldh-str )
>>
>> which means there must be a . in the domain part, so HTML5 
>> doesn't think a...@b is valid either.  The left-hand side looks 
>> wrong though.  It seems to me it should be:
>>
>>   1*atext *("." 1*atext)
>>
>> You can't have a trailing . there.  rasm...@php.net is not 
>> valid and if I am reading that HTML5 ABNF correctly it would 
>> seem to allow that.
>>
> 
> Interesting, I missed the point of the . when I initially looked at this.
> Here's the regexp currently in WebKit:
> 
>     static const char emailPattern[] =
>         "[a-z0-9!#$%&'*+/=?^_`{|}~.-]+" // local part
>         "@"
>         "[a-z0-9-]+(\\.[a-z0-9-]+)+"; // domain part
> 
> (http://trac.webkit.org/browser/trunk/WebCore/html/ValidityState.cpp)
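
To make the difference between the two local-part grammars in the quoted
ABNF concrete, here is a minimal sketch in Python. The addresses are
hypothetical stand-ins, ldh-str is approximated as "letters, digits and
hyphens" (an assumption that does not police hyphen placement), and the
names ATEXT, LDH, html5_re, dot_atom_re and accepts are mine, not from
any spec or from the filter extension.

```python
import re

# atext per RFC 5322: letters, digits and a fixed set of specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# Assumption: ldh-str approximated as one or more letters/digits/hyphens.
LDH = r"[A-Za-z0-9-]+"

# Domain from the quoted ABNF: ldh-str 1*( "." ldh-str )
DOMAIN = rf"{LDH}(?:\.{LDH})+"

# Local part as quoted from HTML5: 1*( atext / "." )
HTML5_LOCAL = rf"(?:{ATEXT}|\.)+"

# Local part as suggested in the thread: 1*atext *("." 1*atext)
DOT_ATOM_LOCAL = rf"{ATEXT}+(?:\.{ATEXT}+)*"

html5_re = re.compile(rf"{HTML5_LOCAL}@{DOMAIN}")
dot_atom_re = re.compile(rf"{DOT_ATOM_LOCAL}@{DOMAIN}")


def accepts(pattern, address):
    """Return True when the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None


if __name__ == "__main__":
    # Hypothetical sample addresses, chosen only to exercise the dots.
    samples = [
        "first.last@example.com",
        "user.@example.com",
        ".user@example.com",
        "a..b@example.com",
        "user@example",
    ]
    for address in samples:
        print(address, accepts(html5_re, address), accepts(dot_atom_re, address))
```

Under these assumptions the quoted HTML5 form lets leading, trailing and
doubled dots through, while the dot-atom form rejects them. Both reject a
domain with no dot in it.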

I am all for disallowing esoteric but otherwise valid addresses, but this
trivial regex will allow all sorts of completely invalid addresses that
will never actually route.  Some examples of invalid addresses that
pass that regex:

ras...@php.123
ras...@-php.net
ras...@php-.net
ras...@php.net-
.ras...@php.net
rasm...@php.net
rasmus..lerd...@php.net
....@php.net
ras...@128.128.128.128

That last one needs to be ras...@[128.128.128.128] to be valid.
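
A quick way to check claims like these is to run the quoted pattern over
a few addresses. The sketch below transcribes the WebKit pattern into
Python. Matching the whole string case-insensitively is my assumption
about how the pattern is applied, the addresses are hypothetical
stand-ins with similar shapes (numeric last label, hyphens at label
edges, stray dots, bare IP), and webkit_accepts is a made-up helper name.

```python
import re

# Pattern transcribed from the quoted WebKit snippet (local part, "@", domain).
EMAIL_PATTERN = (
    r"[a-z0-9!#$%&'*+/=?^_`{|}~.-]+"   # local part
    r"@"
    r"[a-z0-9-]+(\.[a-z0-9-]+)+"       # domain part
)

# Assumption: the pattern must match the entire string, ignoring case.
email_re = re.compile(EMAIL_PATTERN, re.IGNORECASE)


def webkit_accepts(address):
    """Return True when the whole address matches the transcribed pattern."""
    return email_re.fullmatch(address) is not None


if __name__ == "__main__":
    # Hypothetical addresses, not the ones from the message.
    samples = [
        "user@example.123",
        "user@-example.net",
        "user@example-.net",
        "user@example.net-",
        ".user@example.net",
        "first..last@example.net",
        "....@example.net",
        "user@128.128.128.128",
        "user@[128.128.128.128]",
    ]
    for address in samples:
        print(address, webkit_accepts(address))
```

With those assumptions every sample except the bracketed IP literal is
accepted, and the bracketed form is rejected because "[" and "]" are not
in the domain character class.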

-Rasmus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
