On Aug 7, 2009, at 10:04 AM, Yarko Tymciurak wrote:

> Whoever makes up this patch, since this is complicated enough,
> can I ask you follow the commented regex style (re.X)
> which is now used to validate paths;
>
> see example starting on line 74 of main.py:
> http://bazaar.launchpad.net/~mdipierro/web2py/devel/annotate/head%3A/gluon/main.py

That's my plan (I'm the one who did the main.py re.X patch).

> Thanks,
> - Yarko
>
> On Fri, Aug 7, 2009 at 10:56 AM, Carl <carl.ro...@gmail.com> wrote:
>
> You've convinced me that staying close to RFC is a "best choice" even
> though we lose the opportunity for users to correct addresses at the
> point of data entry.
>
> nb the suggested regex in my last posting doesn't work well enough!
> e.g., a...@domain.co.uk isn't matched
>
> C
>
> On Aug 7, 4:48 pm, Jonathan Lundell <jlund...@pobox.com> wrote:
> > On Aug 7, 2009, at 8:13 AM, Carl wrote:
> >
> > > This is an excellent article on the traps to beware of when regex'ing
> > > email address formats
> >
> > > http://www.regular-expressions.info/email.html
> >
> > > This may ignite a debate though :)
> >
> > A discussion, maybe. In the abstract, I like the idea of verifying the
> > RFC verbatim, but we *should* be clear on what we're trying to do.
> > Guard against typos? Prevent some kind of attack? How much do we care
> > about false positives?
> >
> > The article objects (to RFC-style checking) that j...@aol.com.nospam,
> > for example, will validate. I'm not too concerned about that, in that
> > there are lots of ways that a user can enter a wrong but
> > (syntactically) valid address. We deal with that through active
> > validation, not a syntax check.
> >
> > Might there be a security concern? The quoted variation of the RFC
> > checker is very permissive:
> >
> > "([^"\r\\]|\\["\r\\])*"
> >
> > Could that open the door to some kind of injection attack? Presumably
> > we sanitize it for display; how about when we actually use it to send
> > mail? Any consumer that doesn't understand quoted names could end up
> > very confused.
> >
> > I take false positives as a v. bad thing: if a user enters a real and
> > valid address, I do not want to reject it.
> > So I don't much like the explicit list of TLDs (below), on the grounds
> > that it's bound to expand, and at some point it'll break. From the
> > Wikipedia TLD article:
> >
> > > During the 32nd International Public ICANN Meeting in Paris in 2008,
> > > ICANN started a new process of TLD naming policy to take a
> > > "significant step forward on the introduction of new generic
> > > top-level domains." This program envisions the availability of many new
> > > or already proposed domains, as well a new application and
> > > implementation process. Observers believed that the new rules could
> > > result in hundreds of new gTLDs to be registered. Proposed TLDs
> > > include music, berlin and nyc.
> >
> > I think I'd favor the RFC-style pattern without the quoted-name
> > alternation.
> >
> > One thing we could do is to give the developer an option:
> > IS_EMAIL(something or other) that lets them select one of a small
> > number of regexes. And of course the developer can always use IS_MATCH
> > if they don't like our choice of email filters.
> >
> > If we permitted a choice, I'd suggest:
> >
> > 1. default to the RFC regex, but without quoted names
> > 2. RFC including quoted names
> > 3. something like the pattern below, including the TLD filter (maybe)
> >
> > > I favour this variation...
> > > [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
> >
> > > C
> >
> > > On Aug 7, 8:25 am, Jonathan Lundell <jlund...@pobox.com> wrote:
> > >> On Aug 7, 2009, at 12:22 AM, mdipierro wrote:
> >
> > >>> I will take a patch for this.
> >
> > >> If nobody else gets to it first, I'll work up a patch over the
> > >> weekend.
> >
> > >>> Massimo
> >
> > >>> On Aug 7, 1:33 am, Jonathan Lundell <jlund...@pobox.com> wrote:
> > >>>> On Aug 6, 2009, at 9:32 PM, DenesL wrote:
> >
> > >>>>> IS_EMAIL does not follow the RFC specs for valid email addresses
> > >>>>> (see http://en.wikipedia.org/wiki/E-mail_address)
> >
> > >>>>> even a simple a...@b.com fails
> >
> > >>>>> it is kinda late to work on the regex now, maybe tomorrow.
> >
> > >>>> The RFC is fairly hard to validate. If that's what we really want, I
> > >>>> found this one on the web that looks about right:
> >
> > >>>> ^(?!\.)("([^"\r\\]|\\["\r\\])*"|([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?...@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$
> >
> > >>>> It assumes the case-insensitive flag.
> >
> > >>>> http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email...
> >
> > >>>> Overkill? Or, what the heck?
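
For reference, here is a rough sketch of two of the options discussed in this thread, written in the commented re.X style: an RFC-style pattern without quoted names, and the same pattern with a fixed TLD list. The names LOCAL, LABELS, PATTERNS and looks_like_email are hypothetical and this is not the IS_EMAIL code or the eventual patch; the addresses in the usage lines are made up. It assumes the case-insensitive flag and a full-string match rather than a trailing \b.

```python
import re

# Hypothetical fragments for illustration only; not web2py's IS_EMAIL code.

# Local part: dot-separated atoms, with no quoted-name alternative.
LOCAL = r"""
    [a-z0-9!\#$%&'*+/=?^_`{|}~-]+             # first atom
    (?: \. [a-z0-9!\#$%&'*+/=?^_`{|}~-]+ )*   # further atoms, each after a dot
"""

# One or more domain labels, each followed by a dot.
LABELS = r"""
    (?: [a-z0-9] (?: [a-z0-9-]* [a-z0-9] )? \. )+
"""

PATTERNS = {
    # RFC-style: the last label can be any syntactically valid name.
    "rfc": re.compile(LOCAL + "@" + LABELS + r"""
        [a-z0-9] (?: [a-z0-9-]* [a-z0-9] )?   # final label, no TLD list
    """, re.X | re.I),
    # TLD-list variant: the last label must be a two-letter code or a known gTLD.
    "tld": re.compile(LOCAL + "@" + LABELS + r"""
        (?: [a-z]{2} | com | org | net | gov | mil | biz
          | info | mobi | name | aero | jobs | museum )
    """, re.X | re.I),
}


def looks_like_email(value, style="rfc"):
    """Return True if the whole string matches the selected pattern."""
    return PATTERNS[style].fullmatch(value) is not None


# Both styles accept a multi-label country-code domain.
print(looks_like_email("someone@example.co.uk"))
print(looks_like_email("someone@example.co.uk", "tld"))
# Only the RFC-style pattern accepts a TLD that is not on the fixed list.
print(looks_like_email("someone@example.berlin"))
print(looks_like_email("someone@example.berlin", "tld"))
```

The trade-off is the one raised above: the fixed list turns away any address under a TLD added later, while the RFC-style pattern accepts syntactically valid but wrong addresses such as one ending in .com.nospam, which would have to be caught by active validation instead.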