Whoever writes this patch: since the regex is complicated enough,
may I ask that you follow the commented regex style (re.X),
which is now used to validate paths;

see example starting on line 74 of main.py:
http://bazaar.launchpad.net/~mdipierro/web2py/devel/annotate/head%3A/gluon/main.py
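
For anyone who hasn't used that style, here is a minimal sketch of it.
The pattern is only an illustration (a simplified dot-atom local part
and a domain with no TLD list), not the regex proposed for IS_EMAIL,
and the names EMAIL_SKETCH and looks_like_email are made up for this
example:

```python
import re

# Hypothetical illustration only: a simplified address pattern written in
# verbose (re.X) style, so that each piece of the regex carries a comment.
EMAIL_SKETCH = re.compile(r"""
    ^
    [a-z0-9!\#$%&'*+/=?^_`{|}~-]+            # first atom of the local part
    (?: \. [a-z0-9!\#$%&'*+/=?^_`{|}~-]+ )*  # further atoms, each after a dot
    @
    (?: [a-z0-9] (?: [a-z0-9-]* [a-z0-9] )? \. )+  # domain labels, each followed by a dot
    [a-z]{2,}                                # top-level domain, no fixed list
    $
    """, re.X | re.I)


def looks_like_email(value):
    """Return True if value matches the sketch pattern above."""
    return EMAIL_SKETCH.match(value) is not None
```

With re.X, whitespace in the pattern is ignored and # starts a comment
(outside a character class), which is what makes the per-line comments
possible.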

Thanks,
- Yarko

On Fri, Aug 7, 2009 at 10:56 AM, Carl <carl.ro...@gmail.com> wrote:

>
> You've convinced me that staying close to RFC is a "best choice" even
> though we lose the opportunity for users to correct addresses at the
> point of data entry.
>
> nb the suggested regex in my last posting doesn't work well enough!
> e.g., a...@domain.co.uk isn't matched
>
> C
>
>
>
> On Aug 7, 4:48 pm, Jonathan Lundell <jlund...@pobox.com> wrote:
> > On Aug 7, 2009, at 8:13 AM, Carl wrote:
> >
> >
> >
> > > This is an excellent article on the traps to beware of when regex'ing
> > > email address formats
> >
> > > http://www.regular-expressions.info/email.html
> >
> > > This may ignite a debate though :)
> >
> > A discussion, maybe. In the abstract, I like the idea of verifying the
> > RFC verbatim, but we *should* be clear on what we're trying to do.
> > Guard against typos? Prevent some kind of attack? How much do we care
> > about false positives?
> >
> > The article objects (to RFC-style checking) that j...@aol.com.nospam,
> > for example, will validate. I'm not too concerned about that, in that
> > there are lots of ways that a user can enter a wrong but
> > (syntactically) valid address. We deal with that through active
> > validation, not a syntax check.
> >
> > Might there be a security concern? The quoted variation of the RFC
> > checker is very permissive:
> >
> >         "([^"\r\\]|\\["\r\\])*"
> >
> > Could that open the door to some kind of injection attack? Presumably
> > we sanitize it for display; how about when we actually use it to send
> > mail? Any consumer that doesn't understand quoted names could end up
> > very confused.
> >
> > I take false positives as a very bad thing: if a user enters a real and
> > valid address, I do not want to reject it. So I don't much like the
> > explicit list of TLDs (below), on the grounds that it's bound to
> > expand, and at some point it'll break. From the Wikipedia TLD article:
> >
> > > During the 32nd International Public ICANN Meeting in Paris in 2008,
> > > ICANN started a new process of TLD naming policy to take a
> > > "significant step forward on the introduction of new generic top-
> > > level domains." This program envisions the availability of many new
> > > or already proposed domains, as well as a new application and
> > > implementation process. Observers believed that the new rules could
> > > result in hundreds of new gTLDs to be registered. Proposed TLDs
> > > include music, berlin and nyc.
> >
> > I think I'd favor the RFC-style pattern without the quoted-name
> > alternation.
> >
> > One thing we could do is to give the developer an option:
> > IS_EMAIL(something or other) that lets them select one of a small
> > number of regexes. And of course the developer can always use IS_MATCH
> > if they don't like our choice of email filters.
> >
> > If we permitted a choice, I'd suggest:
> >
> >         1. default to the RFC regex, but without quoted names
> >         2. RFC including quoted names
> >         3. something like the pattern below, including the TLD filter (maybe)
> >
> >
> >
> >
> >
> > > I favour this variation...
> > > [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
> >
> > > C
> >
> > > On Aug 7, 8:25 am, Jonathan Lundell <jlund...@pobox.com> wrote:
> > >> On Aug 7, 2009, at 12:22 AM, mdipierro wrote:
> >
> > >>> I will take a patch for this.
> >
> > >> If nobody else gets to it first, I'll work up a patch over the
> > >> weekend.
> >
> > >>> Massimo
> >
> > >>> On Aug 7, 1:33 am, Jonathan Lundell <jlund...@pobox.com> wrote:
> > >>>> On Aug 6, 2009, at 9:32 PM, DenesL wrote:
> >
> > >>>>> IS_EMAIL does not follow the RFC specs for valid email addresses
> > >>>>> (see http://en.wikipedia.org/wiki/E-mail_address)
> >
> > >>>>> even a simple a...@b.com fails
> >
> > >>>>> it is kinda late to work on the regex now, maybe tomorrow.
> >
> > >>>> The RFC is fairly hard to validate. If that's what we really want,
> > >>>> I found this one on the web that looks about right:
> >
> > >>>> ^(?!\.)("([^"\r\\]|\\["\r\\])*"|([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?...@[a-
> > >>>> z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$
> >
> > >>>> It assumes the case-insensitive flag.
> >
> > >>>> http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email...
> >
> > >>>> Overkill? Or, what the heck?
> >
>
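
A rough sketch of the three-way choice Jonathan suggests in the quoted
message above. Everything here is an assumption made for illustration:
make_email_check and the style names are invented, this is not the
IS_EMAIL API, and the patterns are simplified. The quoted-name piece
and the TLD list are adapted from the regexes in the thread, with the
country-code class lowercased under the case-insensitive flag:

```python
import re

# All names below are hypothetical; this is not web2py's IS_EMAIL.
ATOM = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM = ATOM + r"(?:\." + ATOM + r")*"          # atoms separated by dots
QUOTED = r'"(?:[^"\r\\]|\\["\r\\])*"'              # quoted local part
DOMAIN = r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"  # labels, each ending in a dot
ANY_TLD = r"[a-z]{2,}"
LISTED_TLD = (r"(?:[a-z]{2}|com|org|net|gov|mil|biz|"
              r"info|mobi|name|aero|jobs|museum)")

PATTERNS = {
    # 1. default: RFC-style, without quoted names
    "rfc": DOT_ATOM + "@" + DOMAIN + ANY_TLD,
    # 2. RFC-style, including quoted names
    "rfc_quoted": "(?:" + QUOTED + "|" + DOT_ATOM + ")@" + DOMAIN + ANY_TLD,
    # 3. explicit TLD filter
    "tld_list": DOT_ATOM + "@" + DOMAIN + LISTED_TLD,
}


def make_email_check(style="rfc"):
    """Return a function that tests a string against the chosen pattern."""
    regex = re.compile(PATTERNS[style], re.I)

    def check(value):
        # fullmatch: the whole string must be an address, not just a prefix
        return regex.fullmatch(value) is not None

    return check
```

For example, make_email_check("tld_list") rejects user@example.berlin
while the default accepts it, which is the breakage the explicit TLD
list risks as new TLDs appear.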
