[web2py:11512] Re: Possible bug in IS_URL

achipa Sun, 09 Nov 2008 03:37:44 -0800

I guess this part clears that up:

   A URI is always in an "escaped" form, since escaping or unescaping a
   completed URI might change its semantics.


   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URL
   path, but the two are equivalent for an http URL.
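
As a rough illustration of that last paragraph, the sketch below unescapes
only those percent-escapes that decode to an unreserved character and leaves
every other escape alone. The name unescape_unreserved is made up for this
example, and the unreserved set (letters, digits and -._~) is an assumption
borrowed from the later RFC 3986, not from the text quoted above. It is not
how IS_URL works.

```python
import re
import string

# Assumption: the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~").
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def unescape_unreserved(url):
    """Decode %XX escapes only when they stand for an unreserved character."""

    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Reserved or unsafe characters stay escaped; only the hex case changes.
        return match.group(0).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, url)


print(unescape_unreserved("http://example.com/%7euser"))
print(unescape_unreserved("http://example.com/a%2fb"))
```

With this approach "%7e" and "~" compare equal after normalization, while an
escaped "/" stays escaped because unescaping it could change the path.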

On Nov 5, 9:04 pm, "Kyle Smith" <[EMAIL PROTECTED]> wrote:
> I don't believe this is entirely correct. Spaces and apostrophes
> absolutely _should_ be represented in a URL in encoded form; however,
> section 2.2 of the spec discusses unsafe characters and their
> representations.
>
> "Usually a URL has the same interpretation when an octet is
> represented by a character and when it encoded. However, this is not
> true for reserved characters: encoding a character reserved for a
> particular scheme may change the semantics of a URL.
>
> Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
> reserved characters used for their reserved purposes may be used
> unencoded within a URL."
>
> Typically when a character in a URL needs to be encoded it's done by
> the user agent if possible, as most users have no idea what the
> equivalent character encoding is or they are pasting in a URL from
> some other application or document. So should http://www.test.com/my
> file.pdf be rejected as an invalid URL, or should it be validated
> and changed to http://www.test.com/my%20file.pdf? Automatically
> encoding characters like # . _ & or + would screw up a large number of
> perfectly valid URLs, as it could change their meaning.
>
> Kyle
>
> On Wed, Nov 5, 2008 at 3:36 AM, achipa <[EMAIL PROTECTED]> wrote:
>
> > Underscores, spaces and apostrophes are NOT valid (regardless of the
> > part of the url they're in). As per the RFC, spaces and most other non-
> > letter characters should be considered unsafe and must be encoded.
> > Note that most modern browsers do some conversions transparently, so
> > you can type spaces and similar in the address bar and those will get
> > converted in the actual request to %20-s and such - whether you want
> > to keep that convenience functionality with web2py is a different
> > matter.
>
> > On Nov 4, 8:25 pm, "Kyle Smith" <[EMAIL PROTECTED]> wrote:
> > > For your unit test there are a few other basic things you should
> > > probably be checking.
>
> > > The host portion can contain dashes
>
> > > http://my-site.com
>
> > > The path portion can contain many/most characters ex:
>
> > > http://my-site.com/path_to/my_file_for_'97.pdf
>
> > > In this example there are underscores and an apostrophe which are only
> > > valid in the path/file portion of the URL.
>
> > > Kyle
>
> > > On Mon, Nov 3, 2008 at 10:49 PM, Jonathan Benn <[EMAIL PROTECTED]> wrote:
>
> > > > Hi Massimo,
>
> > > > If you would like some help developing a good regex, I have passable
> > > > skill in this area. I just need to have a list of conforming URLs vs.
> > > > non-conforming (to test against) and I can do the rest.
>
> > > > On Nov 3, 7:15 pm, mdipierro <[EMAIL PROTECTED]> wrote:
>
> > > > > fixed in trunk.
>
> > > > Thank you. Unfortunately, now it seems to be rejecting all valid
> > > > cases, e.g.:
>
> > > > > http://www.benn.ca
> > > > > http://benn.ca
> > > > > http://amazon.com/books/
> > > > > https://amazon.com/movies
> > > > > rstp://idontknowthisprotocol
> > > > > HTTP://allcaps.com
> > > > > http://localhost
> > > > > http://localhost/
> > > > > http://localhost/hello
> > > > > http://localhost/hello/
> > > > > http://localhost:8080
> > > > > http://localhost:8080/
> > > > > http://localhost:8080/hello
> > > > > http://localhost:8080/hello/
> > > > file:///C:/Documents%20and%20Settings/Jonathan/Desktop/view.py
>
> > > > I wrote a unit test for IS_URL(). Since I can't seem to attach
> > > > documents, I will paste it here:
>
> > > > '''
> > > >    Unit tests for IS_URL()
> > > > '''
>
> > > > import unittest
> > > > > from gluon.validators import IS_URL
>
> > > > ###############################################################################
> > > > class TestIsUrl(unittest.TestCase):
>
> > > >    x = IS_URL()
>
> > > >    def testInvalidUrls(self):
> > > >        urlsToCheck = ['fff',
> > > >                       'htp://invalid.com',
> > > >                       'http:hello.com',
> > > >                       'hTTp://www.benn.ca']
>
> > > >        failures = []
>
> > > >        for url in urlsToCheck:
> > > > >            if self.x(url)[1] is None:
> > > >                failures.append('Incorrectly accepted: ' + url)
>
> > > >        if len(failures) > 0:
> > > >            self.fail(failures)
>
> > > >    def testValidUrls(self):
>
> > > >        urlsToCheck = ['http://www.benn.ca',
> > > >                       'http://benn.ca',
> > > >                       'http://amazon.com/books/',
> > > >                       'https://amazon.com/movies',
> > > >                       'rstp://idontknowthisprotocol',
> > > >                       'HTTP://allcaps.com',
> > > >                       'http://localhost',
> > > >                       'http://localhost/',
> > > >                       'http://localhost/hello',
> > > >                       'http://localhost/hello/',
> > > >                       'http://localhost:8080',
> > > >                       'http://localhost:8080/',
> > > >                       'http://localhost:8080/hello',
> > > >                       'http://localhost:8080/hello/',
> > > > >                       'file:///C:/Documents%20and%20Settings/Jonathan/Desktop/view.py']
>
> > > >        failures = []
>
> > > >        for url in urlsToCheck:
> > > > >            if self.x(url)[1] is not None:
> > > >                failures.append('Incorrectly rejected: ' + url)
>
> > > >        if len(failures) > 0:
> > > >            self.fail(failures)
>
> > > > ###############################################################################
> > > > if __name__ == "__main__":
> > > >    unittest.main()
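
On Kyle's question about http://www.test.com/my file.pdf: one possible
approach, sketched below, is to percent-encode only the characters that are
unsafe in a URL and leave the reserved characters and any existing escapes
alone. The helper name encode_unsafe is hypothetical, and the list of
characters kept as-is is an assumption chosen for the example. It is not what
IS_URL does.

```python
from urllib.parse import quote

# Assumption: reserved characters keep their meaning, and "%" is kept so that
# escapes already present in the input are not encoded a second time.
KEEP = ":/?#[]@!$&'()*+,;=%"


def encode_unsafe(url):
    """Percent-encode spaces and other unsafe characters, nothing else."""
    return quote(url, safe=KEEP)


print(encode_unsafe("http://www.test.com/my file.pdf"))
print(encode_unsafe("http://my-site.com/path_to/my_file_for_'97.pdf"))
```

The first URL comes back with the space turned into %20, while the second is
returned unchanged because underscores and the apostrophe are left alone.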