I guess this part clears that up:

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics. In some cases, data that could be represented by an unreserved character may appear escaped; for example, some of the unreserved "mark" characters are automatically escaped by some systems. If the given URI scheme defines a canonicalization algorithm, then unreserved characters may be unescaped according to that algorithm. For example, "%7e" is sometimes used instead of "~" in an http URL path, but the two are equivalent for an http URL.

On Nov 5, 9:04 pm, "Kyle Smith" <[EMAIL PROTECTED]> wrote:
> I don't believe this is entirely correct. Spaces and apostrophes
> absolutely _should_ be represented in a URL encoded however section
> 2.2 of the spec discusses unsafe characters and their representations.
>
> "Usually a URL has the same interpretation when an octet is
> represented by a character and when it encoded. However, this is not
> true for reserved characters: encoding a character reserved for a
> particular scheme may change the semantics of a URL.
>
> Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
> reserved characters used for their reserved purposes may be used
> unencoded within a URL."
>
> Typically when a character in a URL needs to be encoded it's done by
> the user agent if possible, as most users have no idea what the
> equivalent character encoding is or they are pasting in a URL from
> some other application or document. So should http://www.test.com/my
> file.pdf be rejected as an invalid URL, or should that be validated
> and changed to http://www.test.com/my%20file.pdf. Automatically
> encoding characters like # . _ & or + would screw up a large number of
> perfectly valid URL's as it could change their meaning.
>
> Kyle
>
> On Wed, Nov 5, 2008 at 3:36 AM, achipa <[EMAIL PROTECTED]> wrote:
>
> > Underscores, spaces and apostrophes are NOT valid (regardless of the
> > part of the url they're in). As per the RFC, spaces and most other non-
> > letter characters should be considered unsafe and must be encoded.
> > Note that most modern browsers do some conversions transparently, so
> > you can type spaces and similar in the address bar and those will get
> > converted in the actual request to %20-s and such - whether you want
> > to keep that convenience functionality with web2py is a different
> > matter.
>
> > On Nov 4, 8:25 pm, "Kyle Smith" <[EMAIL PROTECTED]> wrote:
> > > For your unit test there's a few other basic things you should probably be
> > > checking.
>
> > > The host portion can contain dashes
>
> > > http://my-site.com
>
> > > The path portion can contain many/most characters ex:
>
> > > http://my-site.com/path_to/my_file_for_'97.pdf
>
> > > In this example there are underscores and an apostrophe which are only
> > > valid
> > > in the path/file portion of the URL.
>
> > > Kyle
>
> > > On Mon, Nov 3, 2008 at 10:49 PM, Jonathan Benn <[EMAIL PROTECTED]> wrote:
>
> > > > Hi Massimo,
>
> > > > If you would like some help developing a good regex, I have passable
> > > > skill in this area. I just need to have a list of conforming URLs vs.
> > > > non-conforming (to test against) and I can do the rest.
>
> > > > On Nov 3, 7:15 pm, mdipierro <[EMAIL PROTECTED]> wrote:
>
> > > > > fixed in trunk.
>
> > > > Thank you. Unfortunately, now it seems to be rejecting all valid
> > > > cases, e.g.:
>
> > > > http://www.benn.ca
> > > > http://benn.ca
> > > > http://amazon.com/books/
> > > > https://amazon.com/movies
> > > > rstp://idontknowthisprotocol
> > > > HTTP://allcaps.com
> > > > http://localhost
> > > > http://localhost/
> > > > http://localhost/hello
> > > > http://localhost/hello/
> > > > http://localhost:8080
> > > > http://localhost:8080/
> > > > http://localhost:8080/hello
> > > > http://localhost:8080/hello/
> > > > file:///C:/Documents%20and%20Settings/Jonathan/Desktop/view.py
>
> > > > I wrote a unit test for IS_URL(). Since I can't seem to attach
> > > > documents, I will paste it here:
>
> > > > '''
> > > > Unit tests for IS_URL()
> > > > '''
>
> > > > import unittest
> > > > from gluon.validators import *
>
> > > > ###############################################################################
> > > > class TestIsUrl(unittest.TestCase):
>
> > > >     x = IS_URL()
>
> > > >     def testInvalidUrls(self):
> > > >         urlsToCheck = ['fff',
> > > >                        'htp://invalid.com',
> > > >                        'http:hello.com',
> > > >                        'hTTp://www.benn.ca']
>
> > > >         failures = []
>
> > > >         for url in urlsToCheck:
> > > >             if self.x(url)[1] == None:
> > > >                 failures.append('Incorrectly accepted: ' + url)
>
> > > >         if len(failures) > 0:
> > > >             self.fail(failures)
>
> > > >     def testValidUrls(self):
>
> > > >         urlsToCheck = ['http://www.benn.ca',
> > > >                        'http://benn.ca',
> > > >                        'http://amazon.com/books/',
> > > >                        'https://amazon.com/movies',
> > > >                        'rstp://idontknowthisprotocol',
> > > >                        'HTTP://allcaps.com',
> > > >                        'http://localhost',
> > > >                        'http://localhost/',
> > > >                        'http://localhost/hello',
> > > >                        'http://localhost/hello/',
> > > >                        'http://localhost:8080',
> > > >                        'http://localhost:8080/',
> > > >                        'http://localhost:8080/hello',
> > > >                        'http://localhost:8080/hello/',
> > > >                        'file:///C:/Documents%20and%20Settings/Jonathan/Desktop/view.py']
>
> > > >         failures = []
>
> > > >         for url in urlsToCheck:
> > > >             if self.x(url)[1] != None:
> > > >                 failures.append('Incorrectly rejected: ' + url)
>
> > > >         if len(failures) > 0:
> > > >             self.fail(failures)
>
> > > > ###############################################################################
> > > > if __name__ == "__main__":
> > > >     unittest.main()
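
To make the escaping point at the top of this message concrete, here is a minimal sketch in modern Python 3. It is not web2py code and says nothing about how IS_URL() works. The names normalize_escapes and UNRESERVED are hypothetical, and the unreserved set is taken from the later RFC 3986, which is my assumption: the passage quoted above uses an older, slightly larger set that also includes the "mark" characters. The idea is to unescape only the escapes that stand for unreserved characters, so "%7e" becomes "~" while "%20" and "%2F" stay escaped.

```python
import re
import string
from urllib.parse import quote, unquote

# Assumption: unreserved characters as listed in RFC 3986, section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Matches one percent-escape such as %7e or %2F.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(url):
    """Unescape only escapes of unreserved characters.

    Every other escape is kept, with its hex digits uppercased, so the
    structure of the URL (and therefore its meaning) is not changed.
    """
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(replace, url)


if __name__ == "__main__":
    # "%7e" and "~" are equivalent; the escaped space must stay escaped.
    print(normalize_escapes("http://example.com/%7euser/my%20file.pdf"))
    # An escaped slash is data, not a path separator, so it stays escaped.
    print(normalize_escapes("http://example.com/a%2fb"))
    # Blindly unescaping everything would turn that data into structure.
    print(unquote("a%2Fb"))
    # Escaping a raw path segment, roughly what a user agent does for spaces.
    print(quote("my file.pdf"))
```

The design choice is to never touch reserved characters in either direction: escaping a literal "/" or unescaping "%2F" could change which resource the URL names, which is the concern raised in the thread.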