Martin v. Löwis wrote:
> John Nagle wrote:
>
>>Here's a URL, found in a link, which gives us trouble
>>when we try to follow the link:
>>
>>   http://sportsbra.co.uk/../acatalog/shop.html
>>
>>Browsers immediately turn this into
>>
>>   http://sportsbra.co.uk/acatalog/shop.html
>>
>>and go from there, but urllib tries to open it explicitly, which
>>results in an HTTP error 400.
>>
>>Is "urllib" wrong?
>
> I can't see how. HTTP 1.1 says that the parameter to the GET
> request should be an abs_path; RFC 2396 says that
> /../acatalog/shop.html is indeed an abs_path, as .. is a valid
> segment. That RFC also has a section on relative identifiers
> and normalization; it defines what .. means *in a relative path*.
>
> Section 4 is explicit about .. in absolute URIs:
> # The syntax for relative URI is a shortened form of that for absolute
> # URI, where some prefix of the URI is missing and certain path
> # components ("." and "..") have a special meaning when, and only when,
> # interpreting a relative path.
>
> Notice the "and only when": the browsers who modify above
> URL before sending it seem to be in clear violation of
> RFC 2396.
>
> Regards,
> Martin

I think you're right. The problem is that there is apparently a
de-facto standard in browsers: any number of "../" sequences at the
beginning of the path part of a URL have no effect. Even Google
seems to use that interpretation; not only does it follow that link,
it lists the page in its results without the "..".

				John Nagle
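
For anyone who wants urllib to behave the way the browsers do, here is a minimal sketch of collapsing "." and ".." segments before opening the URL. It assumes Python 3's urllib.parse and an absolute http URL; the function name collapse_dot_segments is made up for this example and is not part of urllib. A ".." that would climb above the root is simply dropped, which matches the browser behaviour described above rather than a strict reading of RFC 2396.

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_dot_segments(url):
    """Return url with "." and ".." path segments collapsed.

    Hypothetical helper, not part of urllib. Assumes the path is
    empty or starts with "/", as it does for an absolute http URL.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")

    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            # Drop the previous segment, but never the leading empty
            # string that stands for the root "/".
            if len(out) > 1:
                out.pop()
            continue
        out.append(seg)

    # A trailing "." or ".." refers to a directory, so keep the
    # trailing slash (e.g. "/a/b/.." becomes "/a/").
    if segments[-1] in (".", ".."):
        out.append("")

    return urlunsplit(parts._replace(path="/".join(out)))


if __name__ == "__main__":
    print(collapse_dot_segments("http://sportsbra.co.uk/../acatalog/shop.html"))
```

The cleaned-up string can then be passed to urllib.request.urlopen in place of the original link. Whether a crawler should do this is a policy choice: it follows the de-facto browser convention, not the letter of the RFC quoted above.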