I agree that this URI handling sucks. I'm the one who committed the change
that made it happen and I still think it sucks. However, allowing these
encoded characters opens up some very large security holes.
Also, even if TC 3.2.x allowed these characters, the resulting URL wouldn't
be portable because several other web servers impose the same restrictions.
In fact we lifted our restriction on encoded special characters straight
from Apache HTTPD. In your example URL, if /app/UCB were a CGI script then
you would still get a 404 on Apache (at least on Apache 1.3.19 which is what
I tested with).
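
To make the restriction concrete, here is a rough sketch of the kind of check
being discussed. It is not the actual Tomcat or Apache code. The class and
method names are made up for illustration, and the list of escapes is simply
the one given in the quoted message below (%25, %2E, %2F, %5C). It assumes the
check is applied to the still-encoded path portion of the request, without the
query string.

```java
import java.util.Locale;

// Illustrative only: hypothetical names, not the Tomcat or Apache implementation.
final class EncodedCharCheck {
    // Escapes for '%', '.', '/' and '\' as listed in the quoted message.
    private static final String[] REJECTED = {"%25", "%2E", "%2F", "%5C"};

    /** Returns true if the raw (still-encoded) path contains one of the escapes. */
    static boolean containsRejectedEscape(String rawPath) {
        // Hex digits in escapes are case-insensitive, so compare in upper case.
        String upper = rawPath.toUpperCase(Locale.ROOT);
        for (String esc : REJECTED) {
            if (upper.contains(esc)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The example path from the original message: rejected because of %2F.
        System.out.println(containsRejectedEscape(
                "/app/UCB/vnd.sun.star.hier:%2F/address/myresource"));
        // A path without any of the escapes passes the check.
        System.out.println(containsRejectedEscape("/app/UCB"));
    }
}
```

A server applying a check like this would answer 404 for the first path before
the servlet or CGI ever sees the request.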
If you need to pass this sort of data to a servlet (or CGI) the most
portable way is to simply use a query string.
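
As a minimal sketch of the query-string approach: the parameter name "uri", the
class name, and the unescaped form of the hierarchical identifier are my
assumptions, not anything taken from your application. The idea is only that
the special characters travel URL-encoded in the query string instead of in
the path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustrative only: class name and the "uri" parameter name are hypothetical.
final class QueryStringExample {
    /** Appends the hierarchical identifier to the base URL as an encoded query parameter. */
    static String buildUrl(String base, String hierUri) {
        try {
            return base + "?uri=" + URLEncoder.encode(hierUri, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Reverses the encoding, roughly what the receiving side gets back. */
    static String decodeParam(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = buildUrl("http://server/app/UCB",
                "vnd.sun.star.hier://address/myresource");
        System.out.println(url);
        // Everything after "?uri=" decodes back to the original identifier.
        System.out.println(decodeParam(url.substring(url.indexOf('=') + 1)));
    }
}
```

In a servlet, request.getParameter("uri") should hand back the decoded value,
so the application sees the identifier with its slashes intact.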
Marc Saegesser
> -----Original Message-----
> From: Lars Oppermann [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, September 13, 2001 5:00 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: URI handling in tomcat 3.2.3
>
>
> Hi everyone,
>
> we were in the process of moving our project to tomcat 3.2.3 when we came
> across the new handling of URIs (release-notes sec. 7.2).
>
> Since we are using the URI to transport hierarchical information other
> than filesystem paths, we feel that this kind of functionality belongs
> in the default servlet serving filesystem requests. Especially the fact
> that %25, %2E, %2F and %5C inside a URI lead to a 404 error seems
> somewhat strange.
> For example:
> http://server/app/UCB/vnd.sun.star.hier:%2F/address/myresource
> would be rejected before the app has the possibility to look at the
> request, and ...hier://address/myfile... would be normalized to hier:/address.
>
> We are perfectly aware of the security concerns behind these changes.
> However, they only apply when serving resources from the filesystem. A
> URL's path components, however, are in no way bound to the representation
> of filesystem paths. (After all, the U in URL stands for universal :)
>
> RFC 2396 states that '/' in a URI has a different semantic meaning than
> %2F in a URI. The '/' separates path components, while %2F means a
> slash character within a path component. When such a URI is mapped to a
> filesystem, this would denote a filename that contains a slash. When the
> system does not allow such names, it is the responsibility of the
> filesystem servlet to report an error (404, since such a file cannot
> exist on Unix, for example).
>
> What are your opinions on this?
>
> Cheers
> -Lars
> --
> ----------------------------------------------------------------------
> Lars Oppermann <[EMAIL PROTECTED]> Sun Microsystems
> Software Engineer - Sun ONE Webtop Sachsenfeld 4
> Phone: +49 40 23646 959 D-20097 Hamburg
> Fax: +49 40 23646 550 http://www.sun.com/webtop