@All, thanks everybody for responding so far. I apologize for not
originally including the version of Tomcat we're using (6.0.18). It was
an oversight on my part in my hurry to write the email; I simply forgot
that I should include it (this is my first post to the Tomcat list).
I realize that I accidentally posted an incorrect example. The example
in Christopher S.'s question, which uses the URL-encoded value %3B, is
the correct one. I shouldn't have pasted in a plain-text ";".
This ";" is an edge case for us. The URL encoding is part of an SES /
friendly URL implementation in a framework I contribute to. A framework
user logged this as a defect report because originally we were not URL
encoding special characters (they were including a short "error" message
in the URL proper). We have no clue what people are generating or how
they are using it in URLs; we're just trying to get an idea of why this
wasn't working.
@Christopher S., sadly Tomcat still truncates the path info even when
the ";" is encoded as %3B. We've actually resorted to using a
Unicode-like representation (changing "U+03B" to U_03B so as not to use
a "+") and transforming it back when processing the incoming requests.
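To make that concrete, here is a minimal sketch of the round trip. The U_03B token is the one mentioned above; the class and method names are made up for illustration and are not the framework's actual code. Note that a plain string replace like this would also rewrite a path that legitimately contains the text "U_03B", so a real implementation would need some escaping rule for that.

```java
// Hypothetical helper, not the framework's real code: swaps ";" for a
// URL-safe token on the way out and restores it on the way in.
final class SemicolonToken {
    // Token spelling taken from the post above.
    static final String TOKEN = "U_03B";

    // Applied when generating a friendly URL.
    static String encode(String pathSegment) {
        return pathSegment.replace(";", TOKEN);
    }

    // Applied when processing the path info of an incoming request.
    static String decode(String pathSegment) {
        return pathSegment.replace(TOKEN, ";");
    }

    public static void main(String[] args) {
        String encoded = encode("somePathInfo;withMoreInfo");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```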
@Chuck, thanks for your snarky response. I'll leave it at that.
@Bill, thanks for mentioning the specific RFC referencing the encoding
of ";". That's what the team here figured out as well and it's nice to
have an independent person verify our assumptions after reading the RFC
sections. Is it safe to assume that the Tomcat code base assumes that
anything after the ";" has to be the jsessionid? That's our assumption
at the moment. Yes, there is a workaround: use request.getRequestURI()
instead, since that has the whole URI, and manually remove the
"absolute" path to get the complete path info.
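Roughly, the workaround looks like the sketch below. It assumes the "absolute" path is the context path plus the servlet path, and it takes plain strings (the values you would get from getRequestURI(), getContextPath() and getServletPath()) so it runs without the servlet API; the class and method names are hypothetical. The request URI is not decoded by the container, so the result is still percent-encoded, and it may still carry a ";jsessionid=..." suffix that would need separate handling.

```java
// Hypothetical helper: derives the full path info from the raw request URI
// instead of relying on getPathInfo().
final class PathInfoFromUri {
    // Returns everything after contextPath + servletPath, still
    // percent-encoded, or null when there is no extra path.
    static String pathInfo(String requestUri, String contextPath, String servletPath) {
        String prefix = contextPath + servletPath;
        if (!requestUri.startsWith(prefix)) {
            return null; // URI does not match the expected mapping
        }
        String rest = requestUri.substring(prefix.length());
        return rest.isEmpty() ? null : rest;
    }

    public static void main(String[] args) {
        // "/app" and "/index.cfm" are example values only.
        System.out.println(pathInfo(
                "/app/index.cfm/somePathInfo%3BwithMoreInfo/", "/app", "/index.cfm"));
    }
}
```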
Our framework is deployed on several different CFML servlets (three
different vendors), and their implementations differ a bit in how you
get at the original HTTP request wrapper. We'll probably stick to using
the poor man's encoding, a modified Unicode representation of ";", in
the end. Another solution is to write a filter that calls
getRequestURI() and replaces the bad path info in the request with the
full-length version.
Thanks again,
.Peter
Bill Barker said the following:
Shouldn't that be /index.cfm/somePathInfo&%3BwithMoreInfo/
?
If you try the above URL, does it work?
java.net.URLEncoder will encode ";" as "%3B".
See the URL Specification (RFC 1738,
http://www.ietf.org/rfc/rfc1738.txt), section 2.2 "URL Character
Encoding Issues":
"
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.
"
The HTTP specification does not specifically say that semi-colons are
reserved, but perhaps the common interpretation of the URL spec is such
that semi-colons should always be encoded.
Actually it does, just by reference. Section 3.2.1 of RFC 2616 defers to
RFC 2396. And section 3.3 of that RFC gives a special meaning to ';'.
Tomcat doesn't handle this correctly according to the RFC, but no
developer/contributor has had enough of an itch to fix it. But I doubt that
fixing it would help the OP much.
The fully compliant Tomcat would have to remove anything after a ';'
(including the ';') up until the next '/' (if any) for the purpose of
mapping the request. It should then re-include them in the various parts of
the request URI (except for ";jsessionid"). So it's a lot of work to
implement an arcane feature that has plenty of workarounds.
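For illustration only, the stripping step described above (drop everything from a ';' up to the next '/' before mapping the request) could be sketched as follows. This is not Tomcat's code; the class and method names are invented, and the sketch ignores the second half of the job, which is putting the parameters back into the request URI afterwards.

```java
// Hypothetical sketch of the mapping step: remove path parameters from
// every segment so only the plain path is used to map the request.
final class PathParameters {
    static String strip(String path) {
        StringBuilder out = new StringBuilder(path.length());
        boolean skipping = false; // true while inside a ";..." parameter
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '/') {
                skipping = false; // a new segment ends any parameter
                out.append(c);
            } else if (c == ';') {
                skipping = true;  // drop the ';' and what follows it
            } else if (!skipping) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("/index.cfm;a=b/some;x/info"));
    }
}
```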
-chris