Url Encoding/Decoding and static resources

David Weinrich Wed, 27 Dec 2000 23:09:08 -0800

Hello again!

  I have run into an issue with Tomcat 3.2.1 and 4.0 when URLs or file
names contain reserved characters (the most common and troublesome so far
being #). To fix this I read RFC 1738
ftp://ftp.isi.edu/in-notes/rfc1738.txt for the 'proper' way to handle such
characters and found the answer in section 2.2, "URL Character Encoding Issues".
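
As a rough illustration of the encoding that section 2.2 describes, here is
a minimal Java sketch. The class name PathEncoder and its encode method are
hypothetical (they are not part of Tomcat or of the webapp mentioned here).
It assumes the path is turned into UTF-8 bytes first and that '/' should be
left alone as the path separator.

```java
// Hypothetical helper, not part of Tomcat: percent-encodes a URL path
// along the lines of RFC 1738, section 2.2.
final class PathEncoder {

    // Special characters RFC 1738 allows unencoded, plus '/' which is
    // kept here on the assumption that it separates path segments.
    private static final String SAFE = "$-_.+!*'(),/";

    static String encode(String path) {
        StringBuilder out = new StringBuilder();
        byte[] bytes = path.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        for (byte b : bytes) {
            int v = b & 0xFF;
            char c = (char) v;
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || SAFE.indexOf(c) >= 0) {
                out.append(c);
            } else {
                // Everything else (space, #, %, non-ASCII bytes, ...)
                // becomes %XX with two uppercase hex digits.
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                out.append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return out.toString();
    }
}
```

With this sketch, PathEncoder.encode("/docs/file #1.txt") gives
"/docs/file%20%231.txt", so the # no longer reads as the start of a fragment.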

  After fixing my webapp to encode these characters properly, I found that
Apache handled the URLs and served the resources correctly, but Tomcat
3.2.1 and 4.0 did not. Tomcat 4.0 handled the most common case (a space,
encoded as %20) but no other encoded characters, and 3.2.1 did not handle
any encoded characters (if my memory serves me correctly). After digging
through the source I made minor changes to a few files in 4.0 and one file
in 3.2.1 that let both servers handle these URLs correctly for all of my
previously failing test cases. I am attaching the patches for 4.0 now; I
still want to go over the 3.2 patches once more to make sure I didn't miss
anything.

  I tried to handle the difficult situations that decoding the URL might
create (inserted control characters, or attempts to reach a file outside
the appropriate area), but there may be security or implementation issues
that I missed. It could also well be that these are not the right places to
fix this particular problem; my apologies if I missed the mark ;) Thanks
again. I should have the 3.2 patch worked over by tomorrow afternoon or so.

David Weinrich

--- ResourcesBase.java  Tue Dec 26 23:36:29 2000
+++ ResourcesBaseEd.java        Tue Dec 26 23:37:27 2000
@@ -961,9 +961,32 @@
      * @param path Path to be normalized
      */
     protected String normalize(String path) {
+       String normalized = path;
+
+       // Resolve encoded characters in the normalized path,
+       // which also handles encoded spaces so we can skip that later.
+       // Placed at the beginning of the chain so that encoded 
+       // bad stuff(tm) can be caught by the later checks
+       while (true) {
+           int index = normalized.indexOf("%");
+           if (index < 0)
+               break;
+           char replaceChar = 
+               (char) ( Integer.parseInt( 
+                   normalized.substring( index + 1, index + 3 ), 16 )  );
+           // check for control characters ( values 00-1f and 7f-9f), 
+           // return null if present. See:
+           // http://www.unicode.org/charts/PDF/U0000.pdf 
+           // http://www.unicode.org/charts/PDF/U0080.pdf
+           if ( Character.isISOControl( replaceChar ) ) {
+               return null;
+           }
+           normalized = normalized.substring(0, index) +
+               replaceChar +
+               normalized.substring(index + 3);
+        }
 
        // Normalize the slashes and add leading slash if necessary
-       String normalized = path;
        if (normalized.indexOf('\\') >= 0)
            normalized = normalized.replace('\\', '/');
        if (!normalized.startsWith("/"))
@@ -977,15 +1000,6 @@
            normalized = normalized.substring(0, index) +
                normalized.substring(index + 1);
        }
-
-       // Resolve occurrences of "%20" in the normalized path
-       while (true) {
-           int index = normalized.indexOf("%20");
-           if (index < 0)
-               break;
-           normalized = normalized.substring(0, index) + " " +
-               normalized.substring(index + 3);
-        }
 
        // Resolve occurrences of "/./" in the normalized path
        while (true) {

--- DefaultServlet.java Tue Dec 26 23:42:09 2000
+++ DefaultServletEd.java       Tue Dec 26 23:41:52 2000
@@ -729,9 +729,33 @@
      * @param path Path to be normalized
      */
     protected String normalize(String path) {
+       String normalized = path;
+
+       // Resolve encoded characters in the normalized path,
+       // which also handles encoded spaces so we can skip that later.
+       // Placed at the beginning of the chain so that encoded 
+       // bad stuff(tm) can be caught by the later checks
+       while (true) {
+           int index = normalized.indexOf("%");
+           if (index < 0)
+               break;
+           char replaceChar = 
+               (char) ( Integer.parseInt( 
+                   normalized.substring( index + 1, index + 3 ), 16 )  );
+           // check for control characters ( values 00-1f and 7f-9f), 
+           // return null if present. See:
+           // http://www.unicode.org/charts/PDF/U0000.pdf 
+           // http://www.unicode.org/charts/PDF/U0080.pdf
+           if ( Character.isISOControl( replaceChar ) ) {
+               return null;
+           }
+           normalized = normalized.substring(0, index) +
+               replaceChar +
+               normalized.substring(index + 3);
+        }
+
 
        // Normalize the slashes and add leading slash if necessary
-       String normalized = path;
        if (normalized.indexOf('\\') >= 0)
            normalized = normalized.replace('\\', '/');
        if (!normalized.startsWith("/"))
@@ -745,15 +769,6 @@
            normalized = normalized.substring(0, index) +
                normalized.substring(index + 1);
        }
-
-       // Resolve occurrences of "%20" in the normalized path
-       while (true) {
-           int index = normalized.indexOf("%20");
-           if (index < 0)
-               break;
-           normalized = normalized.substring(0, index) + " " +
-               normalized.substring(index + 3);
-        }
 
        // Resolve occurrences of "/./" in the normalized path
        while (true) {
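
One caveat with the decoding loop in the patches above: because it searches
the already-decoded string for "%" again, an input such as %2541 is decoded
twice (first to %41, then to A), and a truncated or non-hex escape makes
substring or Integer.parseInt throw. A single left-to-right pass avoids
both. The following is only a sketch of that idea, not a tested replacement
for the patch. PathDecoder and decode are hypothetical names, and like the
patch it maps each escaped byte straight to a char and returns null for
control characters.

```java
// Hypothetical helper, not part of Tomcat: decodes %XX escapes in one pass.
final class PathDecoder {

    // Returns the decoded path, or null if an escape is malformed or
    // decodes to an ISO control character (00-1f, 7f-9f).
    static String decode(String path) {
        StringBuilder out = new StringBuilder(path.length());
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c != '%') {
                out.append(c);
                i++;
                continue;
            }
            // A '%' must be followed by exactly two hex digits.
            if (i + 2 >= path.length()) {
                return null;
            }
            int hi = Character.digit(path.charAt(i + 1), 16);
            int lo = Character.digit(path.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                return null;
            }
            char decoded = (char) ((hi << 4) | lo);
            if (Character.isISOControl(decoded)) {
                return null;
            }
            // Decoded output is never rescanned, so %25 stays a literal '%'.
            out.append(decoded);
            i += 3;
        }
        return out.toString();
    }
}
```

The later checks in normalize() (backslashes, "/./", "/../") would still run
on the returned string, as they do in the patch.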
