Hello,
Thank you for all your effort and enthusiasm on this cool software.
IE (at least IE 5.5 on Windows 95) encodes the request URL as:
one Japanese character --> %XX%XX%XX
and this results in a corrupted URL string.
I'm not sure, but I remember that Costin pointed out that this is bad behavior
on IE's part.
Anyway, to request a static HTML file with a Japanese file name, I really need
to use Japanese characters in my URL.
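For example, the following stand-alone snippet (just an illustration, not part
of the patch; the character and class name are my own choice) shows how a
single Japanese character becomes three %XX escapes when it is sent as UTF-8:

import java.io.UnsupportedEncodingException;

public class EscapeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u65e5";                  // the single character 日
        byte[] utf8 = s.getBytes("UTF-8");    // three bytes: E6 97 A5
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < utf8.length; i++) {
            sb.append('%');
            sb.append(Integer.toHexString(utf8[i] & 0xff).toUpperCase());
        }
        System.out.println(sb);               // prints %E6%97%A5
    }
}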
If this can be done by the Apache HTTP server alone, that is enough for me.
But because I could not find a solution on the Apache side, I played with
Tomcat 3.2.2 to solve my issue.
I'll report my solution here; it is not very sophisticated, but it may help
someone who encounters a similar problem.
As you know, static HTML files are handled by
org.apache.tomcat.request.StaticInterceptor.
StaticInterceptor uses
org.apache.tomcat.util.RequestUtil
to decode the encoded URL, and
public final static String URLDecode(String str)
is responsible for this task.
To get the right Java String from %XX%XX%XX, I modified this method.
It now looks like this (RequestUtil needs java.io.ByteArrayOutputStream,
java.io.IOException and java.io.UnsupportedEncodingException imported, if it
does not import them already):
public final static String URLDecode(String str)
    throws NumberFormatException,
           StringIndexOutOfBoundsException, IllegalArgumentException
{
    // IE encodes one Japanese character into a sequence of three %XX
    // escapes; decoding each escape into its own char corrupts the URL.
    // So collect the raw bytes and convert them to a String as UTF-8
    // at the very end.
    if (str == null) return null;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int strPos = 0;
    int strLen = str.length();
    while (strPos < strLen) {
        int laPos; // lookahead position
        // look ahead to next URLencoded metacharacter, if any
        for (laPos = strPos; laPos < strLen; laPos++) {
            char laChar = str.charAt(laPos);
            if ((laChar == '+') || (laChar == '%')) {
                break;
            }
        }
        // if there were non-metacharacters, copy them all as a block
        if (laPos > strPos) {
            byte[] nonmeta = (str.substring(strPos, laPos)).getBytes();
            baos.write(nonmeta, 0, nonmeta.length);
            strPos = laPos;
        }
        // shortcut out of here if we're at the end of the string
        if (strPos >= strLen) {
            break;
        }
        // process next metacharacter
        char metaChar = str.charAt(strPos);
        if (metaChar == '+') {
            baos.write((int) ' ');
            strPos++;
            continue;
        } else if (metaChar == '%') {
            int some = Integer.parseInt(str.substring(strPos + 1, strPos + 3), 16);
            char c = (char) some;
            if (c == '/' || c == '\0')
                throw new IllegalArgumentException(
                    "URL contains encoded special chars.");
            baos.write(some);
            strPos += 3;
        }
    }
    try {
        baos.flush();
        baos.close();
    } catch (IOException ex) {
        // a ByteArrayOutputStream cannot really fail here; ignore
    }
    String dec = null;
    try {
        // interpret the collected bytes as UTF-8
        dec = baos.toString("UTF-8");
    } catch (UnsupportedEncodingException ex) {
        // UTF-8 is always supported by the JVM
    }
    return dec;
}
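With this change, an IE-style UTF-8 escaped path decodes to the expected
Japanese string. A quick stand-alone check (a hypothetical test class, not part
of the patch; %E6%97%A5%E6%9C%AC is the UTF-8 escaping of the two characters
"日本"):

public class DecodeCheck {
    public static void main(String[] args) {
        String encoded = "/docs/%E6%97%A5%E6%9C%AC.html";
        String decoded = org.apache.tomcat.util.RequestUtil.URLDecode(encoded);
        System.out.println(decoded);   // prints /docs/日本.html
    }
}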
In addition, on my Linux box, StaticInterceptor fails to locate files with
Japanese file names.
I suppose this is not such a general problem.
I have this issue because the default encoding of the JVM does not match the
encoding of the file system on my Linux box.
(On my Linux box, the default encoding of the JVM is always ISO-8859-1, but the
file system uses EUC-JP.)
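You can see which default encoding your JVM picks up by printing the standard
file.encoding system property (just a diagnostic, not part of the patch):

public class EncodingCheck {
    public static void main(String[] args) {
        // e.g. ISO-8859-1 on my box, even though the file system is EUC-JP
        System.out.println(System.getProperty("file.encoding"));
    }
}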
To solve this problem, I added the following lines into:
public int requestMap(Request req)
just before the call to: absPath = ctx.getRealPath
try {
    // Specify the encoding of your file system here.
    // Encoding to EUC-JP bytes and decoding them back with the JVM's
    // default encoding (ISO-8859-1 on my box) means the file API will
    // hand the original EUC-JP bytes to the file system later on.
    pathInfo = new String(pathInfo.getBytes("EUC-JP"));
} catch (UnsupportedEncodingException ex) {
    // EUC-JP should be available; otherwise the path is left untouched
}
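To see why this works, here is a stand-alone sketch of the same trick (my own
illustration; the docBase path and file name are just examples, and it assumes
an ISO-8859-1 default encoding with an EUC-JP file system):

import java.io.File;
import java.io.UnsupportedEncodingException;

public class PathCheck {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // pathInfo as decoded from the URL: real Unicode characters
        String pathInfo = "/docs/\u65e5\u672c.html";   // "/docs/日本.html"
        // encode to EUC-JP bytes, then decode with the JVM default
        // (ISO-8859-1), so every byte survives unchanged inside the String
        String fsPath = new String(pathInfo.getBytes("EUC-JP"));
        // when java.io.File converts this String back to bytes with the
        // default encoding, the OS receives the EUC-JP byte sequence
        File f = new File("/var/tomcat/webapps/ROOT" + fsPath);
        System.out.println(f.getPath() + " exists: " + f.exists());
    }
}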
Any questions and comments are welcome.
Best regards,
--
Happy Java programming!
Jun Inamori
OOP-Reserch
E-mail: [EMAIL PROTECTED]
URL: http://www.oop-reserch.com/