Re: CharacterEncoding bug in Struts2?

Laurie Harper Mon, 18 Feb 2008 06:59:19 -0800

Asgaut wrote:

Hi Laurie,
And thanks for your quick answer! Here are my comments.


I tried that first, changing the default encoding (in struts.xml) to utf-8.
That works fine, in java and in our web application. The problem is our
Sybase database which is configured to ISO-8859-1. And as our JDBC driver
(jconn2) does not convert from utf-8 to iso-8859-1, it will throw an
exception when trying to update or insert the characters it does not
understand.

I'm not sure why that's relevant to getting character data to and from the browser, though. It seems that any encoding/charset munging should be happening at the database / JDBC level, not at the HTTP request level. The problem is specific to the database, so the solution should be too, IMHO...

So I had to convert them myself. I should also add that the Euro (€)
character is a special case. It did not exist when iso-8859-1 was created,
but was added as part of iso-8859-15. Our Sybase database still only
understands iso-8859-1, so a conversion needs to take place. What I did was
first convert it from utf-8 to iso-8859-15, then from iso-8859-15 to
iso-8859-1. Here is the code:

static String fixCharset(String charsBeforeConvert)
        throws UnsupportedEncodingException {
    byte[] characters = charsBeforeConvert.getBytes("iso-8859-15");
    for (int i = 0; i < characters.length; i++) {
        if (characters[i] == (byte) 0xa4) { // Euro sign in iso-8859-15
            // 0x80 is a control character and has no symbol in iso-8859-1.
            // It is used for € in windows-1252
            characters[i] = (byte) 0x80;
        }
    }
    return new String(characters, "iso-8859-1");
}

Kind of a hassle, but it works.

I'll take your word for that; presumably you have code to perform the inverse mapping somewhere as well?
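A possible answer to that question, sketched under assumptions: values come back from Sybase as iso-8859-1 strings in which U+0080 stands for the Euro sign, exactly as the conversion above leaves them. The class and method names (`EuroReverseMapping`, `fromDatabase`) are hypothetical and not part of Asgaut's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reverses the Euro workaround for values read back from the database. */
public class EuroReverseMapping {
    private static final Charset ISO_8859_15 = Charset.forName("ISO-8859-15");
    // Same byte values as the forward conversion, just swapped
    private static final byte CP_1252_EURO = (byte) 0x80;
    private static final byte ISO_8859_15_EURO = (byte) 0xa4;

    public static String fromDatabase(String stored) {
        // Every char U+0000..U+00FF maps to exactly one byte in ISO-8859-1
        byte[] bytes = stored.getBytes(StandardCharsets.ISO_8859_1);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == CP_1252_EURO) {
                bytes[i] = ISO_8859_15_EURO;
            }
        }
        // In ISO-8859-15, byte 0xa4 decodes to U+20AC (the Euro sign)
        return new String(bytes, ISO_8859_15);
    }

    public static void main(String[] args) {
        // What the forward conversion produces for "Pris: 100 €"
        String stored = "Pris: 100 \u0080";
        System.out.println(fromDatabase(stored).equals("Pris: 100 \u20ac")); // true
    }
}
```

Because it decodes through ISO-8859-15, this is only a true inverse for data written by the forward mapping: a genuine `¦` (0xA6) already stored in the database would come back as `Š`, since ISO-8859-15 reassigns that byte (and a few others) as well.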

It was a good idea to override the setCharacterEncoding method. This would
open the opportunity to move my converting logic from the filter to an
interceptor. But then another problem occurs. If I do the conversion in an
interceptor, I would need to know exactly which parameters need to be
converted. We are working on a solution for maintaining CVs. I would then
have to do something like (pseudocode):
-       String firstName = request.getParameter("firstName");
-       get CV object from the value stack
-       firstName = performConversion(firstName)
-       cv.setFirstName(firstName)
-       put cv back on the value stack

In some cases this would work fine, but I have so many parameters to
retrieve and convert that it would not be a proper solution. My filter
takes care of all request parameters without needing to know which
parameter it is.

All this seems like a lot of potential for confusion any time data comes in from the browser and gets sent back out again without the database getting involved (e.g. if a form fails validation). You now have these oddly munged, not quite normal string values moving through every layer of your app.

I'd take another look to see if there's not a way to configure the JDBC driver to do the right thing automatically and, if there really isn't, to try and move the code to cope with that limitation as close to the problem as possible.
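One way to follow that advice, sketched under assumptions: keep request strings as ordinary Unicode everywhere and apply Asgaut's Euro mapping only where a value is bound to a JDBC statement. `LegacyJdbcStrings`, `toLegacyCharset` and `setLegacyString` are hypothetical names; only `PreparedStatement.setString` is standard JDBC, and this does not rule out a driver-level setting in jconn2.

```java
import java.io.UnsupportedEncodingException;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical DAO-side helper: applies the Euro workaround only where strings go to Sybase. */
public final class LegacyJdbcStrings {
    private LegacyJdbcStrings() {
    }

    /** Same mapping as fixCharset: Euro (0xa4 in iso-8859-15) becomes 0x80, read back as iso-8859-1. */
    public static String toLegacyCharset(String value) {
        if (value == null) {
            return null; // let setString bind SQL NULL as usual
        }
        try {
            byte[] bytes = value.getBytes("iso-8859-15");
            for (int i = 0; i < bytes.length; i++) {
                if (bytes[i] == (byte) 0xa4) {
                    bytes[i] = (byte) 0x80;
                }
            }
            return new String(bytes, "iso-8859-1");
        } catch (UnsupportedEncodingException e) {
            return value; // charset missing from the JVM: leave the value untouched
        }
    }

    /** Use instead of ps.setString(...) for columns stored in the iso-8859-1 database. */
    public static void setLegacyString(PreparedStatement ps, int index, String value)
            throws SQLException {
        ps.setString(index, toLegacyCharset(value));
    }
}
```

The read side would apply the inverse mapping in the same DAO code, so the web layer (validation errors, redisplayed forms) only ever sees normal strings.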

L.

To improve my code, I will move the converting logic to a utility class, so
the filter can stay as thin as possible.

I'm posting the entire code below in case you'd like to take a look at it.
Any comments would be appreciated!

Thanks


import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.HashMap;
import java.util.Map;

/**
 * Filter to fix utf-8 to iso-8859-1 conversion
 *
 * @author Asgaut Mjolne
 * @version $Revision: 1.6 $, 05.feb.2008, modified by: $Author: fiasmjol
 */
public class CharsetEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse,
                         FilterChain filterChain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) servletRequest;

        // Only utf-8 posts (e.g. from Ajax) need converting
        if ("utf-8".equalsIgnoreCase(req.getCharacterEncoding())) {
            req = new CharsetRequestWrapper(req);
        }

        filterChain.doFilter(req, servletResponse);
    }

    @Override
    public void destroy() {
    }

    static class CharsetRequestWrapper extends HttpServletRequestWrapper {
        private static final byte ISO_8859_15_EURO_CODE_POINT = (byte) 0xa4;
        /**
         * Not in use in ISO-8859-1 (a control character there),
         * but used for the Euro sign in windows-1252
         */
        private static final byte CP_1252_EURO_CODE_POINT = (byte) 0x80;

        /** Converted copy of the parameters, built on first access */
        private Map<String, String[]> iso88591EncodedParams = null;

        public CharsetRequestWrapper(HttpServletRequest httpServletRequest) {
            super(httpServletRequest);
        }

        /**
         * Loops through all parameters on the request and converts every value
         * with the fixCharset method. Values are copied, so the wrapped
         * request's own parameter arrays are left untouched.
         */
        @Override
        public Map<String, String[]> getParameterMap() {
            if (iso88591EncodedParams == null) {
                iso88591EncodedParams = new HashMap<String, String[]>();
                Map<String, String[]> params = super.getParameterMap();
                for (Map.Entry<String, String[]> entry : params.entrySet()) {
                    String[] values = entry.getValue();
                    String[] converted = new String[values.length];
                    for (int j = 0; j < values.length; j++) {
                        converted[j] = fixCharset(values[j]);
                    }
                    iso88591EncodedParams.put(entry.getKey(), converted);
                }
            }
            return iso88591EncodedParams;
        }

        /** Reads from the converted map, so no eager getParameter("foo") call is needed */
        @Override
        public String getParameter(String name) {
            String[] values = getParameterMap().get(name);
            return (values == null || values.length == 0) ? null : values[0];
        }

        @Override
        public String[] getParameterValues(String name) {
            String[] values = getParameterMap().get(name);
            return values == null ? null : values.clone();
        }

        /**
         * Converting special chars from utf-8 to iso-8859-1.
         * Add more conversions here when needed
         */
        static String fixCharset(String charsBeforeConvert) {
            try {
                byte[] characters = charsBeforeConvert.getBytes("iso-8859-15");
                for (int i = 0; i < characters.length; i++) {
                    if (characters[i] == ISO_8859_15_EURO_CODE_POINT) {
                        characters[i] = CP_1252_EURO_CODE_POINT;
                    }
                }
                return new String(characters, "iso-8859-1");
            } catch (UnsupportedEncodingException e) {
                return charsBeforeConvert;
            }
        }
    }
}








Laurie Harper wrote:

Asgaut wrote:

I have recently been struggling with a utf-8 to ISO-8859-1 problem with
Ajax and Struts2.

The problem is basically that our application requires iso-8859-1
characters, while Ajax only posts utf-8 (Ajax always uses utf-8; this
cannot be changed). So some kind of conversion has to take place at some
level.

My problem can be divided into two parts:
1. Make Struts2 understand that there is an incoming utf-8 POST, even
though struts.xml (which sets the Struts2 default encoding) is configured
to use iso-8859-1
2. Convert the characters from utf-8 to iso-8859-1

3. Change your default encoding to utf-8, which should have no effect on any of your code but will allow greater flexibility in the range of characters you can display and read. Is there any reason you must use iso-8859-1?

[...]

If you take a look at this piece of code, you can see that it overrides the
encoding if it is set as defaultEncoding (from struts.xml). This is OK; the
problem is this check:
if (encoding != null) {
    try {
        request.setCharacterEncoding(encoding);
    } catch (Exception e) {
        LOG.error("Error setting character encoding to '" + encoding
                + "' - ignoring.", e);
    }
}

I think the correct thing would be to also check whether
request.getCharacterEncoding() was already set. It should look like this:
if (encoding != null && request.getCharacterEncoding() == null) {
    try {
        request.setCharacterEncoding(encoding);
    } catch (Exception e) {
        LOG.error("Error setting character encoding to '" + encoding
                + "' - ignoring.", e);
    }
}
With this change, utf-8 would be kept as the request character encoding
and I could do my conversion in my interceptor.
This would solve my problem number 1. Am I correct when I say this is a
bug?

I don't know if I'd call that a bug, but it does seem like a reasonable enhancement. It would probably require some testing with different browsers to make sure getCharacterEncoding() really is returning null in the 'normal' cases, but assuming that's true you could open a ticket in JIRA and attach a patch.

The way I got around it was to create a filter which is executed before
FilterDispatcher in Struts2. In this filter I check if it is a utf-8 post
and, if it is, I wrap the HttpServletRequest in my own
CharsetRequestWrapper. In my wrapper I override getParameterMap, which
converts my characters, puts them back into the map and returns them. I
also run a req.getParameter("foo"); after my wrapping to populate the
parameters on the request.

It works, but it took me a couple of days to work it out.

Any comments on this?

It might be simpler for your filter to call setCharacterEncoding("utf-8") and use a trivial request wrapper that delegates all calls to the wrapped request *except* setCharacterEncoding(), making that a no-op. It would make it clearer what the filter was actually doing, with less code :-) Otherwise, it seems like a reasonable work-around.
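That suggestion could look roughly like the following sketch. It assumes the javax.servlet API of the time and that the filter is mapped only to the Ajax URLs that post utf-8, since as written it forces utf-8 on every request it sees. `Utf8RequestFilter` and `FixedEncodingRequest` are illustrative names, not existing classes.

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

public class Utf8RequestFilter implements Filter {

    public void init(FilterConfig filterConfig) throws ServletException {
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        // Must happen before any parameter is read
        req.setCharacterEncoding("utf-8");
        // Pass on a wrapper so a later setCharacterEncoding(defaultEncoding) is ignored
        chain.doFilter(new FixedEncodingRequest(req), response);
    }

    public void destroy() {
    }

    /** Delegates everything to the wrapped request except setCharacterEncoding(). */
    public static class FixedEncodingRequest extends HttpServletRequestWrapper {
        public FixedEncodingRequest(HttpServletRequest request) {
            super(request);
        }

        @Override
        public void setCharacterEncoding(String encoding) {
            // Deliberately a no-op: the encoding was fixed by the filter
        }
    }
}
```

Struts2 can still call setCharacterEncoding() with its default encoding on the request it receives, but on this wrapper the call does nothing, so parameters are decoded as utf-8. Any conversion for the iso-8859-1 database would then live in a separate layer.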


L.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



