Asgaut wrote:
Hi Laurie,
And thanks for your quick answer! Here are my comments.
I tried that first, changing the default encoding (in struts.xml) to utf-8.
That works fine, in java and in our web application. The problem is our
Sybase database which is configured to ISO-8859-1. And as our JDBC driver
(jconn2) does not convert from utf-8 to iso-8859-1, it will throw an
exception when trying to update or insert the characters it does not
understand.
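One way to see in advance which strings the driver would reject is to ask the JDK whether they can be encoded in ISO-8859-1. This is only a minimal sketch: `Latin1Check` and `fitsLatin1` are hypothetical names, not part of the filter posted in this thread, and it assumes the driver fails on exactly the characters that ISO-8859-1 cannot represent.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the code posted in this thread. */
final class Latin1Check {
    private Latin1Check() {}

    /** Returns true if every character of the text can be encoded in ISO-8859-1. */
    static boolean fitsLatin1(String text) {
        // Encoders are stateful and not thread-safe, so a fresh one is created per call.
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }
}
```

With this, `fitsLatin1("Mj\u00f8lne")` is true because ø is in ISO-8859-1, while `fitsLatin1("100 \u20ac")` is false because the Euro sign is not.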
I'm not sure why that's relevant to getting character data to and from
the browser though. It seems that any encoding/charset munging should be
happening at the database / JDBC level, not at the HTTP request level. The
problem is specific to the database, so the solution should be too IMHO...
So therefore I had to convert them myself. I can also add that there is a
special case when it comes to the Euro (€) character. It did not exist when
iso-8859-1 was created, but was added as part of iso-8859-15. But our Sybase
database still only understands iso-8859-1, so a conversion needs to take
place. What I did was first convert it from utf-8 to iso-8859-15, then from
iso-8859-15 to iso-8859-1. Here is the code:
byte[] characters = charsBeforeConvert.getBytes("iso-8859-15");
for (int i = 0; i < characters.length; i++) {
if (characters[i] == (byte) 0xa4) {
// 0x80 is a control character with no symbol in iso-8859-1.
// It is used for € in windows-1252.
characters[i] = (byte) 0x80;
}
}
return new String(characters, "iso-8859-1");
Kind of a hassle, but it works.
I'll take your word for that; presumably you have code to perform the
inverse mapping somewhere as well?
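A possible shape for that inverse mapping, as a sketch only: the thread does not show the real inverse code, so `EuroRoundTrip`, `toDatabaseForm` and `fromDatabaseForm` are hypothetical names, and the assumption is that stored values were produced by the forward conversion quoted above. The inverse simply runs the same steps backwards: encode as ISO-8859-1, swap 0x80 back to 0xa4, decode as ISO-8859-15.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing the forward mapping from the thread and an assumed inverse. */
final class EuroRoundTrip {
    private static final Charset ISO_8859_15 = Charset.forName("ISO-8859-15");
    private static final byte ISO_8859_15_EURO = (byte) 0xa4;
    private static final byte CP_1252_EURO = (byte) 0x80;

    private EuroRoundTrip() {}

    /** Forward mapping: same steps as the snippet quoted in the message. */
    static String toDatabaseForm(String text) {
        byte[] bytes = text.getBytes(ISO_8859_15);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == ISO_8859_15_EURO) {
                bytes[i] = CP_1252_EURO;
            }
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Assumed inverse: undo the byte swap, then decode as ISO-8859-15. */
    static String fromDatabaseForm(String stored) {
        byte[] bytes = stored.getBytes(StandardCharsets.ISO_8859_1);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == CP_1252_EURO) {
                bytes[i] = ISO_8859_15_EURO;
            }
        }
        return new String(bytes, ISO_8859_15);
    }
}
```

Characters that ISO-8859-15 cannot represent are replaced by `?` in the forward step, so the round trip is only lossless for text that fits in ISO-8859-15.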
It was a good idea to override the setCharacterEncoding method. This would
open the opportunity to move my converting logic from the filter to an
interceptor. But then another problem occurs. If I do the conversion in an
interceptor, I would need to know exactly which parameters need to be
converted. We are working on a solution for maintaining CVs. I
would then have to do something like (pseudocode):
- String firstName = request.getParameter("firstName");
- get CV object from the value stack
- firstName = performConversion(firstName)
- cv.setFirstName(firstName)
- put cv back on the value stack
In some cases this would work fine, but I have so many parameters I need to
retrieve and convert that it would not work as a proper solution. My filter
takes care of all requests parameters without the need of specifying which
parameter it is.
All this seems like a lot of potential for confusion any time data comes
in from the browser and gets sent back out again without the database
getting involved (e.g. if a form fails validation). You now have these
oddly munged, not quite normal string values moving through every layer
of your app.
I'd take another look to see if there's not a way to configure the JDBC
driver to do the right thing automatically and, if there really isn't,
to try and move the code to cope with that limitation as close to the
problem as possible.
L.
To improve my code, I will move the converting logic to a utility class, so
the filter can stay as thin as possible.
I will post the entire code if you like to take a look at it. Any comments
would be appreciated!
Thanks
import com.google.common.collect.Maps;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.Map;
/**
* Filter to fix utf-8 to iso-8859-1 conversion
*
* @author Asgaut Mjolne
* @version $Revision: 1.6 $, 05.feb.2008, modified by: $Author: fiasmjol
*/
public class CharsetEncodingFilter implements Filter {
@Override
public void init(FilterConfig filterConfig) throws ServletException {
}
@Override
public void doFilter(ServletRequest servletRequest, ServletResponse
servletResponse, FilterChain filterChain) throws IOException,
ServletException {
HttpServletRequest req = (HttpServletRequest) servletRequest;
if ("utf-8".equalsIgnoreCase(req.getCharacterEncoding())) {
req = new CharsetRequestWrapper(req);
req.getParameter("foo"); //Needed to fill params. Do not remove
}
filterChain.doFilter(req, servletResponse);
}
@Override
public void destroy() {
}
static class CharsetRequestWrapper extends HttpServletRequestWrapper {
private static final byte ISO_8859_15_EURO_CODE_POINT = (byte) 0xa4;
/**
* Not in use in ISO-8859-1
*/
private static final byte CP_1252_EURO_CODE_POINT = (byte) 0x80;
public CharsetRequestWrapper(HttpServletRequest httpServletRequest)
{
super(httpServletRequest);
}
@Override
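public String getParameter(String s) {
// Read from the converted map so single-value lookups match getParameterMap()
String[] values = getParameterMap().get(s);
return (values == null || values.length == 0) ? null : values[0];
}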
Map<String, String[]> iso88591EncodedParams = null;
/**
* Looping through all parameters on the request, checking for
* special characters.
* If any found, convert them with the fixCharset method
*/
@Override
public Map<String, String[]> getParameterMap() {
if (iso88591EncodedParams == null) {
iso88591EncodedParams = Maps.newHashMap();
Map<String, String[]> params = super.getParameterMap();
for (String key : params.keySet()) {
String[] values = params.get(key);
for (int j = 0; j < values.length; j++) {
values[j] = fixCharset(values[j]);
}
iso88591EncodedParams.put(key, values);
}
}
return iso88591EncodedParams;
}
/**
* Converting special chars from utf-8 to iso-8859-1
* Add more conversions here when needed
*/
static String fixCharset(String charsBeforeConvert) {
try {
byte[] characters =
charsBeforeConvert.getBytes("iso-8859-15");
for (int i = 0; i < characters.length; i++) {
if (characters[i] == ISO_8859_15_EURO_CODE_POINT) {
characters[i] = CP_1252_EURO_CODE_POINT;
}
}
return new String(characters, "iso-8859-1");
} catch (UnsupportedEncodingException e) {
return charsBeforeConvert;
}
}
@Override
public String[] getParameterValues(String s) {
// Read from the converted map so the values match getParameterMap()
return getParameterMap().get(s);
}
}
}
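The utility class mentioned before the listing could look roughly like the sketch below. `ParameterConverter` and `convertAll` are hypothetical names, and the code assumes Java 8 or later for `UnaryOperator`. Unlike the loop in `getParameterMap()` above, it copies each value array instead of writing into the arrays returned by the container, which some containers may share or lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical utility class, not the author's actual refactoring. */
final class ParameterConverter {
    private ParameterConverter() {}

    /**
     * Returns a new map in which every parameter value has been passed
     * through the given conversion. The input map and its arrays are left untouched.
     */
    static Map<String, String[]> convertAll(Map<String, String[]> params,
                                            UnaryOperator<String> convert) {
        Map<String, String[]> converted = new HashMap<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String[] original = entry.getValue();
            String[] copy = new String[original.length];
            for (int i = 0; i < original.length; i++) {
                copy[i] = convert.apply(original[i]);
            }
            converted.put(entry.getKey(), copy);
        }
        return converted;
    }
}
```

Inside the wrapper this could be called as `ParameterConverter.convertAll(super.getParameterMap(), CharsetRequestWrapper::fixCharset)`, keeping the filter itself thin.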
Laurie Harper wrote:
Asgaut wrote:
I have recently been struggling with a utf-8 to ISO-8859-1 problem with Ajax
and Struts2.
The problem is basically that our application requires iso-8859-1 characters
and Ajax is configured to only post utf-8 (Ajax is utf-8 either way, it cannot
be changed). So some kind of conversion has to take place at some level.
My problem can be divided into two parts:
1. Make Struts2 understand that there is an incoming utf-8 POST, even though
struts.xml (which sets the struts2 default encoding) is configured to use
iso-8859-1
2. Convert the characters from utf-8 to iso-8859-1
3. Change your default encoding to utf-8, which should have no effect on
any of your code but will allow greater flexibility in the range of
characters you can display and read. Is there any reason you must use
iso-8859-1?
[...]
If you take a look at this piece of code, you can see that it overrides the
encoding if it is set as defaultEncoding (from struts.xml). This is OK, the
problem is this check:
if (encoding != null) {
try {
request.setCharacterEncoding(encoding);
} catch (Exception e) {
LOG.error("Error setting character encoding to '" + encoding
+ "' - ignoring.", e);
}
}
I think the correct thing would be to also check whether
request.getCharacterEncoding() was already set. It should look like this:
if (encoding != null && request.getCharacterEncoding() == null ) {
try {
request.setCharacterEncoding(encoding);
} catch (Exception e) {
LOG.error("Error setting character encoding to '" + encoding
+ "' - ignoring.", e);
}
}
With this change utf-8 would be kept as the request character encoding and I
could do my conversion in my interceptor.
This would solve my problem number 1. Am I correct when I say this is a bug?
I don't know if I'd call that a bug, but it does seem like a reasonable
enhancement. It would probably require some testing with different
browsers to make sure getCharacterEncoding() really is returning null in
the 'normal' cases, but assuming that's true you could open a ticket in
JIRA and attach a patch.
The way I went around it was to create a filter which is executed before
FilterDispatcher in struts2. In this filter I check if it is a utf-8 post
and if it is, I wrap the HttpServletRequest in my own
CharsetRequestWrapper. In my wrapper I override getParameterMap, which
converts my characters, puts them back into the map and returns them. I also
run a req.getParameter("foo"); after my wrapping to populate the parameters
on the request.
It works, but it took me a couple of days to work it out.
Any comments on this?
It might be simpler for your filter to call
setCharacterEncoding("utf-8") and use a trivial request wrapper that
delegates all calls to the wrapped request *except*
setCharacterEncoding(), making that a no-op. It would make it clearer
what the filter was actually doing with less code :-) Otherwise, seems
like a reasonable work-around.
L.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]