Hi. I have a couple of obvious thoughts:
Does the form have an accept-charset attribute? If the form in the HTML document does not specify accept-charset, the user agent is allowed to use the document's charset. Did you double-check the charset the server actually returns (what the agent sees)? Also take into account any transcoding that servers or agents along the way are allowed to do, and allow for whatever the user agent itself has settled on as the document's effective encoding.

If the Content-Type header carries no charset, no meta element gives one (Content-Type ... charset=), and the form itself has no charset defined in an attribute, HTML does not actually require the user agent to follow any particular rule. For example, it doesn't fall back to ISO-8859-1 as HTTP specifies. The user agent may use the user's chosen default encoding, which may be UTF-8 (or the platform encoding), or it may simply guess. I still don't know what you are supposed to do when the agent has picked some arbitrary local encoding (because it was given no charset to presume) and then sends data back with no indication of which encoding it used.

It's very good to make sure the server sends clear headers; it's nice to duplicate that in a meta element too; and it's worth putting an accept-charset attribute on the form as well. That at least gives the agent no excuses, and you can then send back a "you have a buggy user agent" response with confidence.

I was prompted to comment because form handling, in my opinion, is still extremely messy and can still bite you in corner cases. There are at least six cross-referenced RFCs covering the encoding of each part of the multipart message. Each part is supposed to have a Content-Type, and for text parts that header carries the charset parameter. And you still see things transmitted incorrectly. Plus, the filename part is allowed to munge the file name into an approximation.

I don't know if any of that is useful. Good luck! I'd be curious to hear what you ultimately learn.

Ciao!
-Steev Coco
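The precedence described above can be sketched as a small Java helper. This is a simplified model of what a user agent might do, not any browser's actual code: the class and method names are hypothetical, and it ignores byte-order marks and content sniffing. It checks the form's accept-charset first, then the charset parameter of the HTTP Content-Type header, then a meta Content-Type, and only then falls back to the agent's own default.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Locale;

/** Hypothetical helper: a simplified model of how a user agent picks a form's submission charset. */
final class SubmissionCharset {

    /**
     * Any String argument may be null when that source is absent.
     * Simplification (assumption): ignores byte-order marks and content sniffing.
     */
    static Charset resolve(String acceptCharset, String httpContentType,
                           String metaContentType, Charset userDefault) {
        // 1. The form's accept-charset attribute: the first supported name wins.
        if (acceptCharset != null) {
            for (String name : acceptCharset.trim().split("[\\s,]+")) {
                Charset cs = lookup(name);
                if (cs != null) {
                    return cs;
                }
            }
        }
        // 2. The charset parameter of the HTTP Content-Type header.
        Charset fromHeader = lookup(charsetParam(httpContentType));
        if (fromHeader != null) {
            return fromHeader;
        }
        // 3. The charset in a <meta http-equiv="Content-Type"> element.
        Charset fromMeta = lookup(charsetParam(metaContentType));
        if (fromMeta != null) {
            return fromMeta;
        }
        // 4. Nothing declared anywhere: the agent uses its own default.
        return userDefault;
    }

    /** Extracts the charset parameter from a Content-Type value, or null if absent. */
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).replace("\"", "").trim();
            }
        }
        return null;
    }

    /** Returns the named charset, or null if the name is missing, illegal, or unsupported. */
    static Charset lookup(String name) {
        if (name == null || name.isEmpty()) {
            return null;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }
}
```

For example, 'resolve(null, "text/html", null, userDefault)' returns 'userDefault': that is exactly the case where the server can no longer tell which encoding it received.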
On Fri July 6 2007 8:43:52 pm you wrote:
> I am running into an encoding problem with form parameter values that
> contain non-ASCII characters in a form that contains an upload
> component. Without the upload component everything works fine. The
> problem appears to be in the handling of strings in
> 'multipart/form-data', in that both Firefox and Mozilla seem to send
> strings encoded as UTF-8 but don't specify a character set, and the
> upload component interprets these as ISO-8859-1. This actually seems to
> be the correct response to incorrect behavior by the browsers, but in
> practice we need to find a workaround. I can't find a way around this
> without modifying the tapestry-upload project, and I was wondering
> if anyone could suggest a better solution, and if not, whether Tapestry
> itself will deal with this in the future.
>
> Our simple, hack workaround is to modify
> 'MultipartDecoderImpl.processFileItems(...)' to call
>
>     wrapper.addParameter(item.getFieldName(), item.getString("UTF8"))
>
> instead of
>
>     wrapper.addParameter(item.getFieldName(), item.getString())
>
> To improve this, I think we would need:
>
> 1) A way of passing in an appropriate default encoding to use.
> We could contribute an 'HttpServletRequestHandler' that sets the
> request's default character encoding, but is there a way to guarantee
> that our handler would be called before the
> 'MultipartServletRequestFilter'?
>
> 2) Even if we did (1), a way to use this encoding to parse the
> strings in multipart form fields. Passing the encoding to
> 'FileItem.getString()' is undesirable because it would not handle the
> case where the part's 'charset' parameter was explicitly set. The
> parameterless version of 'FileItem.getString()' cannot be used, however,
> because it explicitly defaults to 'ISO-8859-1' if the character set is
> not specified (i.e. it uses neither the request's character encoding nor
> the header encoding set by 'FileUpload.setHeaderEncoding'). I can't find
> a nice way to do this without duplicating some code in
> 'commons-fileupload' or relying on public methods of 'DiskFileItem' that
> aren't in the 'FileItem' interface.
>
> Thanks,
> Doug
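For point (2), one way to get "explicit part charset first, configured default second, ISO-8859-1 last" without relying on 'DiskFileItem' is to decode the raw bytes yourself. The sketch below is plain Java with hypothetical names ('FieldDecoder', 'decode', 'repairLatin1AsUtf8'); it is not part of commons-fileupload or tapestry-upload. It expects the bytes and Content-Type that 'FileItem.get()' and 'FileItem.getContentType()' return. It also shows why the symptom looks the way it does: UTF-8 bytes read as ISO-8859-1 turn "é" into "Ã©", and because ISO-8859-1 maps each byte to exactly one char, that particular mistake can be reversed.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper class; not part of commons-fileupload or tapestry-upload. */
final class FieldDecoder {

    /**
     * Decodes a multipart form field's raw bytes. An explicit charset parameter on
     * the part's Content-Type wins; otherwise defaultEncoding (for example the
     * request's character encoding) is used; ISO-8859-1 is the last resort, the
     * same default the quoted message reports for the parameterless getString().
     */
    static String decode(byte[] body, String partContentType, String defaultEncoding) {
        Charset cs = lookup(charsetParam(partContentType));
        if (cs == null) {
            cs = lookup(defaultEncoding);
        }
        if (cs == null) {
            cs = StandardCharsets.ISO_8859_1;
        }
        return new String(body, cs);
    }

    /**
     * Reverses UTF-8 bytes that were wrongly decoded as ISO-8859-1. Exact only if
     * the input really came from an ISO-8859-1 decode (every char is 0x00-0xFF).
     */
    static String repairLatin1AsUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    /** Extracts the charset parameter from a Content-Type value, or null if absent. */
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).replace("\"", "").trim();
            }
        }
        return null;
    }

    /** Returns the named charset, or null if the name is missing, illegal, or unsupported. */
    static Charset lookup(String name) {
        if (name == null || name.isEmpty()) {
            return null;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }
}
```

Inside 'processFileItems(...)' the call might then look like 'wrapper.addParameter(item.getFieldName(), FieldDecoder.decode(item.get(), item.getContentType(), request.getCharacterEncoding()))', assuming an 'HttpServletRequest' named 'request' is in scope there. Browsers usually omit Content-Type on plain (non-file) fields, so in practice the default encoding is what ends up being used, which is why getting that default set early still matters.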