As I said in a previous posting, I sent an email to JSR154 about the setCharacterEncoding issue. I got the following answer from Yutaka Yoshida, who stresses that it is his personal opinion.

Stefanos

-------------------------------------
I share the same thought. I agree we need a facility to handle
this case as many people pass non 8859-1 chars on GET params.
Also, no, I don't mind your forwarding my message, but please
make sure that this was my personal opinion.

thank you,
Yutaka Yoshida
Sun Microsystems, Inc.

Stefanos Karasavvidis wrote:

> Hi Yuta,
>
> thank you for the reply.
>
> your point of view matches that of the Tomcat developers, and I have no objection as far as standards conformance is concerned.
>
> BUT:
>
> first of all I assume that you agree that passing parameters with the GET method is at least useful (IMO it is a necessity). Consider how you would pass to someone a URL to a dynamically generated page, based on GET parameters.
>
> So there has to be a way to correctly encode text values within the URI. Prior to Servlet/JSP 2.3/1.2 every vendor had its own way to bypass this issue (if there was any) and every developer had to "manually" decode the values and hope that the servlet container did not have its own way of doing the decoding (e.g. Sun Web Server).
>
> We "non latin" developers could not use automatic form handling applications (they just called getParameter), and EVERY new web application introduced a new way of handling these issues.
>
> The introduction of the setCharacterEncoding method was such a BIG relief for us, and now we have to return to old-style coding methods again (I have personally used servlets since 1998).
>
> I also used to teach an undergraduate "web applications" laboratory at the Technical University of Crete (trying to use Java), and it was always difficult enough to explain to the students that "1 byte is NOT 1 character" when they tried to pass Greek text values. Now we have to explain to them that the way they learnt to deal with encodings within the servlet spec has to change again. From a teaching point of view this is not a problem (it is actually legitimate). But it is not legitimate for students to have to put in so much effort just to read a parameter... these are issues that should have been easy to handle a long time ago (and they were, until now).
>
> Anyway...
> introducing a new method is IMO the right way to go. And this should be done fast because there has been a lot of confusion and even damage. Leaving this unaddressed and hoping that every developer will become an expert in character encoding issues is IMHO not acceptable.
>
> It is not enough to state that Java, as a language and as a web application development framework, can handle internationalized applications. These simple everyday problems should have a consistent and unambiguous way of being handled.
>
> Thank you for your time
>
> Do you mind if I forward this message to the tomcat-dev list?
>
> Regards
>
> Stefanos Karasavvidis
>
>
>
>
>
> Yuta Yoshida wrote:
>
>> Hi Stefanos,
>>
>> I personally believe setCharacterEncoding() should only affect
>> the body as stated in javadoc, in other words, POST. Because:
>> o your second paragraph below
>> o if this method affected the URI too, it introduces another
>> meaning. As you know there're two mappings in URI. One is
>> from characters in URI to the octet and the other is from
>> the octet to the original character. Set[ting]CharacterEncoding
>> of the POST body is direct - the body is actually encoded in
>> the encoding scheme specified by the method, however, doing so
>> of the GET query param is not direct - it is encoded in ascii
>> but the method is specifying the encoding of the original
>> characters. That's confusing.
>>
>> Considering that the original encoding of GET URI doesn't have to
>> be the same as the one of the POST body, we might need a new method
>> to specify the GET encoding. But I'm not sure how much it's useful
>> since what we have to do is just re-creating a String from getBytes.
>>
>> Anyway, I'll put this into the list we need to address in the next
>> version of the specification. I understand most containers currently
>> implement this method for both POST and GET and we need to take that
>> fact into consideration.
>>
>> Thank you for the comment,
>> Yutaka Yoshida
>> Sun Microsystems, Inc.
>>
>>> There has been a dispute lately in the tomcat-development list about whether the
>>> Request.setCharacterEncoding(String encoding)
>>> method sets the encoding for both HTTP GET and POST parameters, or only for HTTP POST parameters.
>>>
>>> The developers argued that as there is no standard way for encoding characters in the URI, there is no possibility to encode the query string of the URI (the GET parameters) differently than the first part of the URI. Thus the setCharacterEncoding method's encoding is applied ONLY to POST parameters.
>>>
>>> This change of behaviour has been applied in Tomcat versions 4.1.29 and 5.0.16. A special Tomcat configuration parameter that restores the old behaviour has been added (it will not be available until the next releases), but the default will remain that the method's encoding is not applied to GET parameters.
>>>
>>> A list of bugs filed on this issue is available in the following posting
>>> http://www.mail-archive.com/[EMAIL PROTECTED]/msg50866.html
>>> and many related messages exist within the developer list (search for "setCharacterEncoding")
>>>
>>> As this change in the reference implementation breaks the common behaviour of other servlet engines (as well as Tomcat's own behaviour prior to the latest releases), I ask you to clarify this issue.
>>>
>>> Regards
>>>
>>> Stefanos Karasavvidis
>>>
>>>
>>
>>
>
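
For readers who have not seen the "re-creating a String from getBytes" workaround mentioned in the thread, here is a minimal sketch of what it might look like. It assumes the container decoded the query string as ISO-8859-1 and that the browser actually sent UTF-8; both are assumptions for illustration, not something the servlet spec guarantees. The class and method names (QueryParamRedecode, redecode) are hypothetical, and URLDecoder only stands in for the container's own decoding step.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the servlet API.
final class QueryParamRedecode {

    // Undo a wrong ISO-8859-1 decode: get the original octets back,
    // then decode them with the charset the client really used.
    static String redecode(String misdecoded, Charset actual) {
        byte[] octets = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(octets, actual);
    }

    public static void main(String[] args) throws Exception {
        String original = "\u03b1\u03b2\u03b3"; // three Greek letters

        // First mapping: characters -> octets -> ASCII percent-escapes.
        // UTF-8 is assumed to be the encoding the client chose.
        String escaped = URLEncoder.encode(original, "UTF-8");

        // Stand-in for a container that assumes ISO-8859-1 when it
        // turns the escaped octets back into characters.
        String misdecoded = URLDecoder.decode(escaped, "ISO-8859-1");

        // The manual fix each developer would have to apply.
        String fixed = redecode(misdecoded, StandardCharsets.UTF_8);

        System.out.println(escaped);
        System.out.println(fixed.equals(original));
    }
}
```

In a servlet, the same call would wrap the value returned by getParameter. The sketch only works because ISO-8859-1 maps every byte to exactly one character, so the original octets survive the wrong decode; if the container had used a different default charset, the round trip could lose data.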




