Mark Thomas wrote:
Cookie handling is fundamentally a complete mess. Specifications exist
but are not fully implemented, are not consistent with related
specifications, etc.
Having tried to sort this out the last time around and having read
Jeremy's great work on documenting where we stand at the present moment,
it often feels like it wouldn't be too hard to make a case that just
about any cookie name or value that isn't a token (as per RFC 2616) is
either valid or invalid depending on which specification(s) you choose
to read.
I'd strongly encourage anyone thinking about commenting further on this
thread to take the time to read the wiki page [1] where the Tomcat
committers (and Jeremy in particular) are currently trying to figure out
exactly how Tomcat should handle cookies in the future.
Mark
[1] http://wiki.apache.org/tomcat/Cookies
Hi. I agree with everything you say above.
About the Wiki, what seems to be missing are additional rows in the tables showing some
examples of cookie values containing what English-speaking people often call "additional"
or "accented" characters (and what other people just call "characters"). For example,
what happens when the cookie value is a string like "ÄÖÜäöüéèîôâ" (that's about the extent
of what I can enter easily on this current German keyboard).
And let's also reflect on the fact that, whatever else we have been discussing here, we
have still not provided the OP of this thread with any useful and practical
recommendation to resolve his problem, which seems to originate in a difference between how
Tomcat 6 and Tomcat 7 handle cookies with "accented characters" in their value.
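For what it's worth, the usual practical workaround, and probably the most useful advice
for the OP, is to keep raw non-ASCII characters out of the cookie value altogether:
URL-encode the value when setting the cookie and decode it when reading it back, so that
only plain US-ASCII ever travels on the wire. A minimal sketch (the class and method
names here are mine, purely for illustration):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import javax.servlet.http.Cookie;

public class CookieCodec {

    // Encode so that the cookie value only ever contains US-ASCII
    // token characters; %xx escapes survive Tomcat 6 and 7 alike.
    public static Cookie encoded(String name, String value)
            throws UnsupportedEncodingException {
        return new Cookie(name, URLEncoder.encode(value, "UTF-8"));
    }

    // Reverse the encoding when reading the cookie back in.
    public static String decoded(Cookie cookie)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(cookie.getValue(), "UTF-8");
    }
}

On the reading side, loop over request.getCookies() and pass each value through the
decoder. Since what is actually stored is then a plain RFC 2616 token, the Tomcat 6
versus Tomcat 7 differences never come into play.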
Otherwise, to generalise the debate: it is not just cookies; just about anything that has
to do with non-US-ASCII characters under HTTP and HTML is a mess, and has been a mess for
years if not decades. The current jumble of RFCs dealing with the issue is in the end
more confusing than helpful. And all the current "solutions" in terms of implementation
(browser-side as well as server-side) resemble patches over patches over wooden legs.
I am not saying that resolving the issue is simple, nor that one can simply ignore the
past and/or backward-compatibility concerns. But, despite the immense respect I have for
people like Roy Fielding and their achievements, I cannot help but get the impression
that the Internet RFC mechanism is, in that respect, slowly getting "fossilised", and that
nobody seems to have the energy and drive anymore to think radically and tackle the issue
from the top down.
Nobody nowadays disputes that Unicode and UTF-8 provide a form of "universal" solution to
most of the issues of alphabets, character sets and encodings, suitable for 99% of the
human users of computers and of the Internet. And nobody disputes anymore that 99% of the
hardware and software currently in use can handle arbitrary sequences of bytes perfectly
well.
Yet in terms of programming "for the Internet", we still have to live with - and work
around every day - a set of standards and recommendations based on a myriad of alphabets
and encodings which can each properly represent only a tiny fraction of the languages that
people worldwide speak and read.
And the issues of encoding, decoding and transliterating between these different
alphabets and encodings cost thousands of productive hours every day, quite apart from
the confusion and aggravation they generate.
Why is it exactly that we can come up with things like WebSockets and HTML5 and SOAP and
Java annotations, but not with a new HTTP/HTML version which would make Unicode/UTF-8 the
*default*, and everything else the exception?
That, for the sake of interoperability and mutual comprehension, things like HTTP header
*names* should be restricted to sequences of printable characters in a limited range that
is available on all human interface devices and universally readable is one thing; but
why should HTTP header *values*, or URI path and query-string components (which often
have to carry real-world multilingual textual information), be similarly limited,
confusing and inconsistent? Why does it still have to be so difficult, in 2014, to create
a web user-interface application which ensures that people from different countries can
enter their name and place of residence as they know it, without the server-side or
client-side application mangling them?
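In fairness, for the form-input half of that complaint there is at least a well-known
incantation, even if one has to find out about it the hard way: force the request
character encoding to UTF-8 before the first parameter is read. A minimal sketch of such
a filter follows (the class name is mine; note that in Tomcat this only covers POST
bodies, while query strings are governed by the URIEncoding attribute on the Connector
in server.xml):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class ForceUtf8Filter implements Filter {

    public void init(FilterConfig config) {
        // nothing to configure
    }

    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        // Must run before the first getParameter() call, otherwise
        // the container has already decoded the body with its default.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        chain.doFilter(request, response);
    }

    public void destroy() {
        // nothing to clean up
    }
}

If memory serves, Tomcat 7 even ships a ready-made
org.apache.catalina.filters.SetCharacterEncodingFilter that does essentially this, so one
does not even have to write it oneself; it just is not enabled by default.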
If someone were to take the text of RFC 2616 and replace every direct or indirect mention
of US-ASCII and ISO-8859-1 in it by Unicode/UTF-8, and present the result as an RFC for
HTTP 2.0, would the Internet instantly crumble?
How does one go about doing this?