Mark Thomas wrote:
Cookie handling is fundamentally a complete mess. Specifications exist
but are not fully implemented, are not consistent with related
specifications, etc.
Having tried to sort this out the last time around and having read
Jeremy's great work on documenting where we stand at the present moment,
it often feels like it wouldn't be too hard to make a case that just
about any cookie name or value that isn't a token (as per RFC 2616) is
either valid or invalid depending on which specification(s) you choose
to read.
I'd strongly encourage anyone thinking about commenting further on this
thread to take the time to read the wiki page [1] where the Tomcat
committers (and Jeremy in particular) are currently trying to figure out
exactly how Tomcat should handle cookies in the future.
Mark
[1] http://wiki.apache.org/tomcat/Cookies
Hi. I agree with everything you say above.
About the Wiki, what seems to be missing are additional rows in the tables showing some
examples of cookie values containing what English-speaking people often call "additional"
or "accented" characters (and what other people just call "characters"). For example,
what happens when the cookie value is a string like "ÄÖÜäöüéèîôâ" (that's about the extent
of what I can enter easily on this current German keyboard).
And let's also reflect on the fact that, whatever else we have been discussing here, we
have still not provided the OP of this thread with any useful and practical
recommendation to resolve his problem, which seems to originate in a difference between how
Tomcat 6 and Tomcat 7 handle cookies with "accented characters" in their value.
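For what it's worth, the usual practical workaround, and probably the most useful advice
for the OP, is to keep raw non-ASCII characters out of the cookie value altogether:
URL-encode the value when setting the cookie and decode it when reading it back, so that
only plain US-ASCII ever travels on the wire. A minimal sketch (the class and method
names here are mine, purely for illustration):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import javax.servlet.http.Cookie;

public class CookieCodec {

    // Encode so that the cookie value only ever contains US-ASCII
    // token characters; %xx escapes survive Tomcat 6 and 7 alike.
    public static Cookie encoded(String name, String value)
            throws UnsupportedEncodingException {
        return new Cookie(name, URLEncoder.encode(value, "UTF-8"));
    }

    // Reverse the encoding when reading the cookie back in.
    public static String decoded(Cookie cookie)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(cookie.getValue(), "UTF-8");
    }
}

On the reading side, loop over request.getCookies() and pass each value through the
decoder. Since what is actually stored is then a plain RFC 2616 token, the Tomcat 6
versus Tomcat 7 differences never come into play.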
Otherwise, to generalise the debate: it is not just cookies; just about anything that has
to do with non-US-ASCII characters under HTTP and HTML is a mess, and has been a mess for
years if not decades. The current jumble of RFCs dealing with the issue is in the end
more confusing than helpful. And all the current "solutions" in terms of implementation
(browser-side as well as server-side) resemble patches over patches over wooden legs.
I am not saying that resolving the issue is simple, nor that one can simply ignore the
past and/or backward-compatibility concerns. But, despite the immense respect I have for
people like Roy Fielding and their achievements, I cannot help but get the impression
that the Internet RFC mechanism is, in that respect, slowly getting "fossilised", and that
nobody seems to have the energy and drive anymore to think radically and tackle the issue
from the top down.
Nobody nowadays disputes that Unicode and UTF-8 provide a form of "universal" solution to
most of the issues of alphabets, character sets and encodings, suitable for 99% of the
human users of computers and of the Internet. And nobody disputes anymore that 99% of the
hardware and software currently in use can handle arbitrary sequences of bytes perfectly
well.
Yet in terms of programming "for the Internet", we still have to live with - and work
around every day - a set of standards and recommendations based on a myriad of alphabets
and encodings which can each properly represent only a tiny fraction of the languages that
people worldwide speak and read.
And the issues of encoding, decoding and transliterating between these different
alphabets and encodings cost thousands of productive hours every day, quite apart from
the confusion and aggravation they generate.
Why is it exactly that we can come up with things like WebSockets and HTML5 and SOAP and
Java annotations, but not with a new HTTP/HTML version which would make Unicode/UTF-8 the
*default*, and everything else the exception?
That, for the sake of interoperability and mutual comprehension, things like HTTP header
*names* should be restricted to sequences of printable characters in a limited range that
is available on all human interface devices and universally readable is one thing; but
why should HTTP header *values*, or URI path and query-string components (which often
have to carry real-world multilingual textual information), be similarly limited,
confusing and inconsistent? Why does it still have to be so difficult, in 2014, to create
a web user-interface application which ensures that people from different countries can
enter their name and place of residence as they know it, without the server-side or
client-side application mangling them?
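In fairness, for the form-input half of that complaint there is at least a well-known
incantation, even if one has to find out about it the hard way: force the request
character encoding to UTF-8 before the first parameter is read. A minimal sketch of such
a filter follows (the class name is mine; note that in Tomcat this only covers POST
bodies, while query strings are governed by the URIEncoding attribute on the Connector
in server.xml):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class ForceUtf8Filter implements Filter {

    public void init(FilterConfig config) {
        // nothing to configure
    }

    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        // Must run before the first getParameter() call, otherwise
        // the container has already decoded the body with its default.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        chain.doFilter(request, response);
    }

    public void destroy() {
        // nothing to clean up
    }
}

If memory serves, Tomcat 7 even ships a ready-made
org.apache.catalina.filters.SetCharacterEncodingFilter that does essentially this, so one
does not even have to write it oneself; it just is not enabled by default.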
If someone were to take the text of RFC 2616 and replace every direct or indirect mention
of US-ASCII and ISO-8859-1 in it by Unicode/UTF-8, and present the result as an RFC for
HTTP 2.0, would the Internet instantly crumble?
How does one go about doing this?