On 09/01/2019 00:50, Garret Wilson wrote:
> Hi, Mark, and thanks for the quick response. You provided some info I
> wasn't aware of. Some responses below:
>
> On 1/8/2019 9:57 PM, Mark Thomas wrote:
>> On 08/01/2019 21:31, Garret Wilson wrote:
>>
>> <snip/>
>>
>>> But as discussed above, this is completely wrong: the resource
>>> character encoding of a request sent in
>>> `application/x-www-form-urlencoded` should have absolutely no bearing
>>> on how the encoded octets within that resource are decoded.
>>
>> That is not the correct interpretation of section 3.12 of the Servlet
>> 4.0 specification (note the section numbers do vary between spec
>> versions). Tomcat implements the correct interpretation - i.e. the
>> charset from the request content-type defines how encoded octets are
>> decoded and, if none is specified, ISO-8859-1 is used as the default.
>
>
> Ah, I hadn't seen that in the servlet spec. Yes, it seems as if Tomcat
> is correctly following the spec, but I would still say the servlet spec
> is wrong to make any linkage at all between resource encoding and %nn
> interpretation. In fact, reading the prose, it's not clear to me that the
> servlet spec is even strongly tying the %nn interpretation to the
> encoding. It just seems to say that, unless otherwise specified, the %nn
> interpretation should be ISO-8859-1. And actually that's a step up from
> the HTML 4.01 spec, which in
> https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1 indicates
> that they should be interpreted as US-ASCII codes. :(
>
> You indicate that this is all out of date, and I think we're in
> agreement there. We really, really need to get the next servlet
> specification to remove this part. In fact the servlet specification
> should defer to the official `application/x-www-form-urlencoded`
> specification, which at this point I think is the W3C HTML5 spec, which
> in turn defers to the WHATWG spec (which clearly says that UTF-8 should
> be used). What makes all of this more of a mess is that there seems to be
> no way to work around this from the client side, e.g. by putting
> something in the HTML to indicate UTF-8, as
> `application/x-www-form-urlencoded` doesn't support a `charset` parameter.
>
> Anyway if there are any openings on the committee to update the servlet
> spec, let me know.

That has moved to Eclipse. The process to update the spec is still being
defined. The Jakarta EE Servlet API project is the project to get
involved in.

>> ...
>> As of Servlet 4.0 there is a specification-compliant configuration
>> option to change this default to any encoding of your choice.
>> Obviously, UTF-8 is one of the options. You can do this by adding the
>> following to your web.xml:
>>
>> <request-character-encoding>UTF-8</request-character-encoding>
>
> Oh, that is really good to know, thanks!! Still I say that the request
> character encoding is orthogonal to the %nn encoding, but, still, it's
> good to have an implementation-agnostic way to do it.
>
>>
>>
>> Whether Tomcat should ship with this setting present in conf/web.xml
>> by default is something that should probably be discussed for Tomcat
>> 10. Given the current state of the web, there is a reasonable case for
>> doing so. I'll add that to the TOMCAT-NEXT discussion list.
>
>
> Yes please! If I can help in any way, let me know.
>
>
>>
>> The Tomcat Wiki also needs to be updated to take account of this new
>> configuration option (and the related <response-character-encoding>).
>> Since it is a wiki and this is clearly an issue you care about would
>> you like to tackle that?
>
>
> Yes, I'd love to. Let me know what permissions I need, etc.

Create yourself an account at https://wiki.apache.org/tomcat (click
login then create an account) and let the list know your ID. Then one of
the admins can add you to the allowed editors.

Apologies for the hoop jumping required, but without the manual approval
step for new accounts, the ASF project wikis were being deluged with spam.

Mark

>
> I have an international flight boarding right now so I have to go, and I
> may not reply for the next few hours, but definitely sign me up.
>
> Thanks,
>
> Garret
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
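
For anyone reading this thread later, here is a minimal Java sketch of the behaviour under discussion: the same %nn octets decode to different text depending on which charset is assumed. The class and method names (`FormDecodingDemo`, `decode`) are made up for illustration, and the sketch uses the JDK's `java.net.URLDecoder` rather than Tomcat's own parameter parsing, so treat it as an approximation of what a container does, not as its implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {

    // Hypothetical helper: decode %nn escapes using the named charset.
    // Wraps the checked exception so callers do not have to handle it.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The word "cafe" with an acute accent on the e (U+00E9), as a
        // client that percent-encodes UTF-8 octets would submit it:
        // U+00E9 is the two octets C3 A9 in UTF-8.
        String submitted = "caf%C3%A9";

        // Assuming ISO-8859-1, each octet becomes its own character,
        // so the accented letter turns into two unrelated characters.
        String latin1 = decode(submitted, "ISO-8859-1");

        // Assuming UTF-8, the two octets combine into one character.
        String utf8 = decode(submitted, "UTF-8");

        System.out.println("ISO-8859-1 length: " + latin1.length());
        System.out.println("UTF-8 length:      " + utf8.length());
        System.out.println("Same text? " + latin1.equals(utf8));
    }
}
```

The first call mirrors the ISO-8859-1 default described above; the second mirrors what you would expect once `<request-character-encoding>UTF-8</request-character-encoding>` is set in web.xml.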