Re: DefaultServlet doesn't set charset

Markus Schönhaber Fri, 08 Aug 2008 11:15:20 -0700

Mark Thomas wrote:

>> As I understand it, this is a violation of the HTTP 1.1 spec, since RFC
>> 2616 says in section 3.7.1:
>> |  The "charset" parameter is used with some media types to define the
>> |  character set (section 3.4) of the data. When no explicit charset
>> |  parameter is provided by the sender, media subtypes of the "text"
>> |  type are defined to have a default charset value of "ISO-8859-1" when
>> |  received via HTTP. Data in character sets other than "ISO-8859-1" or
>> |  its subsets MUST be labeled with an appropriate charset value. See
>> |  section 3.4.1 for compatibility problems.
> Yes, but... it is debatable in a container environment who is responsible 
> for ensuring this requirement is met.


I don't see that as debatable. In my understanding a web server that
serves non-ISO-8859-1-encoded content of type text/* without declaring
the charset is lying wrt the spec.
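
To illustrate what a client that follows the spec to the letter ends up with, here is a minimal sketch. The class and method names are made up for this example, and it only simulates the decoding step, not an actual HTTP exchange:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tomcat.
final class MojibakeDemo {

    /**
     * Encodes the text as UTF-8 (what is on disk) and decodes the bytes as
     * ISO-8859-1 (what RFC 2616 section 3.7.1 tells the client to assume
     * for text/* when no charset parameter is sent).
     */
    static String decodeAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00f6 is "o umlaut"; written as an escape so the source file
        // itself does not depend on any particular encoding.
        System.out.println(decodeAsLatin1("Sch\u00f6nhaber"));
    }
}
```

The single character "ö" is two bytes in UTF-8 (0xC3 0xB6), so the client sees the two characters "Ã¶" instead.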

> If you have multiple text files each 
> with a different character set Tomcat is going to have to start guessing 
> the charset from the content - a path I wouldn't want to go down.

Agreed. I also consider text resources with different encodings to be a
non-standard, non-default setup which one shouldn't expect the
DefaultServlet to handle correctly. That's where the administrator's or
developer's responsibility starts.
But I'm talking about what I'd call the "default" case: where text
resources are created using the default platform encoding. And this is
something that, IMO, the DefaultServlet should be able to cope with.

Thinking about this, there are two things that seem odd to me:
1. I could find no place in the docs where it is mentioned that the
DefaultServlet is unable to serve text resources correctly if they are
not encoded in ISO-8859-1.
2. The existence of the fileEncoding init-param. Why should one care (or
be able to change) which encoding is used when reading text files from
disk if there's only one encoding for which serving them actually works?

>> one gets when UTF-8 is decoded using an 8-bit charset (provided, the
>> browser doesn't do some guessing of the charset based on the content).
> And most of them do, don't they?

I don't know. My Firefox doesn't, and I have yet to see a Firefox
installation where charset guessing is turned on by default. As far as
I can tell, the same applies to Opera. Looking to IE when it comes to
standards compliance seems pointless to me.
But that's only my experience - YMMV.
Furthermore, whether charset guessing done by the client conforms to the
spec seems doubtful to me when I look at section 3.4.1.

Anyway, my question is whether or not Tomcat behaves correctly (which
seems not to be the case), not whether some - or even most - browsers do
something that reduces the impact of a server's wrong behaviour.

> That said I wouldn't be against a patch that introduced a 
> useFileEncodingInCharset parameter (although a shorter name would be better ;)

Great! I'll dig into DefaultServlet's source and see what I can come up
with.
Speaking of the parameter name - that indeed seems problematic :-)
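
As a rough idea of what such a switch might do, here is a sketch that I have not checked against DefaultServlet's actual source. The class, the method and the boolean flag are all hypothetical; only the fileEncoding init-param exists today:

```java
import java.util.Locale;

// Hypothetical helper, not taken from Tomcat's code base.
final class CharsetLabelSketch {

    /**
     * Returns the Content-Type value to send. When the (hypothetical)
     * useFileEncodingInCharset flag is set, text/* types that carry no
     * charset parameter yet get the configured file encoding appended.
     */
    static String contentTypeFor(String mimeType, String fileEncoding,
                                 boolean useFileEncodingInCharset) {
        if (mimeType == null) {
            return null;
        }
        String lower = mimeType.toLowerCase(Locale.ROOT);
        boolean isText = lower.startsWith("text/");
        boolean hasCharset = lower.contains("charset=");
        if (useFileEncodingInCharset && isText && !hasCharset
                && fileEncoding != null) {
            return mimeType + ";charset=" + fileEncoding;
        }
        // Non-text types and already labeled types are left alone.
        return mimeType;
    }
}
```

With the flag off the behaviour stays exactly as it is now, so existing setups would not change unless the administrator opts in.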

> HTH,

It did. Thanks for your response.

Regards
  mks

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
