> They submit it in utf-8 only if your html form allows them to do that or
> they don't follow html specification and try to exploit your form.

If no explicit encoding is given, all modern browsers will attempt to 
"autodetect" the encoding based on the page contents, often with unpredictable 
results. Most web developers really don't understand the whole encoding thing, 
and many aren't aware of it at all. If they aren't taking care of the encoding 
question in their server side code, what makes anyone believe that they are 
specifying the encoding in their response headers, or HTML?

I can tell you for certain that if no encoding is specified, Chrome can and 
will decide that the data is UTF-8, at least under certain conditions. I 
watched it do exactly that recently while working on an encoding problem in 
some legacy code.

> Set form input charset to iso-8859-1

I can't believe I just saw someone recommend that ;)

Yes, you *could* use Latin-1...for which the Euro sign, ellipsis, decorative 
quotes, trademark, em dash, and a number of other frequently pasted characters 
are still out of range.
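
To make that concrete, here is a minimal sketch in Python (used only for 
illustration; the sample list and the helper names fits_in_latin1 and report 
are made up for this example) that checks which of those characters can be 
encoded as ISO-8859-1 at all:

```python
# Characters that users commonly paste from word processors, written as
# escapes so the example does not depend on the file's own encoding.
SAMPLES = {
    "euro sign": "\u20ac",
    "ellipsis": "\u2026",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "trademark": "\u2122",
    "em dash": "\u2014",
    "e acute": "\u00e9",  # included as a character that Latin-1 does cover
}


def fits_in_latin1(text):
    """Return True if every character in text has an ISO-8859-1 byte."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def report(samples):
    """Map each sample name to whether it survives a Latin-1 encode."""
    return {name: fits_in_latin1(ch) for name, ch in samples.items()}


if __name__ == "__main__":
    for name, ok in report(SAMPLES).items():
        print(f"{name}: {'ok' if ok else 'out of range'}")
```

Only the accented letter fits; everything else on the list has no Latin-1 
byte, so it gets mangled or replaced somewhere between the form and storage.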

Then, when you eventually decide that Latin-1 isn't meeting your needs, you'll 
get to go through the wonderful process of trying to convert all of your legacy 
data to UTF-8.
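
As a rough idea of what that conversion tends to look like, here is a hedged 
Python sketch. It assumes the legacy bytes are a mix of already-valid UTF-8 
and Windows-1252 (which is what browsers often send when a page claims 
Latin-1); that assumption, and the helper name to_utf8, are mine and will not 
hold for every data set:

```python
def to_utf8(raw: bytes) -> bytes:
    """Best-effort conversion of one legacy field to UTF-8.

    Assumption: bytes that are not valid UTF-8 are Windows-1252.
    This is a heuristic, not a guarantee; some byte sequences are
    valid in both encodings and will be left as they are.
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave untouched
        return raw
    except UnicodeDecodeError:
        # The few bytes Windows-1252 leaves undefined become U+FFFD.
        text = raw.decode("cp1252", errors="replace")
        return text.encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"caf\xe9"))        # single-byte legacy input
    print(to_utf8(b"caf\xc3\xa9"))    # input that was already UTF-8
```

Even this simple version has to guess, which is the point: once mixed 
encodings are in the database, there is no fully reliable way back.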

Single byte just doesn't cut the mustard anymore, especially on the web. The 
world is too small. We should be trying to move PHP *away* from this, not 
towards it.

John Crenshaw
Priacta, Inc.
