On Sun, Aug 7, 2016 at 8:37 AM, Alex Masidlover <alex.masidlo...@zednax.com> wrote:
> On Sun, 2016-04-03 at 17:03 +0200, Vincent Veyron wrote:
> > On Sun, 03 Apr 2016 14:11:23 +0100
> > Alex Masidlover <alex.masidlo...@zednax.com> wrote:
> >
> > > This has all worked perfectly up until I upgraded to Apache 2.4 /
> > > mod_perl 2.0.10 -
> > After upgrading to Apache 2.4 and mod_perl 2.0.9, I had to make those
> > two changes to my application :
> >
> > In a PerlOutputFilterHandler, change '$content .= $buffer' to
> > '$content .= decode_utf8($buffer)'
> >
> > And in response handlers, change '$args{$_} = $req->param($_)' to
> > '$args{$_} = decode_utf8($req->param($_))'
> >
> > Not sure it applies to your case, but something changed in Apache 2.4
> > concerning UTF-8 data.
> >
> > If I understood correctly, anything that goes through APR::Table is
> > considered UTF-8, however the SvUTF8 flag is not set, so you get
> > double encoding when processing your data.

Just to be clear, everything in APR::Table is expected to be ASCII (or EBCDIC, on those oddball architectures)... and "opaque text", e.g. characters 128-255 on ASCII architectures, is permitted but not defined. It's easy to verify whether such text qualifies as UTF-8, because the encoding is very predictable. E.g. the bytes 0xFE and 0xFF never occur in UTF-8, and other bytes above 127 are valid only in the correct sequence; cf. the validation logic in:

http://svn.apache.org/repos/asf/apr/apr/trunk/misc/win32/utf8.c

What a browser sends is not guaranteed; it may be sending ISO-8859-1, or UTF-8 (or even problematic ISO-2022-JP, where the ASCII char ' ' or '\' may occur as a continuation character, causing parsing issues). Modern browsers finally appear to all default to UTF-8, but you may still have many legacy browsers out there.

In your decode_utf8 logic, ensure that your app checks for failure! And where decoding fails, try treating the input as a code page such as ISO-8859-1.
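One way to "check for failure" with the Encode module: decode_utf8() called without a CHECK argument never fails, it quietly replaces malformed sequences with U+FFFD. So the failure check needs an explicit CHECK such as Encode::FB_CROAK. Below is a minimal sketch, assuming the raw bytes are already in hand. decode_request_bytes() is a hypothetical helper, not part of mod_perl or Encode. It tries strict UTF-8 first and falls back to ISO-8859-1:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of mod_perl or Encode): decode raw request
# bytes as strict UTF-8 and, if that fails, fall back to ISO-8859-1.
sub decode_request_bytes {
    my ($bytes) = @_;
    return undef unless defined $bytes;

    # FB_CROAK makes decode() die on malformed input instead of silently
    # substituting U+FFFD; LEAVE_SRC stops decode() from modifying $bytes.
    my $text = eval {
        Encode::decode('UTF-8', $bytes,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    # Not well-formed UTF-8: every byte 0x00-0xFF maps to a character in
    # ISO-8859-1, so this second decode cannot fail.
    return Encode::decode('ISO-8859-1', $bytes);
}

# Usage: the same word sent as UTF-8 bytes and as ISO-8859-1 bytes;
# both calls return "caf\x{e9}".
my $from_utf8   = decode_request_bytes("caf\xc3\xa9");
my $from_latin1 = decode_request_bytes("caf\xe9");
```

In a response handler this would take the place of the plain decode_utf8() call, e.g. wrapping the value returned by $req->param($_). The fallback is only a guess at the client's encoding: any byte string "succeeds" as ISO-8859-1, so this cannot detect something like ISO-2022-JP.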