On Sun, Aug 7, 2016 at 8:37 AM, Alex Masidlover <alex.masidlo...@zednax.com>
wrote:

> On Sun, 2016-04-03 at 17:03 +0200, Vincent Veyron wrote:
> > On Sun, 03 Apr 2016 14:11:23 +0100
> > Alex Masidlover <alex.masidlo...@zednax.com> wrote:
> >
> > > This has all worked perfectly up until I upgraded to Apache 2.4 /
> > > mod_perl 2.0.10 -
> > After upgrading to Apache 2.4 and mod_perl 2.0.9, I had to make those
> > two changes to my application :
> >
> > In a PerlOutputFilterHandler, change '$content .= $buffer' to
> > '$content .= decode_utf8($buffer)'
> >
> > And in response handlers, change '$args{$_} = $req->param($_)' to
> > '$args{$_} = decode_utf8($req->param($_))'
> >
> > Not sure it applies to your case, but something changed in Apache 2.4
> > concerning UTF-8 data.
> >
> > If I understood correctly, anything that goes through APR::Table is
> > considered UTF-8, however the SvUTF8 flag is not set, so you get
> > double encoding when processing your data.
>

Just to be clear, everything in APR::Table is expected to be ASCII
(or EBCDIC, on those oddball architectures)... and "opaque text",
e.g. characters 128-255 on ASCII architectures, is permitted but
not defined.

It's easy to verify whether those bytes qualify as UTF-8, because the
encoding is very predictable.  E.g. the bytes 0xFE and 0xFF never occur
in UTF-8, and others are valid only in the correct sequence; cf. the
validation logic in:

  http://svn.apache.org/repos/asf/apr/apr/trunk/misc/win32/utf8.c
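
To illustrate how predictable the byte sequences are, here is a rough
sketch in Perl. It is not the APR code linked above; it is the widely
used "well-formed UTF-8" byte pattern, and the function name
is_wellformed_utf8 is made up for this example. It assumes the argument
is a raw byte string (no characters above 255).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $octets is a well-formed UTF-8 byte sequence.
# Rejects 0xC0/0xC1 and 0xF5-0xFF (never valid), overlong forms,
# UTF-16 surrogates (ED A0..BF xx) and code points above U+10FFFF.
sub is_wellformed_utf8 {
    my ($octets) = @_;
    return $octets =~ m/\A(?:
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, straight
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, excluding overlongs
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, straight
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, up to U+10FFFF
    )*\z/x ? 1 : 0;
}
```

A lone 0xE9 (Latin-1 "e acute") fails this check, while the two-byte
sequence 0xC3 0xA9 passes.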


> What a browser sends is not guaranteed, it may be sending ISO-8859-1,
or UTF8 (or even problematic ISO-2022-JP where the ASCII char ' ' or '\'
may occur as a continuation character causing parsing issues). Modern
browsers appear to all be defaulting to UTF-8 finally, but you may have
many legacy browsers out there.

In your decode_utf8 logic, ensure that your app checks for failure!  And
where decoding fails, try treating the input as a single-byte code page
such as ISO-8859-1.
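
Something along these lines might do, as a minimal sketch only. The
helper name decode_param is hypothetical, and it assumes the Encode
module with a strict check (FB_CROAK), since a plain decode_utf8() call
with no check argument substitutes replacement characters rather than
failing.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode request octets as strict UTF-8 and,
# if they are malformed, fall back to ISO-8859-1.
sub decode_param {
    my ($octets) = @_;
    return unless defined $octets;

    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps $octets intact so the fallback can reuse it.
    my $text = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text if defined $text;

    # Every byte is a valid ISO-8859-1 character, so this cannot fail.
    return decode('ISO-8859-1', $octets);
}
```

In a response handler this would replace the bare call, e.g.
'$args{$_} = decode_param($req->param($_))'.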
