First, may I apologise for the length of this; I fear I won't get to
the bottom of it otherwise.

All of the following is being done on OS X 10.3
Server: Apache/2.0.52 (Unix) mod_perl/1.999.21 Perl/v5.8.6

I noticed today that some of my UTF-8 data was becoming corrupted, but
I couldn't see why.
Hebrew and Arabic were fine, but characters like à were becoming garbled.

So I went back to a basic test script I had written for testing UTF-8,
and it ran fine as a CGI script.

I then converted the CGI script to a package: I wrapped the main code
in sub handler {} and added return Apache::OK.

Both the CGI script and the package have:

use strict;
use warnings;
use 5.008006;

use utf8;
use DBI;
use CGI (':standard');
use Encode qw/is_utf8 decode/;

binmode(STDOUT, ":utf8");

When I load the CGI script, everything is fine every time.

When I load the package, it's fine the first time but becomes garbled
on subsequent loads.

Restarting apache makes the first load fine again. Quitting the
browser and relaunching it also solves the problem.

Here's the httpd.conf info:

PerlModule ModPerl::Registry

<FilesMatch "\.html$">
        SetHandler perl-script
        PerlHandler unidbtest
</FilesMatch>

I'm not sure if ModPerl::Registry is involved or not. I've tried both
perl-script and modperl as the handler.

On the first load I get the following output (the whole page, so you
can see the meta tags etc.):

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"
xml:lang="en-US"><head><title>Simple UTF 8 test</title>
</head><body>


<form method="get" action="/test.html"
enctype="application/x-www-form-urlencoded">
<h1>Unicode test: Page 1</h1><table border="1"
cellpadding="5"><tr><td>description</td><td>language</td><td>char in
unicode</td></td>
<tr><td>abimer</td><td>french</td><td><input type="text" name="abimer"
value="abÃmer" size="50" maxlength="80" /></td></td>
<tr><td>angies</td><td>mixed</td><td><input type="text" name="angies"
value="ÅÃâÄâÄreÃenuâ" size="50" maxlength="80" /></td></td>

<tr><td>aogonec</td><td>polish</td><td><input type="text"
name="aogonec" value="Ä" size="50" maxlength="80" /></td></td>
<tr><td>citroen</td><td>french</td><td><input type="text"
name="citroen" value="citroÃn" size="50" maxlength="80" /></td></td>
<tr><td>disco</td><td>french</td><td><input type="text" name="disco"
value="discothÃque" size="50" maxlength="80" /></td></td>
<tr><td>hebrew_alef</td><td>hebrew</td><td><input type="text"
name="hebrew_alef" value="×" size="50" maxlength="80" /></td></td>
<tr><td>lslash</td><td>polish</td><td><input type="text" name="lslash"
value="Å" size="50" maxlength="80" /></td></td>
<tr><td>recenu</td><td>french</td><td><input type="text" name="recenu"
value="reÃenu" size="50" maxlength="80" /></td></td>

<tr><td>smiley</td><td>none</td><td><input type="text" name="smiley"
value="â" size="50" maxlength="80" /></td></td>
<tr><td>zcaron</td><td>czech</td><td><input type="text" name="zcaron"
value="Å" size="50" maxlength="80" /></td></td>
</table><input type="hidden" name="VertDo" value="test unicode" 
/><input type="submit" name="Do" value="test unicode"
/><div></div></form>

</body></html>

On subsequent loads the following happens:

abÃmer  becomes abïmer
citroÃn  becomes citroïn

and so on.
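
In case it helps anyone compare byte for byte, here is a minimal sketch
of how the output could be checked. It assumes (and I have not confirmed
this) that the garbling is a double encoding, i.e. UTF-8 bytes being
treated as Latin-1 and encoded a second time. The helper hex_bytes and
the sample word are only for illustration and are not part of my test
script.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8
use Encode qw/encode decode/;

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

my $word  = "abîmer";                   # character string, 6 characters
my $once  = encode('UTF-8', $word);     # correct UTF-8 bytes

# Assumed failure mode: the bytes are misread as Latin-1, then encoded again.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

print hex_bytes($once),  "\n";   # 61 62 c3 ae 6d 65 72
print hex_bytes($twice), "\n";   # 61 62 c3 83 c2 ae 6d 65 72
```

If the bytes of a garbled page match the second pattern, the data is
being encoded twice somewhere; if they do not, the cause is something
else.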

If I leave my machine alone for a while and then load the page, I get
correct output again; subsequent loads are garbled again.

I'm really very confused about what's going on.

What should I try next? I'm happy to put both scripts somewhere for download
if anyone wants to replicate the problem.

Thanks

Angie
