Hi Laurent,

On Tue, 18 Jan 2005 at 11:48:05 +0100, Laurent Fousse wrote:
> I've found more combined log lines that trigger the bug. They all
> contain a referer field that is a search engine result URL with
> percent-encoded accented characters (like %E9).
> 
> Using "lr_log2report -o xml" you can see the produced xml declares an
> utf-8 encoding but the encoding actually used is iso-8859-1, hence the
> bug.

You are absolutely right.

For the record, here's the proof:

----------

$ echo 'silence.lateralis.org - - [11/Jan/2005:02:27:17 +0100] "GET /pl/synth/ HTTP/1.1" 200 14356 "http://www.google.fr/search?num=100&hl=fr&ie=ISO-8859-1&q=images+de+synth%E8se&btnG=Rechercher&meta=" "Mozilla/4.0 (compatible; MSIE 5.17; Mac_PowerPC)"' | \
   lr_log2report -o xml combined > ~/tmp/291063.xml

$ lr_xml2report -o txt ~/tmp/291063.xml > /dev/null

Formatting report as txt in -...
lr_xml2report: ERROR
not well-formed (invalid token) at line 1004, column 46, byte 53763 at
/usr/lib/perl5/XML/Parser.pm line 187


$ recode cp1252/..u8 < ~/tmp/291063.xml > ~/tmp/291063.utf8.xml

$ lr_xml2report -o txt ~/tmp/291063.utf8.xml > /dev/null
Formatting report as txt in -...
$

----------

The latest Lire upstream snapshot, lire-2.0.1.99.1, suffers from the
same bug.  The generated XML file starts with the declaration
'<?xml version="1.0" encoding="UTF-8"?>', although its content is not
encoded as UTF-8.
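
To illustrate why the parser chokes, here is a minimal sketch. It
assumes the %E8 from the referer ends up in the report as a raw 0xE8
byte (I have not verified where exactly that happens in Lire), and
is_utf8 is just a helper name made up for this mail. It relies on iconv
rejecting byte sequences that are not valid in the source encoding.

```shell
#!/bin/sh
# is_utf8 is a hypothetical helper: it succeeds only if stdin is valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# \350 is octal for byte 0xE8, which is "è" in ISO-8859-1.
# In UTF-8, 0xE8 starts a three-byte sequence, so 0xE8 followed by "s" is invalid.
if printf 'synth\350se' | is_utf8; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi

# The same character correctly encoded as UTF-8 is the two bytes 0xC3 0xA8.
if printf 'synth\303\250se' | is_utf8; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
```

The first test prints "not valid UTF-8", the second "valid UTF-8".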

Changing the header to '<?xml version="1.0" encoding="ISO-8859-1"?>'
is another way to work around the problem.
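
A rough sketch of that workaround, assuming the XML declaration sits on
the first line of the file (fix_decl is a hypothetical helper name, not
part of Lire). It only rewrites the declared encoding and leaves the
bytes of the report untouched:

```shell
#!/bin/sh
# fix_decl is a hypothetical helper: it rewrites the encoding named in the
# XML declaration on line 1. LC_ALL=C keeps sed from interpreting the
# non-UTF-8 bytes further down in the file.
fix_decl() {
    LC_ALL=C sed '1s/encoding="UTF-8"/encoding="ISO-8859-1"/'
}

# Demonstration on a two-line sample instead of a real Lire report.
printf '<?xml version="1.0" encoding="UTF-8"?>\n<q>synth\350se</q>\n' |
    fix_decl | head -n 1
```

On the sample this prints '<?xml version="1.0" encoding="ISO-8859-1"?>'.
For the report above one would run something like
"fix_decl < ~/tmp/291063.xml > ~/tmp/291063.latin1.xml" (the output file
name is made up) and feed the result to lr_xml2report.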

I'll investigate more.

Thanks for your well-documented bug report!

Bye,

Joost

-- 
.    .                                        http://logreport.com/
| '.|                        /^LogReport$/
| Lire                                        http://logreport.org/
