small (although important for UNIMARC guys) bug in MARC::File::XML ?

2006-10-11 Thread Paul POULAIN

Hello perl4lib,

I have some problems decoding UNIMARC XML data.

> $record = MARC::Record::new_from_xml( $marcxml,"utf8","unimarc");

gives me a wrong encoding result, while

> $record = MARC::Record::new_from_xml( $marcxml,"utf8");

gives me a correct one.

As I have UNIMARC records, I'm not happy with that, so I 
investigated MARC::File::XML and found this at line 448:


> $parser->{ Handler }{ toMARC8 } = (lc($format) ne 'unimarc' && $enc && lc($enc) =~ /^utf-?8$/o) ? 0 : 1;


This means that for "unimarc" the toMARC8 flag is always set, requesting MARC8 decoding.

UNIMARC records that are not in UTF-8 are in Latin-1, so no MARC8 decoding is needed.

Thus, I would have written :

> $parser->{ Handler }{ toMARC8 } = (lc($format) *eq* 'unimarc' && $enc && lc($enc) =~ /^utf-?8$/o) ? 0 : 1;


to set the toMARC8 flag ONLY for MARC21 records not already in UTF-8.

Am I right, or am I missing something?
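
To make the question easier to check, here is a small stand-alone sketch that tabulates the flag for each combination of format and encoding. It does not load MARC::File::XML; the sub names (is_utf8, flag_current, flag_proposed, flag_intent) and the labels 'marc21' and 'latin1' are made up for illustration, and flag_intent is only one possible reading of the goal stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the encoding label looks like UTF-8,
# using the same pattern as the quoted line.
sub is_utf8 {
    my ($enc) = @_;
    return ($enc && lc($enc) =~ /^utf-?8$/) ? 1 : 0;
}

# Expression as quoted from line 448 ('ne').
sub flag_current {
    my ($format, $enc) = @_;
    return (lc($format) ne 'unimarc' && is_utf8($enc)) ? 0 : 1;
}

# Expression proposed in this message ('eq').
sub flag_proposed {
    my ($format, $enc) = @_;
    return (lc($format) eq 'unimarc' && is_utf8($enc)) ? 0 : 1;
}

# Assumption: one way to express "toMARC8 only for MARC21 not in UTF-8".
sub flag_intent {
    my ($format, $enc) = @_;
    return (lc($format) ne 'unimarc' && !is_utf8($enc)) ? 1 : 0;
}

# Print the value each expression gives to toMARC8.
for my $format ('marc21', 'unimarc') {
    for my $enc ('utf8', 'latin1') {
        printf "%-8s %-7s current=%d proposed=%d intent=%d\n",
            $format, $enc,
            flag_current($format, $enc),
            flag_proposed($format, $enc),
            flag_intent($format, $enc);
    }
}
```

In this sketch the 'eq' variant clears the flag for UNIMARC in UTF-8, but it sets the flag for MARC21 in UTF-8 and still sets it for UNIMARC in Latin-1, so it may not fully match the goal as worded.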

--
Paul POULAIN and Henri Damien LAURENT
Independent consultants
in free software and library science (http://www.koha-fr.org)
Tel : 04 91 31 45 19


(another proposal for M:F:XML for unimarc)

2006-10-11 Thread Paul POULAIN

In UNIMARC, field 100 contains the encoding (positions 26-27).

Currently, the following codes are handled:
01 : iso646 (ascii 7 bits)
50 : unicode

However, many libraries use 03, as extended Latin-1. Could we add, at 
line 492 (in sub _unimarc_encoding)


> } elsif ($enc eq '03') {
> return 'ISO-8859-1';

to treat 03 the same as 01?

Code 03 refers to ISO 5426, which is not exactly ISO-8859-1, but close, 
and that is better, IMHO, than dying.
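
As a rough illustration of the idea, here is a stand-alone sketch of such a mapping. The sub name unimarc_charset_to_encoding and the sample string are hypothetical; this is not the module's own _unimarc_encoding, only a guess at the shape of the change under the assumption that 01 and 03 both map to ISO-8859-1 and 50 to UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::File::XML): read the two-character
# code at positions 26-27 of a field 100 string and map it to an encoding name.
sub unimarc_charset_to_encoding {
    my ($field100) = @_;
    die "field 100 data too short\n" if length($field100) < 28;

    my $code = substr($field100, 26, 2);

    if ($code eq '01' || $code eq '03') {
        # 03 is ISO 5426; treating it as ISO-8859-1 is an approximation.
        return 'ISO-8859-1';
    } elsif ($code eq '50') {
        return 'UTF-8';
    }
    die "Unsupported UNIMARC character set code [$code]\n";
}

# Made-up sample: 26 filler characters, the code, then more filler.
my $sample = ('x' x 26) . '03' . ('x' x 8);
print unimarc_charset_to_encoding($sample), "\n";
```

Characters that exist in ISO 5426 but not in ISO-8859-1 would still come out wrong with this approach, so it is a pragmatic fallback rather than a real conversion.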

