Re: MARC::File::XML utf8 output problem

2011-03-08 Thread Saiful Amin
> > However, when I try to generate MARCXML output with the following code, the
> > Arabic characters get corrupted:
> >
> >   $xml = MARC::File::XML->out( $file );
> >   my $record = MARC::Record->new();
> >   ...
> >   $xml->write( $record );
> >   ...
> >   $xml->close();
> >
> > I've tried setting the en

Re: MARC::File::XML utf8 output problem

2011-03-08 Thread Saiful Amin
> > However, when I try to generate MARCXML output with the following code, the
> > Arabic characters get corrupted:
> >
> >   $xml = MARC::File::XML->out( $file );
> >   my $record = MARC::Record->new();
> >   ...
> >   $xml->write( $record );
> >   ...
> >   $xml->close();
> >
> > I've tried setting the encoding in

Re: MARC::File::XML utf8 output problem

2011-03-08 Thread Frédéric DEMIANS
> However, when I try to generate MARCXML output with the following code, the
> Arabic characters get corrupted:
>
>   $xml = MARC::File::XML->out( $file );
>   my $record = MARC::Record->new();
>   ...
>   $xml->write( $record );
>   ...
>   $xml->close();
>
> I've tried setting the encoding in various ways withou

Re: MARC::File::XML and parsing.

2007-10-19 Thread Mike Rylander
On 10/3/07, Henri-Damien LAURENT <[EMAIL PROTECTED]> wrote: [snip]
> Would it not be possible to add some feature to M::F::X which would allow
> people to collect UTF8 data as such without checking MARC21?

Sorry for the delay in reply, folks. Actually, you should be able to do exactly what you wa

Re: MARC::File::XML and parsing.

2007-10-03 Thread Henri-Damien LAURENT
Doran, Michael D wrote:
> Hi Henri,
>
>> Is there a reason why MARC::File::XML considers only a very
>> strict subset of utf-8 as valid?
>
> I would guess that it has to do with adhering to the MARC-21 repertoire of
> characters, so as to facilitate the round-trip conversion betw

RE: MARC::File::XML and parsing.

2007-09-27 Thread Doran, Michael D
Hi Henri,

> Is there a reason why MARC::File::XML considers only a very
> strict subset of utf-8 as valid?

I would guess that it has to do with adhering to the MARC-21 repertoire of characters, so as to facilitate the round-trip conversion between the MARC-8 and Unicode character sets [1,2].

Re: MARC::File::XML 0.85

2007-04-16 Thread Ed Summers
I apologize, but I'm finding it hard to trace what exactly this script is doing. I did take a look at the first failure and sure enough the record leader says it's 463 bytes but the record itself is 464 bytes. So a failure is warranted -- given the current behavior of MARC::Record. Perhaps dumbin
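The mismatch described in this message (a leader declaring 463 bytes for a 464-byte record) can be checked by hand. The following is a minimal sketch, not code from the thread or from MARC::Record: `check_leader_length` is a hypothetical helper, and it assumes `$raw` holds the undecoded bytes of a single ISO 2709 record, whose leader positions 00-04 carry the record length as five digits.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Record): compare the length declared
# in leader positions 00-04 with the actual byte length of the raw record.
# Assumes $raw is a byte string, not a decoded character string.
sub check_leader_length {
    my ($raw) = @_;
    my $actual   = length($raw);
    my $declared = substr( $raw, 0, 5 );

    # A leader that does not start with five digits cannot be compared.
    return +{ ok => 0, declared => undef, actual => $actual }
        unless $declared =~ /^\d{5}$/;

    return +{
        ok       => ( $declared == $actual ? 1 : 0 ),
        declared => $declared + 0,
        actual   => $actual,
    };
}

# Illustrative data only: a fake record whose leader claims 463 bytes
# while the string is really 464 bytes long.
my $raw    = '00463' . ( 'x' x 459 );
my $result = check_leader_length($raw);
printf "declared=%d actual=%d %s\n",
    $result->{declared}, $result->{actual},
    $result->{ok} ? 'match' : 'MISMATCH';
```

A check like this only reports the discrepancy; whether a parser should reject or tolerate such a record is the policy question raised in the thread.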

Re: MARC::File::XML 0.85

2007-04-16 Thread Paul POULAIN
Joshua M. Ferraro wrote: - "Mike Rylander" <[EMAIL PROTECTED]> wrote: CVS is updated with that now, and after anyone willing makes sure it's not breaking anything I think we should release again. Just updated and ran make test successfully. I'm happy to roll a release today if there are

Re: MARC::File::XML 0.85

2007-04-16 Thread Dan Scott
With MARC::Charset 0.96, MARC::Record 2.0.0, and MARC::File::XML from CVS, I get failed tests for 7 out of 8 of Joshua's tests. Test STDIN and STDOUT output attached. Dan Scott On 16/04/07, Joshua M. Ferraro <[EMAIL PROTECTED]> wrote: - "Mike Rylander" <[EMAIL PROTECTED]> wrote: > CVS is up

Re: MARC::File::XML 0.85

2007-04-16 Thread Joshua M. Ferraro
- "Mike Rylander" <[EMAIL PROTECTED]> wrote: > CVS is updated with that now, and after anyone willing makes sure > it's not breaking anything I think we should release again. Just updated and ran make test successfully. I'm happy to roll a release today if there are no objections. This may be

Re: MARC::File::XML 0.85

2007-04-13 Thread Mike Rylander
On 4/13/07, Paul POULAIN <[EMAIL PROTECTED]> wrote: Mike Rylander wrote: > On 4/13/07, Joshua M. Ferraro <[EMAIL PROTECTED]> wrote: > $parser->{ Handler }{ toMARC8 } = (lc($format) =~ /^unimarc/o || ( > $enc && lc($enc) =~ /^utf-?8$/o )) ? 0 : 1; > mmm... I agree with you. In fact, I think h

Re: MARC::File::XML 0.85

2007-04-13 Thread Paul POULAIN
Mike Rylander wrote: On 4/13/07, Joshua M. Ferraro <[EMAIL PROTECTED]> wrote: $parser->{ Handler }{ toMARC8 } = (lc($format) =~ /^unimarc/o || ( $enc && lc($enc) =~ /^utf-?8$/o )) ? 0 : 1; mmm... I agree with you. In fact, I think having a test that, if true make 0 as result and if false
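The expression quoted in this exchange is easier to follow as a truth table. The sketch below wraps it in a hypothetical helper, `to_marc8_flag` (that name and the sample format and encoding values are illustrative assumptions, not taken from the module), and prints the flag for a few inputs: the result is 0 when the format starts with "unimarc" or the encoding is UTF-8, and 1 otherwise.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression quoted in the thread.
# Returns 0 (do not convert to MARC-8) for UNIMARC formats or UTF-8
# encodings, and 1 (convert to MARC-8) in every other case.
sub to_marc8_flag {
    my ( $format, $enc ) = @_;
    my $keep_as_is = lc($format) =~ /^unimarc/
        || ( $enc && lc($enc) =~ /^utf-?8$/ );
    return $keep_as_is ? 0 : 1;
}

# Sample inputs chosen for illustration only.
my @cases = (
    [ 'UNIMARC', undef ],
    [ 'USMARC',  'UTF-8' ],
    [ 'USMARC',  'utf8' ],
    [ 'USMARC',  'MARC-8' ],
    [ 'USMARC',  undef ],
);

for my $case (@cases) {
    my ( $format, $enc ) = @$case;
    printf "%-8s %-7s => toMARC8 = %d\n",
        $format, ( defined $enc ? $enc : '(none)' ),
        to_marc8_flag( $format, $enc );
}
```

Naming the condition (`$keep_as_is`) before negating it is one way to avoid the "true gives 0, false gives 1" inversion that the thread remarks on.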

Re: MARC::File::XML 0.85

2007-04-13 Thread Mike Rylander
On 4/13/07, Joshua M. Ferraro <[EMAIL PROTECTED]> wrote: Hi folks, A new version of MARC::File::XML has been uploaded to CPAN. It's a one character fix that solves a bug for UNIMARC users who don't have MARC-8 encoded records. Thanks to Paul Poulain for pointing it out. Unfortunately, this

Re: MARC::* concern (Re: MARC::File::XML suggestion)

2006-07-14 Thread Thomas Dukleth
On Fri, July 14, 2006 2:17 pm, Edward Summers wrote: > This is all fine, but let's talk in unit tests for MARC::Record if we > can. They will make plain what the actual behavior is ... I was commenting here about my concern raised by the findings of Paul Poulain. I had no test of my own showing an

Re: MARC::* concern (Re: MARC::File::XML suggestion)

2006-07-14 Thread Edward Summers
This is all fine, but let's talk in unit tests for MARC::Record if we can. They will make plain what the actual behavior is, and will let us talk about what the preferred behavior could be. Sorry to be so short, but there's only so much time in the day. //Ed On Jul 13, 2006, at 2:55 PM, Tho

MARC::* concern (Re: MARC::File::XML suggestion)

2006-07-13 Thread Thomas Dukleth
The MARC record management libraries need to allow management of the legacy records we actually have and the records that existing systems currently use. Some problems that MARC record management libraries need to manage: 1. CHARACTER VARIANCE FOR FIELD CODES. Large union catalogue systems such

Re: MARC::File::XML suggestion

2006-07-13 Thread Edward Summers
On Jul 13, 2006, at 12:41 PM, Paul POULAIN wrote: sometimes, I parse XML that contains an invalid subfield code (like a capital letter). M::F::X definitely dies in this case. MARC::Field seems to allow a subfield with a capital letter--as it should since there really is no requirement that subfie

Re: MARC::File::XML => ed Rocks !

2006-01-10 Thread Ed Summers
On 1/10/06, Paul POULAIN <[EMAIL PROTECTED]> wrote: > Repeat after me : Ed Summers is the best Perl coder for librarians ! Oh shucks, thanks, I *really* wish that was true. The reality is that I wrote that module and was familiar with one of its limitations. The real people who should be praised

Re: MARC::File::XML perfs

2006-01-09 Thread Ed Summers
I should have mentioned that MARC::File::XML uses XML::SAX for XML parsing. XML::SAX can use a variety of backend XML parsers, but I believe by default it will use the XML::SAX::PurePerl parser if it can't find any other ones installed, which is exceptionally slow. I recommend taking a look at ins

Re: MARC::File::XML perfs

2006-01-09 Thread Ed Summers
On 1/9/06, Paul POULAIN <[EMAIL PROTECTED]> wrote: > what am I doing wrong? Using Perl? :-) Seriously though, I'd be interested in DProf [1] output from your program if you have the energy. //Ed [1] http://search.cpan.org/~ilyaz/DProf-19990108/DProf.pm

Re: MARC::File::XML

2006-01-06 Thread Paul POULAIN
Ed Summers wrote: I'm curious what people would find to be the best default behavior for MARC::File::XML when it creates a MARC::Record object from XML. Should the character encoding by default be transposed from UTF-8 to MARC-8? Or should it be left as UTF-8? maybe a new_from_xml($xml,$encodi