Re: MARC Records, XML, and encoding

2006-05-19 Thread Joshua Ferraro
Hi Andy, Thanks for checking on that. I can confirm that the record has E2 in it and that when I replace E2 with a ? character, new_from_xml() doesn't croak. I was able to find the E2 character in the code tables provided by LOC (which I assume are what MARC::Charset uses). Here is what they've g
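
For reference, 0xE2 in the MARC-8 combining-diacritics range is the acute accent, which corresponds to Unicode U+0301 COMBINING ACUTE ACCENT. Any converter must also handle ordering. MARC-8 stores a combining mark before the letter it modifies, while Unicode stores it after.

The Perl below is a minimal sketch of that reordering only. It is not MARC::Charset's actual code. The `%COMBINING` table (three entries) and the `marc8_to_unicode` function are hypothetical. Escape sequences, other character sets, and most of the real LOC table are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical mini-table: a few MARC-8 combining diacritics and their
# Unicode combining-character equivalents (the real tables are much larger).
my %COMBINING = (
    0xE1 => 0x0300,    # grave
    0xE2 => 0x0301,    # acute
    0xE3 => 0x0302,    # circumflex
);

# Convert a MARC-8 byte string that uses only ASCII plus the marks above.
# MARC-8 puts a combining mark *before* its base letter; Unicode puts it
# *after*, so marks are held in @pending until the base character arrives.
sub marc8_to_unicode {
    my ($bytes) = @_;
    my $out = '';
    my @pending;
    for my $byte ( map { ord($_) } split //, $bytes ) {
        if ( exists $COMBINING{$byte} ) {
            push @pending, chr $COMBINING{$byte};
        }
        elsif ( $byte < 0x80 ) {
            $out .= chr($byte) . join( '', @pending );
            @pending = ();
        }
        else {
            die sprintf "no mapping for 0x%02X\n", $byte;
        }
    }
    return $out . join( '', @pending );
}

my $unicode = marc8_to_unicode("Caf\xE2e");
printf "%vX\n", $unicode;    # 43.61.66.65.301
```

The acute byte precedes the `e` in the input and follows it in the output.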

RE: MARC Records, XML, and encoding

2006-05-19 Thread Houghton,Andrew
> From: Joshua Ferraro > Sent: 19 May, 2006 13:40 > To: Edward Summers > Cc: perl4lib > Subject: Re: MARC Records, XML, and encoding > > Hi all, > > Here is an OCLC record: > > http://liblime.com/public/oclc1.dat > > So ...

Re: MARC Records, XML, and encoding

2006-05-19 Thread Joshua Ferraro
Hi all, Here is an OCLC record: http://liblime.com/public/oclc1.dat I feed it into the as_xml method and get what appears to be valid XML: http://liblime.com/public/oclc1.xml When I take that XML, feed it to the new_from_xml method, and print the result to a file, I get the error: Cannot decode st

Re: MARC Records, XML, and encoding

2006-05-18 Thread Joshua Ferraro
Thanks, everyone, for the help thus far. Ed and I have been chatting on code4lib ... it seems there are two problems. One is with the 9C character, for which I now have a workaround. I added the following to Charset.pm line 151: if ($marc8 =~ /\x{9C}/) { $utf8 .= ' ';
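
The fragment above appears to emit a space whenever the 0x9C byte is seen. An alternative that avoids patching Charset.pm is to clean the field data before it reaches the converter.

The sketch below assumes the stray byte is a pound sign from a DOS code page such as CP437, where 0x9C is £. That is only a guess based on the `Price : <9c> 7.99` field quoted in this thread, not something the thread confirms. The `scrub_9c` function and its replacement argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step: MARC::Charset reports no mapping for
# 0x9C, so replace that byte before the data is handed to the converter.
# The default replacement (a single space) is meant to match what the
# Charset.pm patch appears to do; pass something else, e.g. 'GBP ',
# if the byte is known to be a CP437 pound sign.
sub scrub_9c {
    my ( $marc8, $replacement ) = @_;
    $replacement = ' ' unless defined $replacement;
    ( my $clean = $marc8 ) =~ s/\x9C/$replacement/g;
    return $clean;
}

print scrub_9c( "Price : \x9C7.99", 'GBP ' ), "\n";    # Price : GBP 7.99
```

Keeping the substitution outside the library means the data problem is fixed once, at load time, and a MARC::Charset upgrade does not silently drop the patch.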

RE: MARC Records, XML, and encoding

2006-05-18 Thread Doran, Michael D
From: Edward Summers > Sent: Thursday, May 18, 2006 11:17 AM > To: perl4lib > Subject: Re: MARC Records, XML, and encoding > > So I got curious (thanks to your convo in #code4lib). I isolated the > problem to one record: > > http:/

Re: MARC Records, XML, and encoding

2006-05-18 Thread Joshua Ferraro
Hi Ed, Interesting ... when I run marcdump I get:

 Recs  Errs  Filename
 ----  ----  ----------
  192     0  sample.mrc

Here's the file posted on a web server (maybe a problem with the list truncating the attachment?): http://liblime.com/public/sample.mrc Could you try downloading from there and ru

Re: MARC Records, XML, and encoding

2006-05-18 Thread Colin Campbell
Edward Summers wrote: On May 18, 2006, at 6:48 AM, Joshua Ferraro wrote: Anyway, if anyone can shed some light on this I'd be grateful. I believe the data loss you are seeing is due to your source records--not to do with character translation. Just a quick look but I think in many cases the

Re: MARC Records, XML, and encoding

2006-05-18 Thread Edward Summers
So I got curious (thanks to your convo in #code4lib). I isolated the problem to one record: http://www.inkdroid.org/tmp/one.dat Your roundtrip conversion complains: -- no mapping found at position 8 in Price : <9c> 7.99;Inv.# B 476913;Date 06/03/98; Supplier : Dawson UK;
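
Assuming the position in that warning is a 0-based offset into the string shown, it lines up with the data. `Price : ` is eight characters (positions 0 to 7), so the 0x9C byte sits at position 8. A quick way to find such bytes before converting a whole file is to scan for anything outside the range the converter handles.

In this minimal sketch the hypothetical `unmapped_positions` function treats only printable ASCII as mappable. That is a deliberate simplification, because real MARC-8 data also contains escape sequences plus extended-Latin and diacritic bytes that are perfectly valid.

```perl
use strict;
use warnings;

# Report "offset:<hex>" for every byte outside printable ASCII.
# Hypothetical and simplified: a real MARC-8 check would also accept
# escape sequences, extended Latin, and combining-diacritic bytes.
sub unmapped_positions {
    my ($bytes) = @_;
    my @hits;
    while ( $bytes =~ /([^\x20-\x7E])/g ) {
        # pos() is just past the match, so subtract 1 for a 0-based offset
        push @hits, sprintf '%d:<%02x>', pos($bytes) - 1, ord $1;
    }
    return @hits;
}

print join( ' ', unmapped_positions("Price : \x9C 7.99;") ), "\n";    # 8:<9c>
```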

Re: MARC Records, XML, and encoding

2006-05-18 Thread Edward Summers
On May 18, 2006, at 10:03 AM, Joshua Ferraro wrote: http://liblime.com/public/sample.mrc Could you try downloading from there and running marcdump again? Yes, that one has the same number of records but now passes through marcdump fine. Now, when running your script, I get a lot of warnings

Re: MARC Records, XML, and encoding

2006-05-18 Thread Edward Summers
On May 18, 2006, at 6:48 AM, Joshua Ferraro wrote: Anyway, if anyone can shed some light on this I'd be grateful. I believe the data loss you are seeing is due to your source records--not to do with character translation. Just running marcdump on them generates a ton of errors (see below).