Re: MARC Records, XML, and encoding

2006-05-18 Thread Joshua Ferraro
Thanks everyone for the help thus far. Ed and I have been chatting on code4lib ... it seems there are two problems. One is with the 9C character, which I now have a workaround for. I added the following to Charset.pm line 151: if ($marc8 =~ /\x{9C}/) { $utf8 .= ' ';
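Here is a minimal Python sketch of the same idea. It assumes the goal is simply to keep 0x9C away from the MARC-8 converter. It works as a pre-processing step and is not the actual Charset.pm patch; `blank_out_9c` is a hypothetical name:

```python
def blank_out_9c(marc8: bytes) -> bytes:
    """Return a copy of a MARC-8 field with every 0x9C byte replaced by a space.

    Hypothetical helper mirroring the Charset.pm workaround: the converter
    never sees 0x9C, so it never reports "no mapping found" for it.
    """
    return marc8.replace(b"\x9c", b" ")


field = b"Price : \x9c 7.99"
print(blank_out_9c(field))  # b'Price :   7.99'
```

Swapping one byte for one byte keeps the MARC-8 field the same length. The result is still lossy, though: whatever 0x9C meant in the source data is gone.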

RE: MARC Records, XML, and encoding

2006-05-18 Thread Doran, Michael D
> So I took a look at that position in the marc record and
> found a 0x9C character at that position, as the error
> message indicates. I can't find a 0x9C in either of the
> mapping tables that this record purports to use:
0x9C is a C1 control character that is generally assigned the function of
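A quick scan like the one below shows which bytes in a record fall in the C1 range (0x80 to 0x9F). This is a hypothetical helper, not part of any MARC library. MARC-8 itself assigns a few C1 codes (for example, the nonsort markers 0x88 and 0x89), so treat a hit as a candidate to inspect rather than automatically an error.

```python
def c1_positions(data: bytes) -> list[int]:
    """Zero-based offsets of every byte in the C1 control range 0x80-0x9F."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


sample = b"Price : \x9c 7.99;Inv.# B 476913"
for offset in c1_positions(sample):
    print(f"C1 byte 0x{sample[offset]:02X} at offset {offset}")
# prints: C1 byte 0x9C at offset 8
```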

Re: MARC Records, XML, and encoding

2006-05-18 Thread Joshua Ferraro
Hi Ed, Interesting ... when I run marcdump I get:
 Recs  Errs  Filename
----- -----  ----------
  192     0  sample.mrc
Here's the file posted on a web server (maybe a problem with the list truncating the attachment?): http://liblime.com/public/sample.mrc Could you try downloading from there and ru

Re: MARC Records, XML, and encoding

2006-05-18 Thread Colin Campbell
Edward Summers wrote:
> On May 18, 2006, at 6:48 AM, Joshua Ferraro wrote:
>> Anyway, if anyone can shed some light on this I'd be grateful.
> I believe the data loss you are seeing is due to your source records--not to do with character translation.
Just a quick look, but I think in many cases the

Re: MARC Records, XML, and encoding

2006-05-18 Thread Edward Summers
So I got curious (thanks to your convo in #code4lib). I isolated the problem to one record: http://www.inkdroid.org/tmp/one.dat
Your roundtrip conversion complains:
-- no mapping found at position 8 in Price : <9c> 7.99;Inv.# B 476913;Date 06/03/98; Supplier : Dawson UK;
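The reported "position 8" lines up with a zero-based byte offset into the field text, which the sketch below checks. The second helper rests on an assumption the thread does not make: that the source system used DOS code page 437, where 0x9C is the pound sign. That would fit a price from a UK supplier, but it is unverified. Both function names are hypothetical:

```python
def bad_byte_offset(field: bytes, byte: int = 0x9C) -> int:
    """Zero-based offset of the first occurrence of `byte` in `field`, or -1."""
    return field.find(bytes([byte]))


def guess_cp437(field: bytes, offset: int) -> str:
    """Decode the byte at `offset` as DOS code page 437 (an assumption)."""
    return field[offset:offset + 1].decode("cp437")


# Offsets: P=0 r=1 i=2 c=3 e=4 ' '=5 ':'=6 ' '=7, so 0x9C sits at 8.
field = b"Price : \x9c 7.99"
offset = bad_byte_offset(field)
print(offset)                             # 8
print(ascii(guess_cp437(field, offset)))  # '\xa3' (U+00A3 POUND SIGN)
```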

Re: MARC Records, XML, and encoding

2006-05-18 Thread Edward Summers
On May 18, 2006, at 10:03 AM, Joshua Ferraro wrote:
> http://liblime.com/public/sample.mrc Could you try downloading from there and running marcdump again?
Yes, that one has the same number of records but now passes through marcdump fine. Now, when running your script I get a lot of warnings

Re: MARC Records, XML, and encoding

2006-05-18 Thread Edward Summers
On May 18, 2006, at 6:48 AM, Joshua Ferraro wrote:
> Anyway, if anyone can shed some light on this I'd be grateful.
I believe the data loss you are seeing is due to your source records--not to do with character translation. Just running marcdump on them generates a ton of errors (see below).
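One way to look for the kind of structural damage that makes marcdump complain is to check each directory entry against the field terminators. This sketch assumes the standard MARC 21 layout:

- the base address of data is in leader bytes 12-16;
- directory entries are 12 bytes each (tag, length, starting position);
- 0x1E ends the directory and every field.

`directory_errors` is a hypothetical helper and does not reproduce marcdump's own checks:

```python
FIELD_TERMINATOR = 0x1E


def directory_errors(record: bytes) -> list[str]:
    """List structural problems in one MARC record's directory (empty if none found)."""
    base_digits = record[12:17]  # leader bytes 12-16: base address of data
    if not base_digits.isdigit():
        return ["leader base address is not numeric"]
    base = int(base_digits)
    # The directory runs from byte 24 up to a 0x1E just before the base address.
    if base < 25 or base > len(record) or record[base - 1] != FIELD_TERMINATOR:
        return ["directory is not terminated by 0x1E at base address - 1"]
    directory = record[24:base - 1]
    problems = []
    if len(directory) % 12:
        problems.append("directory length is not a multiple of 12")
    for i in range(0, len(directory) - len(directory) % 12, 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii", "replace")
        length_digits, start_digits = entry[3:7], entry[7:12]
        if not (length_digits.isdigit() and start_digits.isdigit()):
            problems.append(f"{tag}: non-numeric length or starting position")
            continue
        length, start = int(length_digits), int(start_digits)
        end = base + start + length  # one past the field's last byte (its 0x1E)
        if length == 0 or end > len(record) or record[end - 1] != FIELD_TERMINATOR:
            problems.append(f"{tag}: field does not end with 0x1E where the directory says")
    return problems
```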