> So I took a look at that position in the marc record and
> found a 0x9C character at that position, as the error
> message indicates. I can't find a 0x9C in either of the
> mapping tables that this record purports to use:

0x9C is a C1 control character that is generally assigned the function
of STRING TERMINATOR and, as Ed states, is not a valid MARC-21
character. Only a small subset of the C0 and C1 control characters
is allowed in the MARC-21 standard:

  Character   Function (in MARC-21)
  ---------   ---------------------
  0x1B        ESCAPE
  0x1D        RECORD TERMINATOR
  0x1E        FIELD TERMINATOR
  0x1F        SUBFIELD DELIMITER
  0x88        NON-SORT BEGIN
  0x89        NON-SORT END
  0x8D        JOINER
  0x8E        NON-JOINER

> This character conversion stuff is a major pain.

"An apparently simple subject which turns out to be brutally
complicated" -- in reference to coded character sets

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Edward Summers [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 18, 2006 11:17 AM
> To: perl4lib
> Subject: Re: MARC Records, XML, and encoding
>
> So I got curious (thanks to your convo in #code4lib). I isolated the
> problem to one record:
>
> http://www.inkdroid.org/tmp/one.dat
>
> Your roundtrip conversion complains:
>
> --
>
> no mapping found at position 8 in Price : <9c> 7.99; Inv.# B
> 476913; Date 06/03/98; Supplier : Dawson UK; Recd 20/03/98;
> Contents : 1. The problem : 1. Don't bargain over positions; 2.
> The method : 2. Separate the people from the problem; 3.
> Focus on interests, not positions; 4. Invent options for mutual
> gain; 5. Insist on using objective criteria; 3. Yes, but :
> 6. What if they are more powerful? 7. What if they won't
> play? 8. What if they use dirty tricks? 4. In conclusion; 5.
> Ten questions people ask about getting to yes; g0=ASCII_DEFAULT
> g1=EXTENDED_LATIN at /usr/local/lib/perl5/site_perl/5.8.7/MARC/
> Charset.pm line 126.
>
> --
>
> So I took a look at that position in the marc record and found a 0x9C
> character at that position, as the error message indicates.
> I can't find a 0x9C in either of the mapping tables that this record
> purports to use:
>
> BasicLatin (ASCII): http://lcweb2.loc.gov/cocoon/codetables/42.html
> Extended Latin (ANSEL):
> http://lcweb2.loc.gov/cocoon/codetables/45.html
>
> Looks like you might want to preprocess those records before
> translating. Since this character routinely occurs in the 586 field
> you could use MARC::Record to remove the offending character before
> writing as XML.
>
> Hope that helps somewhat. This character conversion stuff is a major
> pain.
>
> //Ed
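
Ed's suggestion to preprocess the records, combined with the list of
permitted control characters in the reply, can be sketched roughly as
follows. This is only a minimal Python sketch, not the MARC::Record or
MARC::Charset API: the function names are hypothetical, the sample
bytes are reconstructed from the start of the error message, and it
assumes MARC-8 data (in a UTF-8 record, bytes 0x80-0x9F are legitimate
continuation bytes and must not be touched this way).

```python
from typing import List, Tuple

# Control bytes the reply lists as permitted in MARC-21 (MARC-8 data):
# ESCAPE, RECORD TERMINATOR, FIELD TERMINATOR, SUBFIELD DELIMITER,
# NON-SORT BEGIN, NON-SORT END, JOINER, NON-JOINER.
ALLOWED_CONTROLS = frozenset({0x1B, 0x1D, 0x1E, 0x1F, 0x88, 0x89, 0x8D, 0x8E})


def is_disallowed_control(byte: int) -> bool:
    """True for a C0 (0x00-0x1F) or C1 (0x80-0x9F) byte outside the allowed set."""
    in_c0 = 0x00 <= byte <= 0x1F
    in_c1 = 0x80 <= byte <= 0x9F
    return (in_c0 or in_c1) and byte not in ALLOWED_CONTROLS


def find_disallowed_controls(data: bytes) -> List[Tuple[int, int]]:
    """Return (position, byte value) for every disallowed control byte."""
    return [(pos, b) for pos, b in enumerate(data) if is_disallowed_control(b)]


def strip_disallowed_controls(data: bytes) -> bytes:
    """Return a copy of the data with disallowed control bytes removed."""
    return bytes(b for b in data if not is_disallowed_control(b))


# Hypothetical sample: the start of the offending field, with the stray
# 0x9C at position 8 as reported in the error message.
sample = b"Price : \x9c 7.99; Inv.# B 476913"
print(find_disallowed_controls(sample))   # [(8, 156)]
print(strip_disallowed_controls(sample))  # b'Price :  7.99; Inv.# B 476913'
```

A filter like this is best applied to field or subfield data, letting a
MARC library rewrite the record afterwards. Deleting bytes from a raw
record would leave the record length in the leader and the directory
offsets wrong.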