> So I took a look at that position in the marc record and 
> found a 0x9C character at that position, as the error
> message indicates. I can't find a 0x9C in either of the
> mapping tables that this record purports to use:

0x9C is a C1 control character that is generally assigned the function
of STRING TERMINATOR and, as Ed says, is not a valid MARC-21
character. Only a small subset of the C0 and C1 control characters is
permitted by the MARC-21 standard:

  Character  Function (in MARC-21)
  ---------  ---------------------
  0x1B       ESCAPE
  0x1D       RECORD TERMINATOR
  0x1E       FIELD TERMINATOR
  0x1F       SUBFIELD DELIMITER
  0x88       NON-SORT BEGIN
  0x89       NON-SORT END
  0x8D       JOINER
  0x8E       NON-JOINER
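A converter usually stops at the first bad byte it meets. To list every
disallowed control character in a field or record up front, the table above
can be turned into a quick check. This is a minimal sketch in Python (not
Perl, and not part of MARC::Record or MARC::Charset); the names
ALLOWED_CONTROLS, is_control and invalid_controls are hypothetical, and it
assumes you pass it raw MARC-8 bytes.

```python
# Control characters MARC-21 permits, taken from the table above.
ALLOWED_CONTROLS = {0x1B, 0x1D, 0x1E, 0x1F, 0x88, 0x89, 0x8D, 0x8E}


def is_control(byte: int) -> bool:
    """True for C0 (0x00-0x1F) and C1 (0x80-0x9F) control codes."""
    return byte <= 0x1F or 0x80 <= byte <= 0x9F


def invalid_controls(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte) pairs for control characters MARC-21 does not allow."""
    return [(i, b) for i, b in enumerate(data)
            if is_control(b) and b not in ALLOWED_CONTROLS]


# The start of the offending 586 data from Ed's error message:
print(invalid_controls(b"Price : \x9c 7.99;"))  # [(8, 156)]
```

Offsets count from zero, so the 0x9C (156) reported at offset 8 lines up
with the "position 8" in Ed's error message.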

> This character conversion stuff is a major pain.

        "An apparently simple subject which 
        turns out to be brutally complicated"
        -- in reference to coded character sets

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/ 

> -----Original Message-----
> From: Edward Summers [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, May 18, 2006 11:17 AM
> To: perl4lib
> Subject: Re: MARC Records, XML, and encoding
> 
> So I got curious (thanks to your convo in #code4lib). I isolated the  
> problem to one record:
> 
>       http://www.inkdroid.org/tmp/one.dat
> 
> Your roundtrip conversion complains:
> 
> --
> 
> no mapping found at position 8 in Price : <9c> 7.99;    Inv.#  B  
> 476913;    Date   06/03/98; Supplier : Dawson UK;  Recd 20/03/98;   
> Contents : 1. The problem :     1. Don't bargain over positions;  2.  
> The method :     2. Separate the people from the problem;     3.  
> Focus on interests, not positions;     4. Invent options for mutual  
> gain;     5. Insist on using objective criteria;  3. Yes, but :      
> 6. What if they are more powerful?     7. What if they won't  
> play?     8. What if they use dirty tricks?  4. In conclusion;  5.  
> Ten questions people ask about getting to yes; g0=ASCII_DEFAULT  
> g1=EXTENDED_LATIN at /usr/local/lib/perl5/site_perl/5.8.7/MARC/ 
> Charset.pm line 126.
> 
> --
> 
> So I took a look at that position in the marc record and found a 0x9C
> character at that position, as the error message indicates. I can't
> find a 0x9C in either of the mapping tables that this record purports
> to use:
> 
> BasicLatin (ASCII): http://lcweb2.loc.gov/cocoon/codetables/42.html
> Extended Latin (ANSEL): http://lcweb2.loc.gov/cocoon/codetables/45.html
> 
> Looks like you might want to preprocess those records before  
> translating. Since this character routinely occurs in the 586 field  
> you could use MARC::Record to remove the offending character before  
> writing as XML.
> 
> Hope that helps somewhat. This character conversion stuff is a major  
> pain.
> 
> //Ed
> 
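Ed's preprocessing suggestion uses MARC::Record, which re-serializes the
record so its leader and directory lengths stay correct. A different,
byte-level technique is a one-for-one substitution, which needs no
re-serialization because the byte count never changes. The Python sketch
below is only an illustration under loudly stated assumptions. First, it
guesses that the stray 0x9C is a pound sign typed in a DOS code page (0x9C
is the pound sign in IBM code pages 437 and 850, which would fit
"Price : <9c> 7.99" from a UK supplier). Second, it swaps it for 0xB9, the
British pound sign in MARC-8 Extended Latin (ANSEL). The names
fix_stray_pound and fix_records are hypothetical. If the guess is wrong,
change the replacement byte.

```python
# Hypothetical byte-level preprocessing; not MARC::Record or MARC::Charset.
STRAY_BYTE = b"\x9c"          # assumed: pound sign from a DOS code page
ANSEL_POUND = b"\xb9"         # British pound sign in ANSEL (MARC-8 G1)
RECORD_TERMINATOR = b"\x1d"


def fix_stray_pound(raw_record: bytes) -> bytes:
    """Swap 0x9C for the ANSEL pound sign in one MARC-8 record.

    The swap is one byte for one byte, so the leader's record length and
    every directory entry's length and starting position remain valid.
    """
    if raw_record[9:10] != b" ":
        # Leader/09 'a' means UCS/Unicode. There 0x9C is a legitimate UTF-8
        # continuation byte (e.g. in U+201C), so leave the record alone.
        return raw_record
    return raw_record.replace(STRAY_BYTE, ANSEL_POUND)


def fix_records(data: bytes) -> bytes:
    """Apply fix_stray_pound to each record in a file's worth of ISO 2709 data."""
    records = data.split(RECORD_TERMINATOR)
    return RECORD_TERMINATOR.join(fix_stray_pound(r) for r in records)
```

To use it, read the file in binary mode, pass its bytes to fix_records,
and write the result out before running the XML conversion.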
