Re: MARC Records, XML, and encoding

Joshua Ferraro Thu, 18 May 2006 16:05:24 -0700

Thanks, everyone, for the help thus far. Ed and I have been chatting
on code4lib, and it seems there are two problems. The first is with
the 0x9C character, which I now have a workaround for. I added the
following to Charset.pm at line 151:


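        # Treat a stray 0x9C byte (not in either MARC-8 table this
        # record uses) as a space. Test only the byte at $index:
        # matching against all of $marc8 would fire on every remaining
        # character once the field contained a single 0x9C.
        if (substr($marc8, $index, 1) eq "\x9C") {
                $utf8 .= ' ';
                $index += 1;
                next CHAR_LOOP;
        }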

It's not ideal, but it gets rid of that problem well enough for
me.
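A possible alternative to patching Charset.pm, sketched under assumptions: replacing one byte with one byte does not change the record length in the leader or the directory offsets, so the stray 0x9C bytes could be swapped out of the raw record before MARC::Charset ever sees it. The helper name scrub_9c_raw is hypothetical. The leader/09 check matters, because in a UTF-8 record 0x9C is an ordinary continuation byte (U+201C, for instance, is E2 80 9C). Judging from "Price : <9c> 7.99" and "Supplier : Dawson UK" in the error Ed quotes below, the byte may be a pound sign from a DOS code page such as CP437, where 0x9C is £; if so, passing "\xB9" (the pound sign in the MARC-8 Extended Latin table, which this record uses as G1) would keep it instead of blanking it. That is a guess worth checking against the LC code table first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical preprocessing helper (not part of MARC::Charset).
# Replaces every 0x9C byte in one raw ISO 2709 record with a single
# replacement byte. Because the substitution is byte-for-byte, the
# record length in the leader and the directory offsets stay valid.
sub scrub_9c_raw {
    my ($raw, $replacement) = @_;
    $replacement = ' ' unless defined $replacement;
    die "replacement must be exactly one byte\n"
        unless length($replacement) == 1 && ord($replacement) < 0x100;

    # Leader/09 is the character coding scheme: blank means MARC-8,
    # 'a' means UCS/Unicode. Leave anything but MARC-8 untouched,
    # because 0x9C is a normal continuation byte in UTF-8.
    return $raw if length($raw) < 24 || substr($raw, 9, 1) ne ' ';

    (my $clean = $raw) =~ s/\x9C/$replacement/g;
    return $clean;
}

# Usage: perl scrub_9c.pl in.mrc out.mrc
if (!caller && @ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<:raw', $in  or die "Cannot read $in: $!\n";
    open my $ofh, '>:raw', $out or die "Cannot write $out: $!\n";
    local $/ = "\x1D";    # ISO 2709 record terminator
    while (my $raw = <$ifh>) {
        print {$ofh} scrub_9c_raw($raw);
    }
    close $ofh or die "Cannot close $out: $!\n";
}
```

The cleaned file could then go through the unpatched roundtrip script, which keeps the workaround out of the library itself.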

The next problem happens with the following record (number 54 in
the original batch I posted):

http://liblime.com/public/prob2.mrc

When I run the roundtrip conversion script, I get the following error:

Cannot decode string with wide characters at 
/usr/local/lib/perl/5.8.4/Encode.pm line 188.

This time, the script just dies completely and nothing is 
written to disk. The record passes marcdump's tests.
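One way to narrow this down, sketched under an assumption: Encode::decode() expects a string of octets, and it dies with this error (the exact wording varies between Encode releases) when the string already contains a character above 0xFF. That usually means text is being decoded a second time, for example a value that has already come back from a MARC-8-to-UTF-8 conversion as Perl characters. The helper names below are hypothetical; the idea is to check each field before calling decode() and report the first wide character instead of letting the script die.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical diagnostic. Returns one description per string that
# contains a character above 0xFF, i.e. text that Encode::decode()
# would refuse because it is no longer a string of octets.
sub find_wide_strings {
    my (%strings) = @_;          # label => string, e.g. '586a' => $value
    my @problems;
    for my $label (sort keys %strings) {
        next unless $strings{$label} =~ /([^\x00-\xFF])/;
        push @problems,
            sprintf('%s: U+%04X at offset %d', $label, ord($1), $-[1]);
    }
    return @problems;
}

# Decode octets as UTF-8, but warn and pass through strings that are
# already decoded instead of dying inside Encode.
sub safe_decode_utf8 {
    my ($label, $str) = @_;
    my @problems = find_wide_strings($label => $str);
    if (@problems) {
        warn "$problems[0] (already decoded?)\n";
        return $str;
    }
    return decode('UTF-8', $str);
}
```

Running find_wide_strings over every subfield of record 54, just before the decode call, should point at the field and offset that has already been decoded.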

Ed, I'm still waiting for SF to update so I can nab that test
script.

In the meantime, any ideas how to track this one down?

Cheers,

--
Joshua Ferraro               VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
[EMAIL PROTECTED] |Full Demos at http://liblime.com/koha |1(888)KohaILS

On Thu, May 18, 2006 at 11:16:52AM -0500, Edward Summers wrote:
> So I got curious (thanks to your convo in #code4lib). I isolated the  
> problem to one record:
> 
>       http://www.inkdroid.org/tmp/one.dat
> 
> Your roundtrip conversion complains:
> 
> --
> 
> no mapping found at position 8 in Price : <9c> 7.99;    Inv.#  B  
> 476913;    Date   06/03/98; Supplier : Dawson UK;  Recd 20/03/98;   
> Contents : 1. The problem :     1. Don't bargain over positions;  2.  
> The method :     2. Separate the people from the problem;     3.  
> Focus on interests, not positions;     4. Invent options for mutual  
> gain;     5. Insist on using objective criteria;  3. Yes, but :      
> 6. What if they are more powerful?     7. What if they won't  
> play?     8. What if they use dirty tricks?  4. In conclusion;  5.  
> Ten questions people ask about getting to yes; g0=ASCII_DEFAULT  
> g1=EXTENDED_LATIN at
> /usr/local/lib/perl5/site_perl/5.8.7/MARC/Charset.pm line 126.
> 
> --
> 
> So I looked at that position in the MARC record and found a 0x9C
> character there, as the error message indicates. I can't find 0x9C
> in either of the mapping tables that this record purports to use:
> 
> BasicLatin (ASCII): http://lcweb2.loc.gov/cocoon/codetables/42.html
> Extended Latin (ANSEL): http://lcweb2.loc.gov/cocoon/codetables/45.html
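To check a position like the one in that error against the raw data, something along these lines might help. It is a sketch with hypothetical helper names, and it assumes the position MARC::Charset reports is a 0-based offset into the field text, which fits this case, since "Price : " is exactly 8 bytes long.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based offsets of every match of $byte_re in
# $str. Defaults to the 0x9C byte from the error above.
sub offsets_of {
    my ($str, $byte_re) = @_;
    $byte_re = qr/\x9C/ unless defined $byte_re;
    my @offsets;
    while ($str =~ /$byte_re/g) {
        push @offsets, $-[0];
    }
    return @offsets;
}

# Print each hit with up to 10 bytes of context on either side, showing
# the byte itself in the same <xx> form MARC::Charset uses.
sub show_hits {
    my ($str, $byte_re) = @_;
    for my $pos (offsets_of($str, $byte_re)) {
        my $start  = $pos < 10 ? 0 : $pos - 10;
        my $before = substr($str, $start, $pos - $start);
        my $after  = substr($str, $pos + 1, 10);
        printf "%d: %s<%02x>%s\n", $pos, $before, ord(substr($str, $pos, 1)), $after;
    }
}

show_hits("Price : \x9C 7.99;    Inv.#  B  476913");
```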
> 
> Looks like you might want to preprocess those records before  
> translating. Since this character routinely occurs in the 586 field  
> you could use MARC::Record to remove the offending character before  
> writing as XML.
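A sketch of that preprocessing step, assuming MARC::Record's MARC::Batch and MARC::Field interfaces (new, next, field, subfields, indicator, replace_with, as_usmarc); clean_value and clean_batch are hypothetical names. It rebuilds each 586 field with 0x9C removed and writes the cleaned records back out as MARC, so the existing roundtrip script can then produce the XML. The string cleaning is kept in its own sub so it can be tried without MARC::Record installed.

```perl
use strict;
use warnings;

# Remove (or replace) every 0x9C byte in one subfield value.
sub clean_value {
    my ($value, $replacement) = @_;
    $replacement = '' unless defined $replacement;
    $value =~ s/\x9C/$replacement/g;
    return $value;
}

# Rebuild each field with the given tag (586 by default) from cleaned
# subfields, then write every record back out as ISO 2709.
sub clean_batch {
    my ($in, $out, $tag) = @_;
    $tag = '586' unless defined $tag;
    require MARC::Batch;
    require MARC::Field;

    my $batch = MARC::Batch->new('USMARC', $in);
    open my $ofh, '>:raw', $out or die "Cannot write $out: $!\n";
    while (my $record = $batch->next) {
        for my $field ($record->field($tag)) {
            my @subfields;
            for my $sf ($field->subfields) {    # each is [code, data]
                push @subfields, $sf->[0], clean_value($sf->[1]);
            }
            my $new = MARC::Field->new($field->tag,
                $field->indicator(1), $field->indicator(2), @subfields);
            $field->replace_with($new);
        }
        print {$ofh} $record->as_usmarc;
    }
    close $ofh or die "Cannot close $out: $!\n";
}

# Usage: perl clean_586.pl in.mrc out.mrc [tag]
clean_batch(@ARGV) if !caller && @ARGV >= 2;
```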
> 
> Hope that helps somewhat. This character conversion stuff is a major  
> pain.
> 
> //Ed
