Hi Michael:

I've sent this message to [EMAIL PROTECTED] since the list at Rice is 
being retired. If you need information about the new list, visit
http://perl4lib.perl.org. 

On Wed, 30 Jul 2003, Michael D Doran wrote:
> 1) In the process of converting MARC-8 to UTF-8, are escape sequences, if
> present, removed?  

Yes, the escape sequences are removed during conversion; they are not 
present in the UTF-8 output, since they no longer have any meaning there.

> 2) Am I assuming correctly that a complete MARC record can not be used as
> input to MARC-Charset, since character set conversion may change the length
> of a string (in octets) thereby invalidating leader and directory data?

It is not possible to pass a complete MARC::Record object to 
MARC::Charset; however, it might be worthwhile to create a method for doing 
this. MARC::Charset is really just designed to make it easy to convert a 
string of characters into UTF-8. 
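To make the length change concrete, here is a toy sketch in Python. It is not MARC::Charset (a Perl module) and not a real MARC-8 decoder: it assumes input limited to ASCII plus a single ANSEL diacritic, 0xE2 (combining acute), and the function name toy_marc8_to_utf8 is hypothetical. In MARC-8 a combining mark precedes its base letter, while in Unicode it follows it, and U+0301 takes two octets in UTF-8, so the same text grows by one octet.

```python
# Toy illustration only: handles ASCII plus one ANSEL combining diacritic.
# Hypothetical one-entry table: ANSEL 0xE2 is the combining acute accent.
ANSEL_COMBINING = {0xE2: "\u0301"}


def toy_marc8_to_utf8(data: bytes) -> bytes:
    """Convert a tiny subset of MARC-8 to UTF-8 (not a real decoder)."""
    out = []
    pending = []  # combining marks waiting for their base character
    for b in data:
        if b in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[b])
        elif b < 0x80:
            out.append(chr(b))
            out.extend(pending)  # Unicode puts the mark after the base letter
            pending.clear()
        else:
            raise ValueError(f"byte 0x{b:02X} is not in this toy table")
    out.extend(pending)  # trailing marks with no base letter are kept as-is
    return "".join(out).encode("utf-8")


marc8 = b"Caf\xe2e"  # "Cafe" with the acute byte placed before the 'e'
utf8 = toy_marc8_to_utf8(marc8)
print(len(marc8), len(utf8))  # 5 6
```

Any leader or directory value computed from the 5-octet MARC-8 field would be wrong for the 6-octet UTF-8 field.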

You are right: there is some difficulty computing the record length properly, 
since MARC::Record uses character length rather than byte length. The two are 
the same thing when you are dealing with 8-bit characters, but not in UTF-8, 
where a single character can take more than one byte. Richard Lammert at 
Concordia Theological Seminary did some experiments encoding MARC records in 
UTF-8 using MARC::Record and MARC::Charset; I could get you his email if you 
are interested in getting in touch with him.
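A minimal sketch of the fix, again in Python rather than Perl and not the MARC::Record API: build_record is a hypothetical helper that encodes each field to UTF-8 first and then measures everything in octets. It relies on the MARC 21 layout (record length in leader/00-04, base address of data in leader/12-16, 12-character directory entries of tag, 4-digit field length and 5-digit starting position, field terminator 0x1E, record terminator 0x1D) and sets leader/09 to "a" for UCS/Unicode. Field values are assumed to already contain any indicators and subfield delimiters.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def build_record(leader: str, fields: list[tuple[str, str]]) -> bytes:
    """Assemble a MARC 21 record whose lengths and offsets are byte counts.

    leader: 24-character template; positions 00-04, 09 and 12-16 are set here.
    fields: (tag, field content) pairs as str; each is encoded to UTF-8.
    """
    if len(leader) != 24:
        raise ValueError("leader must be 24 characters")
    directory = b""
    data = b""
    for tag, value in fields:
        encoded = value.encode("utf-8") + FIELD_TERMINATOR
        # Field length and starting position are octet counts, not characters
        directory += f"{tag}{len(encoded):04d}{len(data):05d}".encode("ascii")
        data += encoded
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + len(RECORD_TERMINATOR)
    lead = list(leader)
    lead[0:5] = f"{record_length:05d}"
    lead[9] = "a"  # leader/09 'a' = UCS/Unicode character coding
    lead[12:17] = f"{base_address:05d}"
    return "".join(lead).encode("ascii") + directory + data + RECORD_TERMINATOR


rec = build_record(
    "00000nam  2200000   4500",
    [("001", "12345"), ("245", "10\x1faCafé")],
)
print(len(rec), rec[0:5].decode())  # 66 00066
```

Counting characters instead would give 65 here, because "é" is one character but two octets in UTF-8.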

> I realize I could test these assumptions by actually *using* the module, but
> I have not yet upgraded my Perl from 5.6.1 to 5.8.  (BTW, the README has
> Perl 5.6.0 dependency whereas Charset.pm has "use 5.8.0;".)

Thanks for pointing that out. I'll update the README. 5.8 is unfortunately 
a requirement since it is the first version of Perl that has "stable" 
Unicode support.

Best,
//Ed
