Hi Michael:

I've sent this message to [EMAIL PROTECTED] since the list at Rice is being
retired. If you need information about the new list, visit
http://perl4lib.perl.org.
On Wed, 30 Jul 2003, Michael D Doran wrote:

> 1) In the process of converting MARC-8 to UTF-8, are escape sequences, if
> present, removed?

Yes. The escape sequences are not present in the UTF-8 output, since they
no longer have any meaning there.

> 2) Am I assuming correctly that a complete MARC record can not be used as
> input to MARC-Charset, since character set conversion may change the length
> of a string (in octets) thereby invalidating leader and directory data?

It is not possible to pass a complete MARC::Record object to MARC::Charset;
however, it might be worthwhile to create a method for doing this.
MARC::Charset is really just designed to make it easy to convert a string of
characters into UTF-8.

You are right, there is some difficulty computing the record length
properly, since MARC::Record uses character length rather than byte length.
The two are the same thing when you are dealing with 8-bit characters, but
in UTF-8 a single character can occupy more than one byte.

Richard Lammert at Concordia Theological Seminary did some experiments
encoding MARC records in UTF-8 using MARC::Record and MARC::Charset. I could
get you his email if you are interested in getting in touch with him.

> I realize I could test these assumptions by actually *using* the module, but
> I have not yet upgraded my Perl from 5.6.1 to 5.8. (BTW, the README has
> Perl 5.6.0 dependency whereas Charset.pm has "use 5.8.0;".)

Thanks for pointing that out. I'll update the README. Unfortunately, 5.8 is
a requirement, since it is the first version of Perl that has "stable"
Unicode support.

Best,
//Ed
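To illustrate the length problem discussed above, here is a minimal sketch
that uses only Perl's core Encode module (no MARC::Record or MARC::Charset
calls). It contrasts character length with UTF-8 octet length, then feeds the
octet count into a hypothetical helper, record_length_octets(), that
recomputes the record length stored in leader positions 00-04. The helper is
an assumption for illustration only, not part of either module; it follows
the MARC 21 record structure (24-octet leader, 12-octet directory entries, a
field terminator after the directory and after each field, and a final record
terminator) and assumes each field string already includes its indicators
and subfield codes.

```perl
#!/usr/bin/perl
use 5.008;               # Encode is a core module from Perl 5.8 on
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from MARC::Record or MARC::Charset): compute the
# record length for leader positions 00-04, counting UTF-8 octets rather than
# characters. Each argument is one variable field's data, assumed to include
# indicators and subfield codes but not the trailing field terminator.
sub record_length_octets {
    my @fields = @_;
    my $length = 24;                 # leader
    $length += 12 * @fields;         # one 12-octet directory entry per field
    $length += 1;                    # field terminator closing the directory
    for my $field (@fields) {
        # encoded field data plus its field terminator
        $length += length(encode('UTF-8', $field)) + 1;
    }
    $length += 1;                    # record terminator
    return $length;
}

my $title = "Caf\x{e9}";                        # "Cafe" with U+00E9 (e acute)
my $chars = length($title);                     # counts characters
my $bytes = length(encode('UTF-8', $title));    # counts encoded octets

printf "characters: %d, UTF-8 octets: %d\n", $chars, $bytes;
printf "leader 00-04: %05d\n", record_length_octets($title);
```

Here the title is 4 characters but 5 octets, so a one-field record comes out
at 44 octets; counting characters instead would give 43, and the leader (and
every directory offset after that field) would be wrong by one.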