Sorry I didn't make it clear in my original posting that the record I modified using MARC::Record DID have Unicode encoding. I didn't just change the leader/09 to try to fool it into thinking it was Unicode; the record came out of a database that had been converted to Unicode. The 245 field had 3 characters with diacritics in it, and those character+diacritic sequences did consume 2 bytes each.

Anne L. Highsmith
Consortia Systems Coordinator
5000 TAMU Evans Library
Texas A&M University
College Station, TX 77843-5000
[EMAIL PROTECTED]
979-862-4234
979-845-6238 (fax)

>>> "Doran, Michael D" <[EMAIL PROTECTED]> 03/07/05 09:06AM >>>
Hi Ed,

> How would people feel about the next version of MARC-Record (perhaps
> a v2.0) which handled utf8 properly and required a modern perl?

Definitely a *good* thing. Worth upgrading Perl version for, if necessary.

> Perhaps if people could respond to the list (or me if you prefer) with
> the version of Perl that you use MARC::Record with I could keep
> tallies and report back to the list.

I have MARC::Record installed on two machines:
1) Perl 5.6.1 & MARC::Record 0.94
2) Perl 5.8.5 & MARC::Record 1.4

> > Here's my main question -- is that the principal
> > concern/question/problem, i.e. that directory lengths will not be
> > computed correctly using the existing MARC::Record module with a
> > Unicode record? Or is it only in certain situations that
> > the directory length would not be computed correctly?
>
> Yes, but only if the record actually contains unicode :)

My understanding of Anne's posting was that the record she tested *did* contain unicode: "I started with the Unicode version of the record and modified it...".

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Ed Summers [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 07, 2005 8:37 AM
> To: perl4lib@perl.org
> Subject: Re: MARC::Record and UTF-8 & related threads
>
> On Fri, Mar 04, 2005 at 09:18:00AM -0500, Anne L. Highsmith wrote:
> > Here's my main question -- is that the principal
> > concern/question/problem, i.e. that directory lengths will not be
> > computed correctly using the existing MARC::Record module with a
> > Unicode record? Or is it only in certain situations that the
> > directory length would not be computed correctly?
>
> Yes, but only if the record actually contains unicode :) If you are
> looking for an example of how MARC::Record breaks when there is utf8
> in the record you can look at t/utf8.t which is a test distributed with
> the MARC-Record package. Currently, this test is skipped because
> otherwise it would fail.
>
> > If anyone is inspired to make the necessary updates to the
> > MARC::Record module to handle unicode records, I'd certainly
> > be happy to test. I'd also be eternally grateful, since my
> > alternative might be re-writing 8 or 10 job streams in the
> > next 10 weeks so that I can: 1) export the records from my
> > database in MARC8; 2) edit them; 3) reload them doing a
> > MARC8-Unicode conversion utility provided by the lms vendor.
>
> I've been meaning to write to the list about this for sometime now. How
> would people feel about the next version of MARC-Record (perhaps a
> v2.0) which handled utf8 properly and required a modern perl? By modern
> perl I mean a version >= 5.8.1. The reason why 5.8.1 is required is that
> it's the first perl with a byte oriented substr() (available via the
> bytes pragma).
>
> Perhaps if people could respond to the list (or me if you prefer) with
> the version of Perl that you use MARC::Record with I could keep tallies
> and report back to the list.
>
> //Ed
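
To make the directory-length concern in this thread concrete, here is a minimal sketch of how a character count and a byte count diverge for a field containing accented characters. It is written in Python only for illustration and is not code from MARC::Record; the `directory_entry` helper, the sample 245 title, and the use of precomposed accented characters (2 bytes each in UTF-8) are all assumptions.

```python
# Hypothetical illustration only; this is not how MARC::Record is implemented.
FIELD_TERMINATOR = "\x1e"
SUBFIELD_DELIMITER = "\x1f"


def directory_entry(tag, field, start, count_bytes=True):
    """Build a 12-character directory entry: tag, 4-digit length, 5-digit offset.

    The length covers the field data plus its terminator. With
    count_bytes=False the length is counted in characters, which is
    the mistake that shows up once the data is multi-byte UTF-8.
    """
    data = field + FIELD_TERMINATOR
    length = len(data.encode("utf-8")) if count_bytes else len(data)
    return f"{tag}{length:04d}{start:05d}"


# Made-up 245 field: indicators "10", subfield $a, and a title with
# three precomposed accented characters (U+00C9, U+00E9, U+00E9).
title_field = "10" + SUBFIELD_DELIMITER + "a" + "\u00c9l\u00e9ments de g\u00e9ologie"

by_chars = directory_entry("245", title_field, 0, count_bytes=False)
by_bytes = directory_entry("245", title_field, 0, count_bytes=True)

print("character-based entry:", by_chars)
print("byte-based entry:     ", by_bytes)
```

With this sample field the character-based entry reports a length of 25 while the byte-based entry reports 28: one extra byte for each of the three accented characters. A reader that trusts the shorter length would cut the field off three bytes early.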