RE: MARC::Record and UTF-8 & related threads

Anne Highsmith Mon, 07 Mar 2005 08:29:12 -0800

Sorry I didn't make it clear in my original posting that the record that
I modified, using MARC::Record, DID have Unicode encoding. I didn't just
change the leader/09 to try to fool it into thinking it was Unicode; the
record came out of a database that had been converted to Unicode. And
the 245 field had 3 characters with diacritics in it; those
character+diacritic sequences did consume 2 bytes each.
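
To make the byte-versus-character gap concrete, here is a minimal sketch.
It is not MARC::Record's own code: the field content and the
directory_entry() helper are hypothetical, and it assumes precomposed
accented letters that take 2 bytes each in UTF-8. It only shows that a
field length counted in characters comes out shorter than the length in
bytes that a MARC directory entry has to record.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical 245 field: indicators, subfield delimiter + code,
# a title with three accented letters, and the field terminator.
my $field = "10\x{1F}aCaf\x{E9} \x{E0} la cr\x{E8}me\x{1E}";

# length() on a character string counts characters ...
my $char_len = length($field);

# ... while the directory needs the length of the encoded bytes.
my $byte_len = length(encode('UTF-8', $field));

# Hypothetical helper: tag (3) + field length (4) + start position (5).
sub directory_entry {
    my ($tag, $len, $start) = @_;
    return sprintf("%03s%04d%05d", $tag, $len, $start);
}

print "characters: $char_len, bytes: $byte_len\n";
print "entry from characters: ", directory_entry('245', $char_len, 0), "\n";
print "entry from bytes:      ", directory_entry('245', $byte_len, 0), "\n";
```

With three 2-byte characters in the field, the two lengths differ by 3,
so an entry built from the character count would be 3 bytes short.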


Anne L. Highsmith
Consortia Systems Coordinator
5000 TAMU
Evans Library
Texas A&M University
College Station, TX   77843-5000
[EMAIL PROTECTED]
979-862-4234
979-845-6238 (fax)

>>> "Doran, Michael D" <[EMAIL PROTECTED]> 03/07/05 09:06AM >>>
Hi Ed,

> How would people feel about the next version of MARC-Record (perhaps
> a v2.0) which handled utf8 properly and required a modern perl? 

Definitely a *good* thing.  Worth upgrading Perl version for, if
necessary.
 
> Perhaps if people could respond to the list (or me if you prefer) with
> the version of Perl that you use MARC::Record with I could keep
> tallies and report back to the list.

I have MARC::Record installed on two machines:
1) Perl 5.6.1 & MARC::Record 0.94
2) Perl 5.8.5 & MARC::Record 1.4

> > Here's my main question -- is that the principal
> > concern/question/problem, i.e. that directory lengths will not be
> > computed correctly using the existing MARC::Record module with a
> > Unicode record? Or is it only in certain situations that 
> > the directory length would not be computed correctly?
> 
> Yes, but only if the record actually contains unicode :)

My understanding of Anne's posting was that the record she tested *did*
contain unicode: "I started with the Unicode version of the record and
modified it...".

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED] 
# http://rocky.uta.edu/doran/ 

> -----Original Message-----
> From: Ed Summers [mailto:[EMAIL PROTECTED] 
> Sent: Monday, March 07, 2005 8:37 AM
> To: perl4lib@perl.org 
> Subject: Re: MARC::Record and UTF-8 & related threads
> 
> On Fri, Mar 04, 2005 at 09:18:00AM -0500, Anne L. Highsmith wrote:
> > Here's my main question -- is that the principal
> > concern/question/problem, i.e. that directory lengths will not be
> > computed correctly using the existing MARC::Record module with a
> > Unicode record? Or is it only in certain situations that the
> > directory length would not be computed correctly?
> 
> Yes, but only if the record actually contains unicode :) If you are
> looking for an example of how MARC::Record breaks when there is utf8
> in the record you can look at t/utf8.t which is a test distributed
> with the MARC-Record package. Currently, this test is skipped because
> otherwise it would fail.
> 
> > If anyone is inspired to make the necessary updates to the
> > MARC::Record module to handle unicode records, I'd certainly be
> > happy to test. I'd also be eternally grateful, since my alternative
> > might be re-writing 8 or 10 job streams in the next 10 weeks so
> > that I can: 1) export the records from my database in MARC8;
> > 2) edit them; 3) reload them using a MARC8-Unicode conversion
> > utility provided by the lms vendor.
> 
> I've been meaning to write to the list about this for some time now.
> How would people feel about the next version of MARC-Record (perhaps a
> v2.0) which handled utf8 properly and required a modern perl? By
> modern perl I mean a version >= 5.8.1. The reason why 5.8.1 is
> required is that it's the first perl with a byte oriented substr()
> (available via the bytes pragma).
> 
> Perhaps if people could respond to the list (or me if you prefer) with
> the version of Perl that you use MARC::Record with I could keep
> tallies and report back to the list.
> 
> //Ed
>
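
As a rough illustration of the byte-oriented substr() Ed mentions, the
sketch below compares substr() with bytes::substr() on a decoded string.
The two-field sample data is made up, and this is not how MARC::Record
itself is written; it only shows why byte offsets like those in a MARC
directory point at the wrong place once a string is treated as
characters. Note that current Perl documentation discourages the bytes
pragma outside of debugging, so encoding to octets with Encode may be
the safer route in new code.

```perl
use strict;
use warnings;
use Encode qw(decode);
use bytes ();    # load bytes:: functions without enabling byte semantics

# Hypothetical data: field "caf<e-acute>" + field terminator + field "next".
# In UTF-8 the first field plus terminator occupies 6 bytes but only
# 5 characters.
my $octets = "caf\xC3\xA9\x1Enext";
my $text   = decode('UTF-8', $octets);    # character string

# A directory-style byte offset (6) and length (4) for the second field.
my ($offset, $len) = (6, 4);

# Character-oriented substr() lands one character too far along.
my $by_char = substr($text, $offset, $len);

# Byte-oriented substr() works on the underlying UTF-8 bytes.
my $by_byte = bytes::substr($text, $offset, $len);

print "character substr: $by_char\n";
print "byte substr:      $by_byte\n";
```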
