Hi Matthew,

Thanks for the advice. For this particular script, I'm not doing any data 
manipulation, so using :raw is probably the approach I want to take. I'm just 
feeding my script a list of record IDs and a MARC file in order to pull out 
records that have the record ID I'm looking for.
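
For anyone following along, here is a minimal sketch of why :raw suits a
pass-through script like this one. It uses only core Perl and in-memory
filehandles. The sample bytes and variable names are made up for
illustration and are not taken from the actual script. It assumes the
record data was read as plain bytes and never decoded.

```perl
use strict;
use warnings;

# Hypothetical sample data: the two bytes of an already-encoded UTF-8
# "e with acute accent" (0xC3 0xA9), as they would look after being read
# through a :raw layer. Perl treats this as a 2-character byte string.
my $bytes = "\xC3\xA9";

# Write the same bytes to two in-memory "files", once through :raw and
# once through :utf8.
my ($raw_out, $utf8_out) = ('', '');

open my $raw_fh, '>:raw', \$raw_out or die "open failed: $!";
print {$raw_fh} $bytes;
close $raw_fh;

open my $utf8_fh, '>:utf8', \$utf8_out or die "open failed: $!";
print {$utf8_fh} $bytes;
close $utf8_fh;

# :raw passes the 2 bytes through unchanged. :utf8 treats each byte as a
# Latin-1 character and encodes it again, giving 4 bytes.
printf "raw:  %d bytes\n", length $raw_out;
printf "utf8: %d bytes\n", length $utf8_out;
```

So if both the input and output handles are :raw, the bytes of each record
come out exactly as they went in, whether the file is MARC-8 or UTF-8.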

Thanks,
Shelley

----- Original Message -----
> From: "PHILLIPS M.E." <m.e.phill...@durham.ac.uk>
> To: "Shelley Doljack" <sdolj...@stanford.edu>, perl4lib@perl.org
> Sent: Wednesday, August 1, 2012 1:56:17 AM
> Subject: RE: printing UTF-8 encoded MARC records with as_usmarc
> 
> > -----Original Message-----
> > From: Shelley Doljack [mailto:sdolj...@stanford.edu]
> > Sent: 31 July 2012 20:18
> >
> > The problem was I wasn't telling perl to output UTF-8. Now that I
> > added binmode(FILE, ':utf8') to my script, the problem is fixed.
> > However, it sounds like once I set binmode to UTF-8 everything will
> > be interpreted as such, even when the record is in MARC-8. Is that
> > right? So this means that I can only use my script with a file of
> > records where all of them are encoded in UTF-8. If I want to run the
> > script against a file with all MARC-8 encoding, then I'd need to
> > remove the binmode line.
> 
> It depends how much manipulation of the records you are doing in the
> script.  One approach is to use
> 
> binmode(FILE, ':raw');
> 
> for both input and output.  Perl will then keep the bytes of the
> records exactly as they are.  You won't be able to test for exotic
> characters so easily, and amending field content would be
> inadvisable, but if all you are doing is something like reading in
> the records and filtering out any that have no 245 field, or
> something fairly basic like that, this could be the best approach.
> 
> The MARC::Record module does not seem to care how the records are
> encoded.  It's only once you start altering field content, testing
> field content, or adding fields that the character set being used
> becomes an issue.  Removing fields would be fine too.
> 
> MARC-8 can be very complex, particularly if other code tables like
> CJK are invoked, or even just Greek or Cyrillic.  If you were
> manipulating field content in that kind of way, then converting
> everything to UTF-8 would make things very much easier.
> 
> Matthew
> 
> --
> Matthew Phillips
> Electronic Systems Librarian, Durham University
> Durham University Library, Stockton Road, Durham, DH1 3LY
> +44 (0)191 334 2941
> 
> 
> 
