Hi Matthew,

Thanks for the advice. For this particular script, I'm not doing any data manipulation, so using :raw is probably the approach I want to take. I'm just feeding my script a list of record IDs and a MARC file in order to pull out the records whose IDs I'm looking for.
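A minimal sketch of that kind of filter, for illustration only. It is not the actual script: it bypasses MARC::Record and reads the 001 control field straight from each record's directory, so the bytes are never decoded. The sub names (control_number, filter_records) and the file names in the usage comment are hypothetical, and it assumes well-formed ISO 2709 records and an ID list with one ID per line.

```perl
use strict;
use warnings;

# Return the 001 control number of one raw MARC (ISO 2709) record,
# or nothing if the record is too short or has no 001 field.
# Assumption: the leader and directory are well formed.
sub control_number {
    my ($rec) = @_;
    return if length($rec) < 25;
    my $base = substr($rec, 12, 5);          # leader/12-16: base address of data
    return unless $base =~ /^\d{5}$/;
    # The directory starts at byte 24 and holds 12-byte entries:
    # tag (3), field length (4), starting offset (5).
    for (my $pos = 24; $pos + 12 <= $base - 1; $pos += 12) {
        my ($tag, $len, $start) = unpack 'A3 A4 A5', substr($rec, $pos, 12);
        next unless $tag eq '001';
        next unless "$len$start" =~ /^\d{9}$/;
        # The field length includes the 0x1E field terminator; drop it.
        return substr($rec, $base + $start, $len - 1);
    }
    return;
}

# Copy every record whose 001 is a key of %$wanted from $in to $out,
# byte for byte. Returns the number of records written.
sub filter_records {
    my ($in, $out, $wanted) = @_;
    binmode($in,  ':raw');
    binmode($out, ':raw');
    local $/ = "\x1D";                       # MARC record terminator
    my $kept = 0;
    while (my $rec = <$in>) {
        my $id = control_number($rec);
        next unless defined $id && $wanted->{$id};
        print {$out} $rec;
        $kept++;
    }
    return $kept;
}

# Hypothetical usage: perl filter.pl ids.txt records.mrc > wanted.mrc
if (@ARGV == 2) {
    my ($id_file, $marc_file) = @ARGV;
    open(my $ids, '<', $id_file) or die "$id_file: $!";
    my %wanted;
    while (my $line = <$ids>) {
        $line =~ s/\s+\z//;                  # strip newline / carriage return
        $wanted{$line} = 1 if length $line;
    }
    open(my $in, '<', $marc_file) or die "$marc_file: $!";
    my $kept = filter_records($in, \*STDOUT, \%wanted);
    warn "$kept record(s) written\n";
}
```

Because nothing is decoded, the same sketch should behave identically for MARC-8 and UTF-8 files; the trade-off is that it can only match IDs as exact byte strings.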
Thanks,
Shelley

----- Original Message -----
> From: "PHILLIPS M.E." <m.e.phill...@durham.ac.uk>
> To: "Shelley Doljack" <sdolj...@stanford.edu>, perl4lib@perl.org
> Sent: Wednesday, August 1, 2012 1:56:17 AM
> Subject: RE: printing UTF-8 encoded MARC records with as_usmarc
>
> > -----Original Message-----
> > From: Shelley Doljack [mailto:sdolj...@stanford.edu]
> > Sent: 31 July 2012 20:18
> >
> > The problem was I wasn't telling perl to output UTF-8. Now that I
> > added binmode(FILE, ':utf8') to my script, the problem is fixed.
> > However, it sounds like once I set binmode to UTF-8 everything will
> > be interpreted as such, even when the record is in MARC-8. Is that
> > right? So this means that I can only use my script with a file of
> > records where all of them are encoded in UTF-8. If I want to run the
> > script against a file with all MARC-8 encoding, then I'd need to
> > remove the binmode line.
>
> It depends how much manipulation of the records you are doing in the
> script. One approach is to use
>
> binmode(FILE, ':raw');
>
> for both input and output. Perl will then keep the bytes of the
> records exactly as they are. You won't be able to test for exotic
> characters so easily, and amending field content would be
> inadvisable, but if all you are doing is something like reading in
> the records and filtering out any that have no 245 field, or
> something fairly basic like that, this could be the best approach.
>
> The MARC::Record module does not seem to care how the records are
> encoded. It's only once you start altering field content, testing
> field content, or adding fields that the character set being used
> becomes an issue. Removing fields would be fine too.
>
> MARC-8 can be very complex, particularly if other code tables like
> CJK are invoked, or even just Greek or Cyrillic. If you were
> manipulating field content in that kind of way then converting
> everything to UTF-8 would make things very much easier.
>
> Matthew
>
> --
> Matthew Phillips
> Electronic Systems Librarian, Durham University
> Durham University Library, Stockton Road, Durham, DH1 3LY
> +44 (0)191 334 2941
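
To illustrate the binmode question in the quoted exchange, here is a small sketch of what the two layers do to bytes that Perl has not decoded. It writes the same two-byte string to an in-memory file through ':raw' and through ':utf8' and returns the result as hex. The sub name written_hex is hypothetical, and "\xE2e" merely stands in for MARC-8 data, where a high byte precedes the letter it modifies.

```perl
use strict;
use warnings;

# Write $bytes through the given PerlIO layer into an in-memory file
# and return what was actually written, as a hex string.
sub written_hex {
    my ($layer, $bytes) = @_;
    my $buf = '';
    open(my $fh, ">$layer", \$buf) or die "cannot open in-memory file: $!";
    print {$fh} $bytes;
    close($fh) or die "close failed: $!";
    return unpack('H*', $buf);
}

# Stand-in for undecoded MARC-8 data: one byte above 0x7F, then 'e'.
my $marc8_like = "\xE2e";

for my $layer (':raw', ':utf8') {
    printf "%-6s %s\n", $layer, written_hex($layer, $marc8_like);
}
```

Under ':raw' the two bytes come out unchanged. Under ':utf8' Perl treats the high byte as the character U+00E2 and writes it as two bytes, so the data grows by a byte and the field lengths recorded in a MARC directory would no longer match.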