Hello all,

I'm brand new to this list, and I need some help with a particular issue. I searched the mailing list archives but, despite the seeming popularity of questions about character sets, didn't find anything directly addressing this, so I thought I'd ask.
I've written a Perl script that extracts data from a MySQL database, uses MARC::Record to map that data to MARC, and outputs the MARC records (it is based on a script written by Brian Surratt of Texas A&M University). The resulting records need to have all data encoded in MARC-8 (for loading into OCLC and into our local catalog). The data in the MySQL database is encoded in ISO 8859-1 (Latin-1). The MARC records output by the script work fine as long as they don't contain diacritics or other non-ASCII characters. When they do, the diacritics come out garbled when the record is read by a program expecting MARC-8, because they are still encoded in Latin-1.

So, is there an easy way to translate from Latin-1 to MARC-8/ANSEL? I haven't been able to find any Perl modules that do this other than MARC::Charset. Unfortunately, we're having trouble getting that module installed on our machine. During the test install it reports that it requires the module DB_File, and the test install of DB_File fails (I'm not sure what the error message is; I'd have to ask the admin of that machine). We're running Perl v5.8.3.

FWIW, I did try manually searching for and replacing diacritics in the extracted database fields before converting to MARC, and it worked fine (I tried it on a record containing Spanish, so only a limited set of characters applied). For this approach to be viable, though, I'd have to map ALL the Latin-1 characters to their MARC-8 counterparts, which would be time-consuming.

On top of this, a few records contain the byte sequence EF BF BD, which is the UTF-8 encoding of U+FFFD, the Unicode replacement character. I'm a bit mystified as to where it's coming from. It would be trivial to strip it out, but that alone doesn't guarantee that the script will catch all non-MARC-8 characters.
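For comparison, here is a minimal sketch of the manual-mapping route that avoids writing out every accented letter. It is not MARC::Charset and not the script described above; `latin1_to_marc8` is a hypothetical helper name. It assumes field values arrive from DBI as undecoded Latin-1 byte strings and uses only modules that ship with Perl 5.8 (Encode, Unicode::Normalize). The key detail is that MARC-8 stores a combining diacritic *before* the letter it modifies (é becomes ANSEL acute 0xE2 followed by "e"). Decomposing each character with NFD therefore leaves only a short table of diacritics plus a few special characters to maintain. The tables are deliberately incomplete. Anything not covered is dropped and returned to the caller, so unexpected characters get reported instead of slipping through.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# MARC-8 (ANSEL) codes for the combining marks that Latin-1 letters
# decompose into under NFD. Partial table (an assumption; extend as needed).
my %ANSEL_COMBINING = (
    "\x{0300}" => "\xE1",    # grave
    "\x{0301}" => "\xE2",    # acute
    "\x{0302}" => "\xE3",    # circumflex
    "\x{0303}" => "\xE4",    # tilde
    "\x{0308}" => "\xE8",    # diaeresis (umlaut)
    "\x{030A}" => "\xEA",    # ring above
    "\x{0327}" => "\xF0",    # cedilla
);

# Latin-1 characters that do not decompose but have MARC-8 equivalents.
# Also partial; unlisted characters are reported as unmapped.
my %ANSEL_SPECIAL = (
    "\x{00A1}" => "\xC6",    # inverted exclamation mark
    "\x{00A3}" => "\xB9",    # pound sign
    "\x{00A9}" => "\xC3",    # copyright sign
    "\x{00BF}" => "\xC5",    # inverted question mark
    "\x{00C6}" => "\xA5",    # AE ligature
    "\x{00D8}" => "\xA2",    # O with stroke
    "\x{00E6}" => "\xB5",    # ae ligature
    "\x{00F8}" => "\xB2",    # o with stroke
);

# Hypothetical helper: convert a Latin-1 byte string to a MARC-8 byte
# string. Returns (marc8_string, list_of_unmapped_characters).
sub latin1_to_marc8 {
    my ($bytes) = @_;
    return ('') unless defined $bytes;

    # Remove stray EF BF BD (UTF-8 for U+FFFD) found in some fields.
    $bytes =~ s/\xEF\xBF\xBD//g;

    # Latin-1 bytes -> Perl characters -> canonical decomposition,
    # so e.g. "\xE9" (e acute) becomes "e" followed by U+0301.
    my $text = NFD(decode('iso-8859-1', $bytes));

    my $out = '';
    my @unmapped;
    # Walk the string one base character (plus its marks) at a time.
    while ($text =~ /\G(\P{M})(\p{M}*)/g) {
        my ($base, $marks) = ($1, $2);
        # MARC-8 order: diacritics first, then the base letter.
        for my $mark (split //, $marks) {
            if (exists $ANSEL_COMBINING{$mark}) {
                $out .= $ANSEL_COMBINING{$mark};
            } else {
                push @unmapped, $mark;
            }
        }
        if (ord($base) < 0x80) {
            $out .= $base;                    # ASCII is unchanged in MARC-8
        } elsif (exists $ANSEL_SPECIAL{$base}) {
            $out .= $ANSEL_SPECIAL{$base};
        } else {
            push @unmapped, $base;            # no mapping: drop and report
        }
    }
    # Every character is < 0x100, so this yields a plain byte string.
    utf8::downgrade($out);
    return ($out, @unmapped);
}
```

In the extraction loop, the helper could be called on each field value before the MARC::Field is built, with anything in the returned list logged for review. Since the output is MARC-8, leader position 09 should also be blank (an "a" there declares UCS/Unicode).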
That's why I'd really prefer to use MARC::Charset for this: the conversion needs to be robust enough that I won't have to babysit it all the time. So I suppose my question is two-fold:

1. Has anyone had similar problems getting MARC::Charset installed? Could you offer any advice I can pass along on how to get it installed?
2. Are there any other Perl modules that will convert Latin-1 to MARC-8/ANSEL?

Thanks in advance for any help you can offer.

Jason Thomale
Metadata Librarian
Texas Tech University Libraries
(806) 742-2240