A while back I sent out an email asking whether MARC::Record users would be interested in a version of MARC::Record that handles utf8 properly and requires Perl >= v5.8.1.

Just to recap: the v5.8.1 requirement is there because MARC::Record needs the bytes::substr() and bytes::length() functions to correctly calculate and use the byte offsets in the record directory. The built-in substr() and length() functions count characters, which works fine when 1 byte == 1 character, but when a character can occupy multiple bytes (as in utf8) the byte-oriented versions from the bytes pragma have to be used instead. Those two functions are available as of v5.8.1.
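To make the difference concrete, here is a minimal sketch (not code from MARC::Record itself). The variable names and the sample string are made up for illustration. It compares character counts with byte counts, then pulls a value out of a string by byte offset the way a directory entry would.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length/bytes::substr without enabling the pragma

# Illustrative data only: two CJK characters (3 bytes each in utf8)
# followed by the ASCII word "title".
my $data = "\x{4E2D}\x{6587}title";

my $chars = length("\x{4E2D}\x{6587}");         # counts characters
my $bytes = bytes::length("\x{4E2D}\x{6587}");  # counts bytes

# A directory entry would say "title" starts at byte offset 6, length 5.
my $by_bytes = bytes::substr($data, 6, 5);      # honours byte offsets
my $by_chars = substr($data, 6, 5);             # treats 6 as a character index

print "characters: $chars, bytes: $bytes\n";
print "bytes::substr: $by_bytes\n";
print "substr:        $by_chars\n";
```

With a character-based substr() the offset lands in the wrong place as soon as any multi-byte character precedes it, which is exactly the directory problem described above.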

I received several responses, all of which were positive, so there is now a release candidate of MARC::Record that handles utf8 in records and requires Perl >= v5.8.2. This version is not available on CPAN yet; ideally some of the folks who want utf8 support will get a chance to try it out before it sees a wider release. The tarball is available from sourceforge, or:

        http://tinyurl.com/9vcg4

Perl >= v5.8.2 is required, rather than v5.8.1, because OS X 10.3 shipped with a Perl 5.8.1 release candidate that strangely lacked bytes::substr(). So to simplify the Makefile.PL and avoid spurious errors from the relentless CPAN testers, >= v5.8.2 is required.

The only API change is a new method, MARC::Record::encoding(), which can be used to get/set the encoding on a record. Note: this doesn't actually transform the encoding of the characters in the record, it just toggles position 9 in the leader. It assumes you know what you are doing when you set it :-)
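For anyone unsure what "toggles position 9" amounts to, here is a rough sketch on a bare leader string. It deliberately does not use MARC::Record: set_leader_encoding() and leader_encoding() are hypothetical helpers written for this example, and the sample leader is made up. It relies on the MARC 21 convention that leader position 9 is 'a' for Unicode and blank for MARC-8.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): flip leader position 9.
# 'a' marks the record as Unicode, a blank marks it as MARC-8.
sub set_leader_encoding {
    my ($leader, $encoding) = @_;
    my $flag = $encoding =~ /^utf-?8$/i ? 'a' : ' ';
    substr($leader, 9, 1) = $flag;   # only this one byte changes
    return $leader;
}

# Hypothetical helper: report what position 9 currently claims.
sub leader_encoding {
    my ($leader) = @_;
    return substr($leader, 9, 1) eq 'a' ? 'UTF-8' : 'MARC-8';
}

my $leader  = '00000nam  2200000 a 4500';          # made-up 24-character leader
my $before  = leader_encoding($leader);
my $flagged = set_leader_encoding($leader, 'UTF-8');
my $after   = leader_encoding($flagged);

print "before: $before\n";
print "after:  $after\n";
print "leader: $flagged\n";
```

Nothing in the field data is touched, so flagging a MARC-8 record as utf8 this way just produces a mislabeled record. With the module itself the equivalent would be a call to encoding() on a record object; the exact argument strings it accepts are not spelled out in this message, so check the module's documentation.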

I'd be interested in any/all feedback if you try it out. Special thanks to Anne Highsmith of Texas A&M who provided support for this enhancement.

//Ed