Re: MARC::Record and UTF-8

2005-01-06 Thread Ron Davies
At 07:50 7/01/2005, [EMAIL PROTECTED] wrote: > Does anyone know of any work underway to adapt MARC::Record for utf-8 encoding? I will have a similar project in a few months' time, converting a whole bunch of processing from MARC-8 to UTF-8. I would be very happy to assist in testing or development

MARC::Record and UTF-8

2005-01-06 Thread Ian . Hamilton
As I understand it, MARC::Record cannot be used with utf-8 encoded MARC data (i.e. it is limited to MARC-8 encodings). Is this the case? Does anyone know of any work underway to adapt MARC::Record for utf-8 encoding? Ian PS We're in the process of migrating from a MARC21 MARC-8 ANSEL environmen

Re: MARC::Record tests and MicroLIF.pm

2005-01-06 Thread Bryan Baldus
MARC::File::MicroLIF::decode() has been updated to change all \x0a and \x0d to the \n of the platform prior to decoding. t/81.decode.t has been updated to look at lineendings-0a.lif, lineendings-0d.lif, and lineendings-0d0a.lif, instead of sample1.lif. I also revised MARC::Doc::Tutorial.pod to re

Re: MARC::Record tests

2005-01-06 Thread Ed Summers
On Thu, Jan 06, 2005 at 10:52:14AM -0600, [EMAIL PROTECTED] wrote: > _get_chunk() is coded to handle "any combination of \r and \n of any > length". Is it not functioning that way? Thanks for the clarification Mike. I didn't look closely enough at _get_chunk() to see it is handling the three differ

RE: MARC::Record tests

2005-01-06 Thread Bryan Baldus
On Thursday, January 06, 2005 10:52 AM, Mike O'Regan wrote: >_get_chunk() is coded to handle "any combination of \r and \n of any >length". Is it not functioning that way? _get_chunk() seems to work fine. The problem is with decode(), which depends upon the line endings having been converted to \

Re: MARC::Record tests

2005-01-06 Thread moregan
Yes, \r is legitimate. In MicroLIF you would terminate lines as natural to the platform. The presumption was that if you transfer files among platforms they would be treated as text files and line endings translated as necessary. As always in the bibliographic data world, practices varied, a

Re: MARC::Record tests

2005-01-06 Thread Ed Summers
On Thu, Jan 06, 2005 at 10:03:46AM -0600, Bryan Baldus wrote: > Perhaps the: > # for ease, make the newlines match this platform > $lifrec =~ s/[\x0a\x0d]+/\n/g if defined $lifrec; > > in _next() should be moved (or added as duplicate code) to decode() just > between the lines: > my $marc = MA

RE: MARC::Record tests

2005-01-06 Thread Bryan Baldus
>There is code in MARC::File::MicroLIF::_get_chunk that handles DOS >(\r\n) and Unix (\n) line endings, but not Mac (\r). This is true, and it seems to work. Unfortunately, it is not reached by the test, since the test calls decode() directly, instead of going through _next() or _get_chunk. Perha

Re: MARC::Record tests

2005-01-06 Thread Ed Summers
I'm thinking that the MicroLIF failure is due to line endings being different on Mac versions < OS X. There is code in MARC::File::MicroLIF::_get_chunk that handles DOS (\r\n) and Unix (\n) line endings, but not Mac (\r). Does anyone know if \r is a legit line ending in MicroLIF? //Ed
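
For readers following the line-ending discussion in this thread, here is a minimal sketch of the normalization being talked about. It applies the substitution quoted elsewhere in the thread (s/[\x0a\x0d]+/\n/g) to text using each of the three conventions. The helper name normalize_line_endings and the sample strings are hypothetical, made up for illustration; they are not part of MARC::File::MicroLIF, and this is not a copy of its code.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of MARC::File::MicroLIF).
# Replaces every run of CR (\x0d) and/or LF (\x0a) characters with a single
# platform newline, using the substitution quoted in this thread.
sub normalize_line_endings {
    my ($text) = @_;
    return unless defined $text;          # mirror the "if defined" guard
    (my $copy = $text) =~ s/[\x0a\x0d]+/\n/g;
    return $copy;
}

# Made-up sample text in the three conventions discussed in the thread.
my %samples = (
    'unix (LF)'   => "line one\x0aline two\x0a",
    'dos (CRLF)'  => "line one\x0d\x0aline two\x0d\x0a",
    'old mac (CR)' => "line one\x0dline two\x0d",
);

for my $name (sort keys %samples) {
    my $fixed = normalize_line_endings($samples{$name});
    # All three should now be identical to the platform-native form.
    my $same = $fixed eq "line one\nline two\n" ? 'yes' : 'no';
    print "$name normalized to native form: $same\n";
}
```

One design consequence worth noting: because the character class is followed by +, a run of several line endings (for example a blank line) collapses into a single \n. That is harmless if blank lines carry no meaning in the input, but it is an assumption to check before reusing the pattern elsewhere.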

Re: inserting diacritics

2005-01-06 Thread Leif Andersson
And I am sure some guys will be more happy with a capital F in France. Leif == Leif Andersson, Systems Librarian Stockholm University Library SE-106 91 Stockholm SWEDEN Phone : +46 8 162769 Mobile: +46 70 6904281 -Original message- From: Mani