Thank you for all the input, and I think I have resolved my particular issue.
Battle won. War still raging.
Using the script suggested by Galen as a starting point, I wrote the following
hack, which outputs the numbers of MARC records containing non-UTF-8 characters,
but the script output noth
Eric,
> How can I figure out whether or not a MARC record contains ONLY characters
> from the UTF-8 character set?
You can use a regular expression to check whether a string is valid UTF-8.
Various examples are available online, for instance this one:
http://www.w3.org/International/questions
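A minimal sketch of the regex approach, written in Python rather than Perl for illustration. The pattern is modeled on the widely circulated byte-range pattern for well-formed UTF-8, which rejects overlong forms and UTF-16 surrogates. One assumption to flag: the ASCII branch is widened to 0x00-0x7F because MARC records use the control bytes 0x1D, 0x1E and 0x1F as record, field and subfield delimiters. The name `is_utf8` is hypothetical.

```python
import re

# Byte-level UTF-8 validator. Each alternative matches one well-formed
# character; overlong encodings and UTF-16 surrogates do not match.
UTF8_RE = re.compile(
    rb"""\A(?:
          [\x00-\x7F]                        # ASCII, incl. MARC delimiters (assumption)
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*\Z""",
    re.VERBOSE,
)


def is_utf8(data: bytes) -> bool:
    """Return True if `data` consists entirely of well-formed UTF-8 sequences."""
    return UTF8_RE.match(data) is not None
```

In Python, calling `data.decode("utf-8")` inside a `try` block performs an equivalent strict check without a hand-written pattern; the regex form is shown because it is what the reply suggests.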
Hi,
On Wed, Mar 27, 2013 at 2:11 PM, Eric Lease Morgan wrote:
> Put another way, how can I determine whether or not position #9 of a given
> MARC leader is accurate? If position #9 is an "a", then how can I read the
> balance of the record to determine whether or not all the characters really
>
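One way to test whether leader position 09 is accurate, sketched in Python under stated assumptions: the file holds raw ISO 2709 (transmission format) records, each ending with the record terminator 0x1D; in MARC 21, leader/09 is blank for MARC-8 and `a` for UCS/Unicode; and "really UTF-8" is approximated by a strict UTF-8 decode of the whole record. The function names are hypothetical. Because pure-ASCII MARC-8 is also valid UTF-8, and non-ASCII MARC-8 bytes can occasionally form valid UTF-8 sequences, the second kind of report is a hint rather than proof.

```python
RECORD_TERMINATOR = b"\x1d"  # ends every MARC record


def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 check using the standard decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def mismatches(raw: bytes):
    """Yield (record_number, problem) for records whose leader/09 looks wrong.

    Record numbers are 1-based positions in the file.
    """
    for number, record in enumerate(raw.split(RECORD_TERMINATOR), start=1):
        record = record.lstrip(b"\r\n")  # tolerate newline-separated files
        if not record:
            continue  # empty tail after the final terminator
        flag = record[9:10]  # leader/09: b" " = MARC-8, b"a" = Unicode
        valid = is_valid_utf8(record)
        non_ascii = any(byte > 0x7F for byte in record)
        if flag == b"a" and not valid:
            yield number, "leader/09 is 'a' but the bytes are not valid UTF-8"
        elif flag == b" " and non_ascii and valid:
            yield number, "leader/09 is blank but non-ASCII bytes decode as UTF-8"
```

For example, `mismatches(open("original.marc", "rb").read())` would list the record numbers whose leader and content disagree.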
helley
- Original Message -
> From: "Eric Lease Morgan"
> To: perl4lib@perl.org
> Sent: Wednesday, March 27, 2013 2:11:26 PM
> Subject: Re: reading and writing of utf-8 with marc::batch [double encoding]
>
>
> On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan
> wro
On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan wrote:
> When it calls as_usmarc, I think MARC::Batch tries to honor the value set in
> position #9 of the leader. In other words, if the leader is empty, then it
> tries to output records as MARC-8, and when the leader is a value of "a", it
> tr
On Mar 27, 2013, at 2:20 PM, Eric Lease Morgan wrote:
> A number of people have alluded to the problem of double encoding, and I'm
> beginning to think this is true.
When it calls as_usmarc, I think MARC::Batch tries to honor the value set in
position #9 of the leader. In other words, if the
Hi,
On Wed, Mar 27, 2013 at 11:20 AM, Eric Lease Morgan wrote:
> I have isolated a number of problem records. They all contain diacritics,
> but they do not have an "a" in position #9 of the leader --
> http://dh.crc.nd.edu/tmp/original.marc Can someone verify that the file
> contains UTF-8 cha
A number of people have alluded to the problem of double encoding, and I'm
beginning to think this is true.
I have isolated a number of problem records. They all contain diacritics, but
they do not have an "a" in position #9 of the leader --
http://dh.crc.nd.edu/tmp/original.marc Can someone
Sent: Tuesday, March 26, 2013 1:22:03 PM
> Subject: reading and writing of utf-8 with marc::batch
>
>
> For the life of me I can't figure out how to do reading and writing
> of UTF-8 with MARC::Batch.
>
> I have a UTF-8 encoded file of MARC records. Dumping the records
Hi Eric,
On Wed, Mar 27, 2013 at 10:26 AM, Eric Lease Morgan wrote:
> While I'm not positive my terminal is doing UTF-8, I think it is. When I
> dump in the beginning the output to the terminal is correct. After I run my
> script the output to the same terminal is incorrect.
>
Would you be will
On Mar 26, 2013, at 5:57 PM, Leif Andersson wrote:
> my first guess would be your terminal is not utf8.
While I'm not positive my terminal is doing UTF-8, I think it is. When I dump
in the beginning the output to the terminal is correct. After I run my script
the output to the same terminal i
Hi,
On Wed, Mar 27, 2013 at 7:01 AM, Jon Gorman wrote:
> One piece of advice is not to trust the terminal directly but pipe
> into xxd. (And if possible, just try transforming the offending
> record). Or use yaz-marcdump -v, which will also give the hex if I
> remember correctly. (If it's c3 a9
OK, I can't claim to be an expert, but from my own experience I'd say
Paul is very likely right about double encoding occurring. However,
the question is where it happens, and in this case I suspect that
MARC::Batch's behavior could depend heavily on which version
of Perl you're running
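To make the double-encoding hypothesis concrete, here is a small Python illustration (not code from the thread) of UTF-8 bytes being misread as Latin-1 and then encoded to UTF-8 a second time. It mirrors what piping the output through xxd would show: a correctly encoded é is `c3 a9`, while a double-encoded é becomes `c3 83 c2 a9`. The helper names are hypothetical.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then re-encoded."""
    once = text.encode("utf-8")
    return once.decode("latin-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> bytes:
    """Reverse exactly one round of Latin-1 -> UTF-8 double encoding."""
    return data.decode("utf-8").encode("latin-1")


def hex_bytes(data: bytes) -> str:
    """Space-separated hex, similar to what a hex dump shows."""
    return " ".join(f"{b:02x}" for b in data)


print(hex_bytes("é".encode("utf-8")))  # c3 a9
print(hex_bytes(double_encode("é")))   # c3 83 c2 a9
```

`undo_double_encoding` only works on data that was double encoded exactly once; applied to correctly encoded text it will either raise an error or garble it further.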
e-first-time
Morgan!"
Mike
> -Original Message-
> From: Leif Andersson [mailto:leif.anders...@sub.su.se]
> Sent: Tuesday, March 26, 2013 5:57 PM
> To: Eric Lease Morgan; perl4lib@perl.org
> Subject: Re: reading and writing of utf-8 with marc::batch
>
> Hi Eric,
>
t: reading and writing of utf-8 with marc::batch
For the life of me I can't figure out how to do reading and writing of UTF-8
with MARC::Batch.
I have a UTF-8 encoded file of MARC records. Dumping the records and grepping
for a particular string illustrates the validity:
$ marcdump
rsson
Stockholm UL
From: Eric Lease Morgan [emor...@nd.edu]
Sent: 26 March 2013 21:22
To: perl4lib@perl.org
Subject: reading and writing of utf-8 with marc::batch
For the life of me I can't figure out how to do reading and writing of UTF-8
with MA
Do your records have the UTF-8 encoding byte set in the LDR? (Byte 9 should
be 'a' for UTF-8.)
-Tim
Timothy Prettyman
University of Michigan Library/LIT
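If a record turns out to be valid UTF-8 but lacks that flag, the leader byte can be corrected directly. A minimal Python sketch, assuming the input is a single raw ISO 2709 record as bytes; the function name `mark_as_utf8` is hypothetical and not part of MARC::Batch or any other library.

```python
def mark_as_utf8(record: bytes) -> bytes:
    """Return `record` with leader/09 set to 'a' (UCS/Unicode).

    Refuses to relabel a record whose bytes are not valid UTF-8.
    """
    if len(record) < 24:
        raise ValueError("record is shorter than a 24-byte MARC leader")
    record.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    # Replacing a single byte keeps the record length (leader/00-04)
    # and all directory offsets unchanged.
    return record[:9] + b"a" + record[10:]
```

Validating before relabeling matters here: setting `a` on a record that is actually MARC-8 would simply move the problem downstream.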
On Tue, Mar 26, 2013 at 4:22 PM, Eric Lease Morgan wrote:
>
> For the life of me I can't figure out how to do reading a
On Tue, Mar 26, 2013 at 04:22:03PM -0400, Eric Lease Morgan wrote:
> For the life of me I can't figure out how to do reading and writing of
> UTF-8 with MARC::Batch.
>
> I have a UTF-8 encoded file of MARC records. Dumping the records and
> grepping for a particular st
For the life of me I can't figure out how to do reading and writing of UTF-8
with MARC::Batch.
I have a UTF-8 encoded file of MARC records. Dumping the records and grepping
for a particular string illustrates the validity:
$ marcdump und.marc | grep Sainte-Face
und.marc
1000 re