Re: Character sets - kind of solved

John Hammer Wed, 08 Dec 2004 13:31:38 -0800

That's different from what I get. What I get is:

1c1
< 00000000  30 32 33 35 36 63 61 6d  20 20 32 32 30 30 34 38  |02356cam  220048|
---
> 00000000  30 32 33 36 34 63 61 6d  20 20 32 32 30 30 34 38  |02364cam  220048|
21,30c21,30
<differences in the directory not shown>
105,149c105,149
< 00000680  20 1f 61 42 69 73 e5 61  f2 74 e5 69 2c 20 4d 75  | .aBis_________, Mu|
< 00000690  f2 68 61 6d 6d 61 64 2e  1f 74 43 6f 6e 76 65 72  |___ammad..tConver|
< ... not shown>
< 00000930  73 20 1e 1d 0a 0a                                 |s ....|
< 00000936
---
> 00000680  20 1f 61 42 69 73 ef bf  bd 61 ef bf bd 74 ef bf  | .aBis___a___t___|
> 00000690  bd 69 2c 20 4d 75 ef bf  bd 68 61 6d 6d 61 64 2e  |i, Mu___hammad.|
< ... not shown>
> 00000930  69 61 20 47 61 6c 65 27  73 20 1e 1d              |ia Gale's ..|
> 0000093c


How would deleting the illegal characters between records cause changes to the
bytes at offsets 00000680 and 00000690 above?

John
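
One possible reading of the dump above, offered as a sketch and not a confirmed diagnosis: the bytes ef bf bd in the processed file are the UTF-8 encoding of U+FFFD, the Unicode replacement character. That pattern is what appears when single bytes such as e5 and f2 are decoded as UTF-8, rejected as invalid, and substituted. The shown rows contain four such bytes, each growing from one byte to three, which would be consistent with the record length in the leader going from 02356 to 02364. Which layer, if any, performs such a decode here is an assumption; the snippet only reproduces the byte pattern.

```python
# Bytes copied from offset 0x686 of the original dump: e5 61 f2 74 e5 69
original = bytes.fromhex("e5 61 f2 74 e5 69")

# ASSUMPTION: some layer treats the data as UTF-8 and substitutes
# U+FFFD for each byte that is not valid UTF-8.
decoded = original.decode("utf-8", errors="replace")

# Writing the text back out as UTF-8 turns each U+FFFD into ef bf bd.
reencoded = decoded.encode("utf-8")

print(original.hex(" "))
print(reencoded.hex(" "))
print("growth in bytes:", len(reencoded) - len(original))
```

The second printed line matches the processed dump at the same position (ef bf bd 61 ef bf bd 74 ef bf bd 69).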

On Wed, 8 Dec 2004 10:23:38 -0600
Ed Summers <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 07, 2004 at 12:53:44PM -0600, John Hammer wrote:
> > Attached are the two files. The MARC file seems to be using a Windows font 
> > (1251?). As for the program, the same changes occur if I just read the MARC 
> > file and write it back out with no changes. The Perl I am using is 5.8.3.
> 
> Ok, I've confirmed that simply reading this record in and writing it out
> will yield a different file. The unix diff program confirms this, but
> does not isolate the difference, since MARC records are not multiline
> documents. 
> 
> Using diff with hexdump provides some more concrete data. First hexdump the
> original file and the processed file like so:
> 
>     % hexdump -C original.dat > original.dump
>     % hexdump -C processed.dat > processed.dump
> 
> Then compare these two files with diff:
> 
>     % diff original.dump processed.dump
> 
> You should see this:
> 
> 148,149c148,149
> < 00000930  73 20 1e 1d 0a 0a                                 |s ....|
> < 00000936
> ---
> > 00000930  73 20 1e 1d                                       |s ..|
> > 00000934
> 
> What this shows is that the original file has two trailing 0a bytes at
> the end of the record, and that the processed file does not. This makes
> sense because MARC::Record was adjusted back in v1.24 (Apr 2003) to
> remove certain illegal characters between records that some library
> systems place there. See line 58 in MARC::File::USMARC in the latest
> version of the MARC-Record distribution if you are curious :-)
> 
> So unless you are unable to reproduce this I think this mystery is solved.
> 
> //Ed
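
For readers without hexdump and diff at hand, the comparison described in the quoted message can be approximated as follows. This is a minimal sketch: the function names hexdump_rows and dump_diff are made up for illustration, the sample bytes are only the record tail shown in the dump, and the row layout merely imitates hexdump -C without matching it exactly.

```python
import difflib


def hexdump_rows(data, width=16):
    """Format bytes as rows of offset, hex bytes, and printable ASCII."""
    pad = width * 3 - 1  # room for `width` two-digit bytes plus separators
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  |{text}|")
    return rows


def dump_diff(original, processed):
    """Return only the dump rows that differ, marked '- ' or '+ '."""
    diff = difflib.ndiff(hexdump_rows(original), hexdump_rows(processed))
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Illustrative sample only: a record tail with and without the trailing 0a 0a.
original = b"s \x1e\x1d\n\n"
processed = b"s \x1e\x1d"

for line in dump_diff(original, processed):
    print(line)
```

Because rows are compared by fixed offset, any inserted or removed byte in the middle of a record makes every later row differ, which is why the larger diff earlier in this thread spans so many lines.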
