RE: Character sets - kind of solved

2004-12-09 Thread Bryan Baldus
>For that matter I think it's time to remove MARC.pm as well :) I can do the latter unless there are objections. I'm taking inspiration from MARC.pm for the MARC::File::MARCMaker module I'm working on, but I've got a local copy of v. 1.13 from the SourceForge files page. Removing the older v. 1.07

Re: Character sets - kind of solved

2004-12-09 Thread Ed Summers
On Thu, Dec 09, 2004 at 10:32:25AM -0600, John Hammer wrote: > That fixed the problem, going back a version. That will teach me not to > use a beta version for production. Perhaps v1.39_01 should be removed from CPAN to avoid any further confusion. For that matter I think it's time to remove MARC

Re: Character sets - kind of solved

2004-12-09 Thread John Hammer
That fixed the problem, going back a version. That will teach me not to use a beta version for production. Thanks to all who took the time to help me with this, especially Ed. John On Wed, 8 Dec 2004 19:57:26 -0600 Ed Summers <[EMAIL PROTECTED]> wrote: > On Wed, Dec 08, 2004 at 05:47:23PM -060

Re: Character sets - kind of solved

2004-12-08 Thread Ed Summers
On Wed, Dec 08, 2004 at 05:47:23PM -0600, John Hammer wrote: > MARC::Record version 1.39_01. Using diff there is no difference in the > files when using Perl to read in and write out the data. Can you try downgrading to v1.38? v1.39_01 has some experimental utf8 handling code in it which was rele

Re: Character sets - kind of solved

2004-12-08 Thread John Hammer
MARC::Record version 1.39_01. Using diff there is no difference in the files when using Perl to read in and write out the data. John On Wed, 8 Dec 2004 15:43:29 -0600 Ed Summers <[EMAIL PROTECTED]> wrote: > On Wed, Dec 08, 2004 at 03:31:18PM -0600, John Hammer wrote: > > How would deleting the

Re: Character sets - kind of solved

2004-12-08 Thread Ed Summers
On Wed, Dec 08, 2004 at 03:31:18PM -0600, John Hammer wrote: > How would deleting the illegal characters cause changes to the characters in > lines 680 and 690 above? It doesn't explain it :) What version of MARC::Record are you using? What happens when you use perl to read in the data and write

Re: Character sets - kind of solved

2004-12-08 Thread John Hammer
That's different from what I get. What I get is: 1c1 < 30 32 33 35 36 63 61 6d 20 20 32 32 30 30 34 38 |02356cam 220048| --- > 30 32 33 36 34 63 61 6d 20 20 32 32 30 30 34 38 |02364cam 220048| 21,30c21,30 105,149c105,149 < 0680 20 1f 61 42 69 73 e5 61 f2 74 e5 69 2
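
Editor's note: the first diff hunk above differs only in the leading five digits, which in a MARC record's leader (positions 00-04) hold the record length. The sketch below is in Python rather than the Perl used in the thread, and simply compares the two values copied from the dump. The variable and function names are made up for illustration.

```python
# Leader prefixes copied from the ASCII column of the hex dump above.
leader_before = "02356cam"  # original file
leader_after = "02364cam"   # file after the read/write round trip


def record_length(leader_prefix: str) -> int:
    """Return the record length stored in leader positions 00-04."""
    return int(leader_prefix[:5])


growth = record_length(leader_after) - record_length(leader_before)
print(growth)  # 8
```

A record that grows by a few bytes on a plain read/write round trip is consistent with some single-byte characters having been rewritten as multi-byte sequences, though the length alone does not prove it.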

Re: Character sets - kind of solved

2004-12-08 Thread Ed Summers
On Tue, Dec 07, 2004 at 12:53:44PM -0600, John Hammer wrote: > Attached are the two files. The Marc file seems to be using a Windows font > (1251?). As for the program, the same changes occur if I just read the Marc > file and write it back out with no changes. The Perl I am using is 5.8.3 Ok, I

Re: Character sets - kind of solved

2004-12-07 Thread Ed Summers
John Hammer wrote: > You are correct in assuming the locale environment is set up for UTF-8 > on my computer. However, that wouldn't explain why the record is > different pre-processing vs. post-processing with MARC::Record. Viewing > the two records with the same app (in this case vi) gives differ

Re: Character sets - kind of solved?

2004-12-06 Thread John Hammer
On Mon, 6 Dec 2004 08:54:21 -0600 "Doran, Michael D" <[EMAIL PROTECTED]> wrote: > The original record from John Hammer did not contain UTF-8, it contained > MARC-8. I believe that the fact that the combining MARC-8 characters > were replaced by a generic replacement character only indicates that
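
Editor's note: a small Python sketch (not the Perl from the thread) of how MARC-8 combining bytes can end up as generic replacement characters when a tool assumes the data is UTF-8. The byte run is the one visible at offset 0680 in the hex dump posted on 2004-12-08. Whether this is exactly what happened inside the software under discussion is an assumption.

```python
# Bytes "42 69 73 e5 61 f2 74 e5 69" from the hex dump in this thread.
raw = bytes.fromhex("426973e561f274e569")

# 0xE5 and 0xF2 are UTF-8 lead bytes, but here each is followed by a plain
# ASCII letter rather than a continuation byte, so the sequence is invalid.
as_utf8 = raw.decode("utf-8", errors="replace")

# Each invalid byte is replaced by U+FFFD REPLACEMENT CHARACTER.
print(as_utf8.count("\ufffd"))  # 3
```

The ASCII letters survive untouched, which matches the observation that only the combining characters were replaced.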

Updating MARC::File::XML (was Re: Character sets - kind of solved?)

2004-12-06 Thread Mike Rylander
ld rather I not. :) -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org > > > -- Michael > > # Michael Doran, Systems Librarian > # University of Texas at Arlington > # 817-272-5326 office > # 817-688-1926 cell > # [E

RE: Character sets - kind of solved?

2004-12-06 Thread Doran, Michael D
s Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 cell # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Mike Rylander [mailto:[EMAIL PROTECTED] > Sent: Saturday, December 04, 2004 1:31 PM > To: [EMAIL PROTECTED] >

Re: Character sets - kind of solved?

2004-12-05 Thread Ed Summers
On Sat, Dec 04, 2004 at 02:30:53PM -0500, Mike Rylander wrote: > I've got a working patch that correctly transcodes records from > USMARC(MARC-8) to MARC21slim(UTF8) and back again. Mike, would you like CVS access privileges on the SourceForge site so you can commit this stuff? I'm not actively u

Re: Character sets - kind of solved?

2004-12-04 Thread Mike Rylander
I've run into some record encoding issues myself, though not the problem from below. In any case, this got me thinking about the current state of MARC::File::XML, specifically that it could not handle MARC8 encoded records. I submitted a patch a while back to hack around this, but that just lets

RE: Character sets - kind of solved?

2004-12-03 Thread Doran, Michael D
First off, Ashley's suggestion that the original encoding was likely MARC-8 is correct. The author's Arabic name, transliterated into the Latin alphabet, should be "Bis{latin small letter a with macron}{latin small letter t with dot below}{latin small letter i with macron}, Mu{latin small letter h
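
Editor's note: the reading described above can be checked against the bytes in the hex dump from 2004-12-08 (42 69 73 e5 61 f2 74 e5 69). In MARC-8, a combining mark precedes its base letter, while Unicode puts it after. The Python sketch below is not the MARC::Record or MARC::Charset code and covers only the two combining marks that appear in that byte run. The table and function names are hypothetical.

```python
import unicodedata

# Partial table: only the two MARC-8 (ANSEL) combining marks seen in the dump.
MARC8_COMBINING = {
    0xE5: "\u0304",  # combining macron
    0xF2: "\u0323",  # combining dot below
}


def marc8_latin_to_unicode(data: bytes) -> str:
    """Convert ASCII plus the marks above; marks move after their base letter."""
    out = []
    pending = []  # marks waiting for the base character that follows them
    for byte in data:
        if byte in MARC8_COMBINING:
            pending.append(MARC8_COMBINING[byte])
        elif byte < 0x80:
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch's table")
    if pending:
        raise ValueError("combining mark with no base character")
    # Compose base + mark pairs into single precomposed code points.
    return unicodedata.normalize("NFC", "".join(out))


name = marc8_latin_to_unicode(bytes.fromhex("426973e561f274e569"))
print(name)
```

The result begins with "Bis" followed by a with macron, t with dot below, and i with macron, which agrees with the transliteration given in the message.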