Hi Jane,

These answers assume that the data you are processing: 1) is encoded in the MARC-8 character set, and 2) consists of the MARC-8 default basic and extended Latin characters.

> Dave,Ayod\2003
> Paòt,Kaâs\2002
> Baks,Dasa\2003
> ,Viâs\2002
>
> Problem 1: As you can see, I don't really want the first four
> characters, I want the first four SEARCHABLE characters. How
> can I tell MARC Record to give me the first four characters,
> excluding diacritics?

Assuming that you are asking how to strip out the MARC-8 combining diacritic characters, try inserting the substitution commands shown below just prior to the substr commands:

>     my $ME = $field->subfield('a');
      $ME =~ s/[\xE1-\xFE]//g;
>     my $four100 = substr( $ME, 0, 4 );

>     my $TITLE = $field->subfield('a');
      $TITLE =~ s/[\xE1-\xFE]//g;
>     my $four245 = substr( $TITLE, 0, 4 );

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Jacobs, Jane W [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 11, 2005 12:30 PM
> To: perl4lib@perl.org
> Subject: Ignoring Diacritics accessing Fixed Field Data
>
> Hi folks,
>
> I'm trying to write a routine to construct a text file of
> OCLC search key from a group of existing records. What I
> want is something like:
>
> Brah,vasa/2003
>
> That is 1st four letters of 100 + comma + 1st four letters of
> 245 + slash + date.
>
> In principle I have this working with:
>
> open( FOURS, ">4-4-date.txt" );
>
> while ( my $r = $batch->next() ) {
>
>     my @fields = $r->field( '100' );
>     foreach my $field ( @fields ) {
>         my $ME = $field->subfield('a');
>         my $four100 = substr( $ME, 0, 4 );
>
>         print FOURS "$four100";
>     }
>
>     my @fields = $r->field( '245' );
>     foreach my $field ( @fields ) {
>         my $TITLE = $field->subfield('a');
>         my $four245 = substr( $TITLE, 0, 4 );
>         print FOURS ",$four245";
>     }
>
>     my @fields = $r->field( '260' );
>     foreach my $field ( @fields ) {
>         my $PD = $field->subfield('c');
>         my $four260 = substr( $PD, 0, 4);
>         print FOURS "\\$four260\n";
>     }
>
> My result was something like:
>
> Dave,Ayod\2003
> Paòt,Kaâs\2002
> Baks,Dasa\2003
> ,Viâs\2002
>
> Problem 1: As you can see, I don't really want the first four
> characters, I want the first four SEARCHABLE characters. How
> can I tell MARC Record to give me the first four characters,
> excluding diacritics?
>
> Problem 2: In these examples 260 $c works OK, but I could
> get a cleaner result by accessing the date from the fixed
> field (008 07-10). How would I do that? I was looking in
> the tutorial, but couldn't seem to find anything that seemed
> to help. If I'm missing something there please point it up.
>
> Thanks in advance to anyone who can help.
>
> JJ
>
> **Views expressed by the author do not necessarily represent
> those of the Queens Library.**
>
> Jane Jacobs
> Asst. Coord., Catalog Division
> Queens Borough Public Library
> 89-11 Merrick Blvd.
> Jamaica, NY 11432
>
> tel.: (718) 990-0804
> e-mail: [EMAIL PROTECTED]
> FAX. (718) 990-8566
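
For reference, here is a minimal sketch of how the two pieces of the search key might fit together on plain strings. It assumes MARC-8 data as described at the top of this message. The helper names (first_four, date_from_008, search_key) and the sample values are hypothetical and are not part of MARC::Record. The byte range is the one used in the substitution above, so it is worth checking against the MARC-8 code tables for your data. For Problem 2, the sketch assumes you already have the 008 contents as a string; with MARC::Record I believe this is available from $r->field('008')->data(), and positions 07-10 hold Date 1.

```perl
use strict;
use warnings;

# Hypothetical helper: first four characters of a MARC-8 string after
# dropping the combining diacritic bytes.
sub first_four {
    my ($text) = @_;
    return '' unless defined $text;       # subfield may be missing
    $text =~ s/[\xE1-\xFE]//g;            # same byte range as the reply above
    return substr( $text, 0, 4 );
}

# Hypothetical helper: Date 1 from the 008 string (positions 07-10).
sub date_from_008 {
    my ($f008) = @_;
    return '' unless defined $f008 && length($f008) >= 11;
    return substr( $f008, 7, 4 );
}

# Hypothetical helper: name,title\date in the same layout as the
# original script's output.
sub search_key {
    my ( $name, $title, $f008 ) = @_;
    return first_four($name) . ',' . first_four($title) . '\\' . date_from_008($f008);
}

# Made-up sample values; \xF2 and \xE2 stand in for combining bytes,
# which in MARC-8 precede the base letter they modify.
my $key = search_key(
    "Pa\xF2tanakara",
    "Ka\xE2sika",
    "050111s2002    ii            000 0 hin d",
);
print "$key\n";    # prints Pata,Kasi\2002
```

Working on plain strings keeps the stripping and slicing easy to try out before wiring them into the record loop. The helpers return an empty string when a subfield or the 008 is missing, so a gap shows up as an empty slot in the key instead of a warning.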