A bulletin from the "haste makes waste" department...
> $ME =~ s/[\xE1-\xFE]//g;
> $TITLE =~ s/[\xE1-\xFE]//g;
Ooops, that should be "E0" instead of "E1" as the first hex value in the
substitutions:
$ME =~ s/[\xE0-\xFE]//g;
$TITLE =~ s/[\xE0-\xFE]//g;
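
Here is the same fix as a helper, so the stripping always happens before the four-character cut. This is a minimal sketch: `search_prefix` is a hypothetical name. It assumes the subfield text is still raw MARC-8 bytes, not decoded Unicode.

```perl
use strict;
use warnings;

# Hypothetical helper: remove MARC-8 combining diacritics (bytes
# 0xE0-0xFE) and return the first four remaining characters.
# Assumes $text holds raw MARC-8 bytes, not decoded Unicode.
sub search_prefix {
    my ($text) = @_;
    return '' unless defined $text;       # missing subfield
    ( my $clean = $text ) =~ s/[\xE0-\xFE]//g;
    return substr( $clean, 0, 4 );        # shorter strings come back whole
}
```

Inside the loop, `my $four100 = search_prefix( $field->subfield('a') );` would replace the substitution and the substr with a single call.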
Sorry,
-- Michael
# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/
> -----Original Message-----
> From: Doran, Michael D
> Sent: Tuesday, January 11, 2005 2:13 PM
> To: [email protected]
> Subject: RE: Ignoring Diacritics accessing Fixed Field Data
>
> Hi Jane,
>
> These answers assume that the data you are processing:
> 1) is encoded in the MARC-8 character set, and
> 2) consists of the MARC-8 default basic and extended Latin characters.
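
You can check the first assumption record by record. In MARC 21, Leader position 09 holds the character coding scheme: blank means MARC-8 and `a` means UCS/Unicode. MARC::Record's `leader()` method returns the leader string. This is a small sketch, and `is_marc8_leader` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if Leader/09 (zero-based offset 9) is blank,
# i.e. the record declares MARC-8 rather than UCS/Unicode ('a').
sub is_marc8_leader {
    my ($leader) = @_;
    return 0 unless defined $leader && length($leader) >= 10;
    return substr( $leader, 9, 1 ) eq ' ' ? 1 : 0;
}
```

Putting `next unless is_marc8_leader( $r->leader() );` at the top of the loop would skip Unicode records. In UTF-8 data, bytes in the 0xE0-0xFE range do not mark combining diacritics, so stripping them would corrupt the text.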
>
> > Dave,Ayod\2003
> > Pa�t,Ka�s\2002
> > Baks,Dasa\2003
> > ,Vi�s\2002
> >
> > Problem 1: As you can see, I don't really want the first four
> > characters, I want the first four SEARCHABLE characters. How
> > can I tell MARC Record to give me the first four characters,
> > excluding diacritics?
>
> Assuming that you are asking how to strip out the MARC-8
> combining diacritic characters, try inserting the
> substitution commands shown below just before
> the substr commands:
>
> > my $ME = $field->subfield('a');
> $ME =~ s/[\xE1-\xFE]//g;
> > my $four100 = substr( $ME, 0, 4 );
>
> > my $TITLE = $field->subfield('a');
> $TITLE =~ s/[\xE1-\xFE]//g;
> > my $four245 = substr( $TITLE, 0, 4 );
>
> -- Michael
>
>
> > -----Original Message-----
> > From: Jacobs, Jane W [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 11, 2005 12:30 PM
> > To: [email protected]
> > Subject: Ignoring Diacritics accessing Fixed Field Data
> >
> > Hi folks,
> >
> > I'm trying to write a routine to construct a text file of
> > OCLC search keys from a group of existing records. What I
> > want is something like:
> >
> > Brah,vasa/2003
> >
> > That is 1st four letters of 100 + comma + 1st four letters of
> > 245 + slash + date.
> >
> > In principle I have this working with:
> >
> >
> > open( FOURS, ">4-4-date.txt" ) or die "Cannot open 4-4-date.txt: $!";
> >
> > while ( my $r = $batch->next() ) {
> >
> >     my @fields = $r->field( '100' );
> >     foreach my $field ( @fields ) {
> >         my $ME = $field->subfield('a');
> >         my $four100 = substr( $ME, 0, 4 );
> >         print FOURS "$four100";
> >     }
> >
> >     @fields = $r->field( '245' );
> >     foreach my $field ( @fields ) {
> >         my $TITLE = $field->subfield('a');
> >         my $four245 = substr( $TITLE, 0, 4 );
> >         print FOURS ",$four245";
> >     }
> >
> >     @fields = $r->field( '260' );
> >     foreach my $field ( @fields ) {
> >         my $PD = $field->subfield('c');
> >         my $four260 = substr( $PD, 0, 4 );
> >         print FOURS "\\$four260\n";
> >     }
> > }
> >
> > close( FOURS );
> >
> >
> > My result was something like:
> >
> > Dave,Ayod\2003
> > Pa�t,Ka�s\2002
> > Baks,Dasa\2003
> > ,Vi�s\2002
> >
> > Problem 1: As you can see, I don't really want the first four
> > characters, I want the first four SEARCHABLE characters. How
> > can I tell MARC Record to give me the first four characters,
> > excluding diacritics?
> >
> > Problem 2: In these examples 260 $c works OK, but I could
> > get a cleaner result by taking the date from the fixed
> > field (008/07-10). How would I do that? I looked in the
> > tutorial but couldn't find anything that helped. If I'm
> > missing something there, please point it out.
> >
> > Thanks in advance to anyone who can help.
> >
> >
> > JJ
> >
> >
> >
> > **Views expressed by the author do not necessarily represent
> > those of the Queens Library.**
> >
> > Jane Jacobs
> > Asst. Coord., Catalog Division
> > Queens Borough Public Library
> > 89-11 Merrick Blvd.
> > Jamaica, NY 11432
> >
> > tel.: (718) 990-0804
> > e-mail: [EMAIL PROTECTED]
> > FAX. (718) 990-8566
> >
> >
>
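
For Problem 2 in Jane's original message (the date from 008/07-10), here is a minimal sketch. It assumes the 008 contents have already been pulled out as a plain string. In the MARC 21 008, positions 07-10 hold Date 1, which in Perl is substr offset 7, length 4. The helper names `date1_from_008`, `first_four`, and `search_key` are hypothetical. The key uses the backslash separator that the original loop prints.

```perl
use strict;
use warnings;

# Hypothetical helper: Date 1 from 008 positions 07-10
# (zero-based offset 7, length 4). Returns '' if the data is too short.
sub date1_from_008 {
    my ($data) = @_;
    return '' unless defined $data && length($data) >= 11;
    return substr( $data, 7, 4 );
}

# Hypothetical helper: first four characters after stripping
# MARC-8 combining diacritics (bytes 0xE0-0xFE).
sub first_four {
    my ($text) = @_;
    return '' unless defined $text;
    ( my $clean = $text ) =~ s/[\xE0-\xFE]//g;
    return substr( $clean, 0, 4 );
}

# Hypothetical helper: build "Auth,Titl\Date" from main entry,
# title, and the raw 008 string.
sub search_key {
    my ( $main_entry, $title, $f008 ) = @_;
    return first_four($main_entry) . ','
         . first_four($title) . '\\'      # a single backslash
         . date1_from_008($f008);
}
```

In the MARC::Record loop, the 008 string would come from `$r->field('008')->data()`. Control fields such as 008 expose their contents through `data()`; `subfield()` is not meant for them. Check that `$r->field('008')` is defined before calling `data()`, so records without an 008 don't stop the run.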