> Is there a good method to do this? I need to remove the stop 
> words from the comment field of every record. There are about 
> 20,000 records. The comments look like this: 
> 
> Yersinia pestis strain Nepal (aka CDC 516 or 369 isolated 
> from human) 16S-23S intergenic region amplified with 16UNIX 
> and 23UNII primers. Sequencing primers were UNI1 and UNI2   5/25/99^^
> 
> I should remove 'and' 'in' 'with' 'The', etc. I have set up 
> the stop words array. Is there an efficient way to do this?

How about:

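 ----code----
 #!perl -w
 use strict;
 
 my $input = 'blah srand and spin in with within the their';
 my @s_words = qw(and in with the);
 
 # Join the words into one alternation; quotemeta guards against
 # any regex metacharacters in the word list.
 my $tmp = join '|', map { quotemeta } @s_words;
 
 # \b on both sides keeps 'within' and 'their' intact; the trailing
 # \s* swallows the space left behind; /i also catches 'The'.
 my $r = qr/\b(?:$tmp)\b\s*/i;
 print $r;
 
 print "\n\n$input\n\n";
 $input =~ s/$r//g;
 print "$input\n";
 ----end----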

It builds a regex using your search words and then applies it to a
string.
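
For the 20,000 records, one option is to compile the regex once and reuse it for every comment. Below is only a rough sketch: the helper names make_stop_regex and strip_stop_words are made up for illustration, the @comments array is a stand-in for however you actually read your records, and the alternation is built with join rather than a loop.

```perl
#!perl -w
use strict;

# Hypothetical helper: compile the stop-word list into one regex, once.
sub make_stop_regex {
    my @words = @_;
    my $alt = join '|', map { quotemeta } @words;
    return qr/\b(?:$alt)\b\s*/i;
}

# Hypothetical helper: return a copy of the comment without stop words.
sub strip_stop_words {
    my ($re, $comment) = @_;
    $comment =~ s/$re//g;
    return $comment;
}

my $re = make_stop_regex(qw(and in with the));

# Stand-in for the real records; only the comment field is shown.
my @comments = (
    'Sequencing primers were UNI1 and UNI2',
    'The region amplified with 16UNIX',
);

print strip_stop_words($re, $_), "\n" for @comments;
```

Since the regex is compiled a single time, each record costs one substitution pass, which should be fine for 20,000 comments.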

HTH,

 -dave


