Re: There has to be a way to do this

Jeff 'japhy' Pinyan Mon, 23 Jun 2003 08:31:11 -0700

On Jun 23, [EMAIL PROTECTED] said:

>SRED. SREDNE
>SEV.  SEVERN


># Match it at beginning of line
>$cgname =~ s/^SRED\.(?=[\W\s\-\d]+)/SREDNE:/g ;

Three things: the + quantifier on the [...] isn't needed (the look-ahead
only has to see one such character), you don't need to put \s and - in a
character class you've already put \W in (they're both non-word characters
already), and the /g modifier is totally worthless here... there's only ONE
beginning of the line!

  $cgname =~ s/^SRED\.(?=[\W\d])/SREDNE:/;
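If you want to convince yourself the trimmed version behaves the same, you
can run both side by side on a few names.  This is only a quick sketch: the
sub names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Your original front-of-line substitution, wrapped in a sub.
sub front_original {
    my ($s) = @_;
    $s =~ s/^SRED\.(?=[\W\s\-\d]+)/SREDNE:/g;
    return $s;
}

# Trimmed: one character in the look-ahead is enough, \s and \- are
# already covered by \W, and ^ (without /m) can only match once.
sub front_trimmed {
    my ($s) = @_;
    $s =~ s/^SRED\.(?=[\W\d])/SREDNE:/;
    return $s;
}

# Hypothetical sample names: both subs give the same result for each.
for my $s ('SRED. SOSNA', 'SRED.-2', 'SRED.5', 'SRED.X', 'SRED.', 'X SRED. Y') {
    printf "%-10s => %s\n", $s, front_trimmed($s);
}
```

Note that "SRED." on its own is left alone here, because the positive
look-ahead needs a character after the dot.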

># Match it within the line
>$cgname =~ s/[\W\s\-]+SRED\.(?=[\W\s\-\d]+)/:SREDNE:/g ;

I have a feeling you want to use \b instead of [\W\s\-]+.  It's cleaner,
and since \b is a zero-width assertion it doesn't actually absorb a
character: the separator in front of SRED. stays in the string.

  $cgname =~ s/\bSRED\.(?=[\W\d])/:SREDNE:/g;
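Here's a small sketch of that difference, using a made-up name.  Your
version eats the space before SRED., the \b version keeps it.

```perl
use strict;
use warnings;

# Your mid-line version: [\W\s\-]+ consumes the separator(s) in front of
# SRED., so they get replaced along with it.
sub middle_original {
    my ($s) = @_;
    $s =~ s/[\W\s\-]+SRED\.(?=[\W\s\-\d]+)/:SREDNE:/g;
    return $s;
}

# The \b version: \b matches between characters without consuming any,
# so the separator in front of SRED. is left where it was.
sub middle_boundary {
    my ($s) = @_;
    $s =~ s/\bSRED\.(?=[\W\d])/:SREDNE:/g;
    return $s;
}

my $name = 'ZAVOD SRED. 5';               # hypothetical sample name
print middle_original($name), "\n";       # ZAVOD:SREDNE: 5
print middle_boundary($name), "\n";       # ZAVOD :SREDNE: 5
```

One thing to keep in mind: \b also matches at the very start of the string,
so on its own this line would turn "SRED. X" into ":SREDNE: X".  That's why
the front-of-line substitution has to run first.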

># Match it at end of line
>$cgname =~ s/[\W\s\-]+SRED\.$/:SREDNE:/g ;

Again, use \b, but there's no need for /g here.

  $cgname =~ s/\bSRED\.$/:SREDNE:/;
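A quick sketch comparing the two end-of-line versions, on a name that ends
in SRED. and on the bare string "SRED." (both samples are made up):

```perl
use strict;
use warnings;

# Your end-of-line version needs at least one non-word character in front
# of SRED., so it can't match when SRED. is the whole string.
sub end_original {
    my ($s) = @_;
    $s =~ s/[\W\s\-]+SRED\.$/:SREDNE:/g;
    return $s;
}

# \b also matches at the start of the string, so this one can.
sub end_boundary {
    my ($s) = @_;
    $s =~ s/\bSRED\.$/:SREDNE:/;
    return $s;
}

for my $s ('ZAVOD SRED.', 'SRED.') {      # hypothetical sample names
    printf "%-12s original: %-16s boundary: %s\n",
        $s, end_original($s), end_boundary($s);
}
```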

># Match if it begins & ends line
>$cgname =~ s/^SRED\.$/:SREDNE:/g ;

Ah, here's an interesting case.  This is actually already handled by my
modified end-of-line regex.  The problem is that you were using

  /[\W\s\-]+SRED\.$/

but if the string is "SRED.", then [\W\s\-] can't match anything.  So
that's why using a word boundary (\b) is smarter.  Also, we can change the
look-aheads to go from positive to negative.

Instead of saying "and I am followed by a non-letter", why not say "and I
am NOT followed by a letter"?

  $cgname =~ s/^SRED\.(?![A-Za-z])/SREDNE:/;     # front
  $cgname =~ s/\bSRED\.(?![A-Za-z])/:SREDNE:/g;  # middle
  $cgname =~ s/\bSRED\.$/:SREDNE:/;              # end
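To see what the three lines do together, you can wrap them in a sub and
run them in order (front, middle, end).  A sketch only; expand_sred and the
sample names are made up.

```perl
use strict;
use warnings;

# The three negative look-ahead substitutions, applied in order.
sub expand_sred {
    my ($cgname) = @_;
    $cgname =~ s/^SRED\.(?![A-Za-z])/SREDNE:/;       # front
    $cgname =~ s/\bSRED\.(?![A-Za-z])/:SREDNE:/g;    # middle
    $cgname =~ s/\bSRED\.$/:SREDNE:/;                # end
    return $cgname;
}

# Hypothetical sample names.
for my $name ('SRED. SOSNA', 'ZAVOD SRED. 5', 'ZAVOD SRED.', 'SRED.', 'SRED.X') {
    print "$name => ", expand_sred($name), "\n";
}
```

A negative look-ahead also succeeds at the end of the string, so the front
and middle lines now catch a trailing SRED. as well.  In particular a bare
"SRED." comes out as "SREDNE:" (from the front line) rather than
":SREDNE:" (from the end line), which is different from the positive
look-ahead version.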

If you're worried about hardcoding the letter set (A-Za-z), then you can
use this character class instead:  [^\W\d_].  It means "match anything
that's not:  a non-word character, a digit, or an underscore".  It's a
sneaky way of matching anything that would be matched by \w WITHOUT
matching \d or _.

  $cgname =~ s/^SRED\.(?![^\W\d_])/SREDNE:/;     # front
  $cgname =~ s/\bSRED\.(?![^\W\d_])/:SREDNE:/g;  # middle
  $cgname =~ s/\bSRED\.$/:SREDNE:/;              # end
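Where the two classes differ is on letters outside A-Za-z.  Here's a sketch
using a Cyrillic letter, written as "\x{416}" so the script stays plain
ASCII; it assumes a Perl recent enough to apply Unicode rules to \w on
character strings like this one.  The sub names are made up.

```perl
use strict;
use warnings;

# Look-ahead with the hardcoded ASCII letter set...
sub front_ascii {
    my ($s) = @_;
    $s =~ s/^SRED\.(?![A-Za-z])/SREDNE:/;
    return $s;
}

# ...and with [^\W\d_]: a \w character that is not a digit or underscore.
sub front_letters {
    my ($s) = @_;
    $s =~ s/^SRED\.(?![^\W\d_])/SREDNE:/;
    return $s;
}

# "\x{416}" is CYRILLIC CAPITAL LETTER ZHE: a letter, but not in A-Za-z.
# Code points are printed instead of the characters to avoid wide-char output.
for my $ch ('A', '5', '_', '-', "\x{416}") {
    printf "U+%04X  A-Za-z: %d  [^\\W\\d_]: %d\n",
        ord($ch), ($ch =~ /[A-Za-z]/ ? 1 : 0), ($ch =~ /[^\W\d_]/ ? 1 : 0);
}
```

With these, "SRED.\x{416}" is expanded by front_ascii (it doesn't see the
Cyrillic letter as a letter) but left alone by front_letters.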

>Right now I'm generating the regexes in a standalone script, then inserting
>the output code into the subroutine that processes names into a "matchable"
>form.
>
>What I'd like to be able to do is take a *set* of abbreviation
>"dictionaries," concatenate them together and dynamically generate the
>regex code in the routine that is going to execute it.

So you want to take the dictionary files, and use them to create a
function that does all the regexes on its input?
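If so, here's one possible way to go about it; it's a sketch, not something
tested against your data.  It assumes each dictionary line is an
abbreviation and its expansion separated by whitespace (like the SRED. and
SEV. lines you quoted), and it reuses the \b and [^\W\d_] ideas from above.
make_expander is a made-up name.  Instead of three regexes per
abbreviation, it joins all the abbreviations into one alternation and
builds a closure that does a front-of-line pass and then a mid/end pass.

```perl
use strict;
use warnings;

# Hypothetical helper: build one expansion sub from any number of
# dictionary lines of the form "ABBREV EXPANSION" (e.g. "SRED. SREDNE").
# Concatenating several dictionaries is just passing all their lines.
sub make_expander {
    my %expand;
    for my $line (@_) {
        my ($abbr, $full) = split ' ', $line, 2;
        next unless defined $full;
        $full =~ s/\s+\z//;                 # drop trailing newline/spaces
        $expand{$abbr} = $full;             # later dictionaries win
    }
    return sub { $_[0] } unless %expand;    # nothing to expand

    # Longest abbreviations first, a common precaution so a shorter entry
    # that is a prefix of a longer one isn't tried first.
    # quotemeta escapes the dots.
    my $alt = join '|', map { quotemeta }
        sort { length($b) <=> length($a) or $a cmp $b } keys %expand;
    my $re = qr/\b($alt)(?![^\W\d_])/;

    return sub {
        my ($name) = @_;
        $name =~ s/^$re/$expand{$1}:/;      # front: no leading colon
        $name =~ s/$re/:$expand{$1}:/g;     # middle and end
        return $name;
    };
}

# Two tiny in-memory "dictionaries" (hypothetical data).
my $expand = make_expander('SRED. SREDNE', "SEV.  SEVERN\n");
print $expand->('SRED. SOSNA'), "\n";       # SREDNE: SOSNA
print $expand->('ZAVOD SEV. 5'), "\n";      # ZAVOD :SEVERN: 5
```

With real dictionary files you could read all their lines first (e.g. with
open and <$fh> in list context) and pass them in a single call, so the
regex is built once in the routine that uses it.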

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]

