Re: Repeated Words

Sean O'Leary Wed, 25 Apr 2001 11:13:05 -0700
At 10:48 AM 4/25/2001, you wrote:
> > I believe if you add the (g)lobal modifier and optionally(i), Paul's
> > line of code may work:
> >
> >  $line =~ /(\b\w+\b).*\1/ogi;

You don't need /o, because there are no variables in the pattern, so Perl 
will compile the regex at compile time anyway.  I guess it doesn't hurt, 
though.
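
For contrast, here is a minimal sketch of the case where /o does change behavior: a pattern that interpolates a variable. The sub names and sample strings are made up for illustration and are not from the original thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern interpolates $word, and /o tells Perl
# to compile it only once, so the first $word it sees is frozen in.
sub match_once {
    my ($word, $text) = @_;
    return $text =~ /\b$word\b/o ? 1 : 0;
}

# Same pattern without /o: it is rebuilt whenever $word changes.
sub match_fresh {
    my ($word, $text) = @_;
    return $text =~ /\b$word\b/ ? 1 : 0;
}

print match_once('cat', 'a cat sat'), "\n";     # 1
print match_once('dog', 'a cat sat'), "\n";     # still 1: pattern is stuck on 'cat'
print match_fresh('dog', 'a cat sat'), "\n";    # 0
```

With a constant pattern like the one quoted above, there is nothing to freeze, which is why /o makes no difference there.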

I looked through some stuff for items like this, and found something in the 
Perl Cookbook that almost fit the bill.  I modified it *very* slightly so 
that it can catch punctuation between words.  Take a look at the comments 
and you'll know what to pull out if you don't want that behavior.

And before anyone goes thinking I'm a regex guru, I'm not.  I literally 
copied 99% of this out of the Perl Cookbook.  (BTW, the three guys who 
wrote it are *really* good people to copy off of. :-) )  I know Perl kinda 
well, but of all its features, I'm weakest on regexes.

Anyway, here it be.

while (<>) {
     while ( m{
                \b         # Start at a word boundary
                (\S+)      # \S means non-whitespace; with the \b's
                           # around it, the chunk has to begin and
                           # end on a word character.
                \b         # until you hit another boundary
                \W?        # optionally match one non-word char
                           # (punctuation); pull this line out if
                           # you don't want that behavior
                (
                    \s+    # Then some space
                    \1     # Then what we matched in (\S+) above
                    \b     # with a boundary after it
                )
               }xig        # eXtended (Comments), case Insensitive
                           # Global
           )
     {
         print "Dupe word '$1' at line $.\n";
     }
}
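
To try the pattern on a few strings without feeding it a file, the loop above can be wrapped in a sub. This is only a sketch: the name dupe_words and the sample sentences are hypothetical, and the regex is the same one as above with the comments shortened.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every word that the pattern reports
# as doubled in $text.
sub dupe_words {
    my ($text) = @_;
    my @found;
    while ( $text =~ m{
                \b (\S+) \b    # non-whitespace chunk, word chars at both ends
                \W?            # optional punctuation after it
                (
                    \s+        # some whitespace
                    \1         # the same chunk again
                    \b         # ending at a word boundary
                )
            }xig )
    {
        push @found, $1;
    }
    return @found;
}

print join(',', dupe_words('This is is a test')), "\n";          # is
print join(',', dupe_words('Paris in the, the spring')), "\n";   # the
print join(',', dupe_words('no repeats here')), "\n";            # (empty line)
```

The second call is the case the extra \W? is there for: the comma after the first "the" would otherwise stop the match.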

Thank you for your time,

Sean.