[Sorry for the line wrapping on my previous post :-)]
Angie Ahl wrote:
>
> on 2003-10-10 James Edward Gray II said:
>
> >Keep your replies on the list, so you can get help from all the people
> >smarter than me. ;)
>
> If there are people smarter than you out there I must be an amoeba ;)
>
> >Okay, why put this inside an if block. If it doesn't find a match it
> >will fail and do nothing, which is what you want, right? I don't think
> >you need the if.
>
> Good point.
>
> >Why don't we work on your Regular Expression a little and see if we can
> >do it all in one move. We want to find all occurrences of the keyword,
> >as long as they're not on a line beginning with qz, right? This seems
> >to do that for me:
> >
> >$content =~ s/^([^\n]*)($kw)/substr($1, 0, 2) ne 'qz' ? "$1\n$2\n" :
> >"$1$2"/mge;
> >
>
> Ok. I had to stop to pick myself up off the floor then. WOW.
>
> This has actually made it possible to cut the whole thing down massively.
>
> here's the code now:

Some thoughts and suggestions if you don't mind.  :-)

> # get line breaks to make <br>'s at the end
> $content =~ s/\n/-qbr-/g;

I will assume that $content contains only printable characters, so why
replace one character with five printable characters (which might occur
in $content for some reason) when you could replace it with a single
non-printable character?

$content =~ s/\n/\0/g;
OR
$content =~ s/\n/\x7F/g;
OR
$content =~ s/\n/\xFF/g;

> # find markup and add markers so it doesn't get processed by regex,
> # no keyword links to be made inside other tags
> $content =~ s/(\[(img|page|link|mp3)=.*?\])/\nqz$1\n/g;

Your alternation is using capturing parens but you are not using $2, so
you should probably use non-capturing parens.

$content =~ s/(\[(?:img|page|link|mp3)=.*?\])/\nqz$1\n/g;

> # find HTML so it doesn't get processed by regex,
> # no keyword links to be made inside valid HTML
> $content =~ s/(<.*?>)/\nqz$1\n/g;

This may or may not work as intended, as regexes aren't really good for
parsing HTML.

http://groups.google.com/groups?as_q=parse+HTML&num=50&as_scoring=r&hl=en&ie=ISO-8859-1&btnG=Google+Search&as_oq=regex+regexp+%22regular+expression%22&as_ugroup=comp.lang.perl.misc

> for my $href ( @Keywords ) {
>
>     # get each keyword and look for it in content.
>     for $kw ( keys %$href ) {

You should make $kw a lexical variable (you are using strict?)

    for my $kw ( keys %$href ) {

>         if ($content =~ /\b($kw)\b/g) {

You don't need the if statement, as the substitution below will do the
right thing on its own.

>             # do the very clever reg with help from and thanks to
>             # [EMAIL PROTECTED]
>             $content =~ s/^([^\n]*)($kw)/substr($1, 0, 2) ne 'qz' ?
> "$1\nqz[link=\"$href->{$kw}\" title=\"$2\"]\n" : "$1$2"/mge;

.* is the same as [^\n]* if you are not using the /s option.  Your match
above uses word boundaries (\b) around $kw, so you should use them here
as well.  You can use negative look-ahead to simplify that a bit.

$content =~ s/^(?!qz)(.*)\b($kw)\b/$1\nqz[link="$href->{$kw}" title="$2"]\n/mg;

>         }
>     }
> }
>
> # clean up those line breaks and markers;
> $content =~ s/\n(qz)?//g;
>
> # put in <br>'s
> $content =~ s/-qbr-/<br>\n/g;
>
> print $content;
> _________________________
>
> As you can see I've adapted your regex a little to put in the full markup around
> the keyword.
>
> The regex itself made perfect sense, it was the
>
> "" ? "" : "" bit that I've never seen before. That's really useful.
>
> I assume it means
>
> "if statement" ? "do if true" : "do if false"

Yes, but it is more like:

"if expression" ? "return if true" : "return if false"

> Please do correct me if I'm wrong. What do you call that? I think I'm going to
> be using that quite a bit ;)

That is called the "Conditional Operator".  You can find it in the
perlop.pod document.

> do I even need the if false bit in this case?

In the example provided, yes.  With the /e modifier the match is replaced
by whatever the expression returns, so the false branch has to return
"$1$2" to put the matched text back unchanged.

Note that the conditional operator can also be used as an lvalue (you
can assign to it.)

$x = $y == 41 ? $a : $b;

( $y == 41 ? $a : $b ) = $x;


John
-- 
use Perl;
program
fulfillment
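
P.S. For anyone who wants to try the suggestions above in one piece, here is a minimal sketch, not a definitive implementation. The sub name link_keywords, the %link_for hash, the sample text and the URL are all made up for illustration, and the \Q...\E quoting is an addition here so that keywords containing regex metacharacters are matched literally.

```perl
use strict;
use warnings;

# Hypothetical keyword => URL table, for illustration only.
my %link_for = ( perl => 'http://www.perl.org/' );

sub link_keywords {
    my ( $content, $links ) = @_;

    # Put existing markup on its own line, prefixed with the qz marker.
    $content =~ s/(\[(?:img|page|link|mp3)=.*?\])/\nqz$1\n/g;

    for my $kw ( keys %$links ) {
        # The negative look-ahead skips lines that start with qz;
        # on other lines the keyword is wrapped in link markup.
        $content =~ s/^(?!qz)(.*)\b(\Q$kw\E)\b/$1\nqz[link="$links->{$kw}" title="$2"]\n/mg;
    }

    # Remove the helper newlines and qz markers again.
    $content =~ s/\n(?:qz)?//g;
    return $content;
}

my $out = link_keywords( 'I like perl. [img=perl.png] Bye.', \%link_for );
print "$out\n";
# prints: I like [link="http://www.perl.org/" title="perl"]. [img=perl.png] Bye.

# The conditional operator used as an lvalue: because $y == 41,
# the assignment goes to $first and $second is left alone.
my ( $first, $second ) = ( 0, 0 );
my $y = 41;
( $y == 41 ? $first : $second ) = 7;
```

One thing to keep in mind: because .* is greedy, each pass links only the last occurrence of a keyword on a given unmarked line, so text with the same keyword several times on one line would need the substitution repeated (or a different approach).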