Birgit Kellner wrote:
> 
> I have an html file and would like to extract image file names and
> extensions:
> 
> my $content = qq|<img src="e:\somedir\otherdir\2_image.jpg">
> aölkjd oiae lkajf lksjfkjs df<br><img
> src="http://wlaskjfd.sdlkj/sdlk/LKJ_slkdjf_lkdjfslkj.gif";>|;
> 
> Image file names may contain numbers, letters or underscores.
> 
> my %imageextension;
> while ($content =~ /<img src="(.*?)([a-zA-Z0-9_]+)\.(\w{3})">/g) {
>         print "yep!\n";
>         $imageextension{$2} = $3;}
> 
> foreach (keys %imageextension) {
> print "$_: $imageextension{$_}\n";}
> 
> Why does this code correctly extract "LKJ_slkdjf_lkdjfslkj" for the first
> image, but only "_image", and not "2_image" for the second?
> 
> I thought of adding "\\|\/" in the regexp after "(.*?)", but then the first
> image is not extracted at all.
> 

I think the problem isn't the regexp; it's in this line:

my $content = qq|<img src="e:\somedir\otherdir\2_image.jpg">

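You used the qq| ... | notation, which is equivalent to " ... ".
So backslash escapes are processed, and \2 in particular is affected.
In a double-quoted string \2 is an octal escape for the single character
with code 2 (the old \2 spelling of $2 applies only inside a regexp or
substitution). That control character is not in [a-zA-Z0-9_], so the
file name can only be matched starting at "_image".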
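
To make the difference visible, here is a minimal sketch that compares the two quoting operators on just the "\2_image" part of the path. The variable names $interpolated and $literal are only for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Double-quote-like: \2 is an octal escape, i.e. the single character chr(2).
my $interpolated = qq|\2_image|;

# Single-quote-like: the backslash and the digit stay as two characters.
my $literal = q|\2_image|;

printf "qq: length %d, first char code %d\n",
    length($interpolated), ord($interpolated);
printf "q:  length %d, first char code %d\n",
    length($literal), ord($literal);

# prints:
# qq: length 7, first char code 2
# q:  length 8, first char code 92
```

With qq the "2" is gone from the string before the regexp ever sees it, which is why only "_image" can match.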

See what happens when you use q| ... | instead of qq| ... |. It behaves
like ' ... ', so \2 stays as a backslash followed by the digit 2.
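
A sketch of the script with q| ... | might look like the following. I have made a few assumptions that are not in your original: the filler text between the tags is replaced by plain ASCII, the stray ";" after the second src value is dropped, and the regexp uses \s+ between "img" and "src" so that the line break in the second tag does not matter.

```perl
use strict;
use warnings;

# q|...| does not process escapes such as \2, so the Windows path
# keeps every backslash exactly as written.
my $content = q|<img src="e:\somedir\otherdir\2_image.jpg">
some filler text<br><img
src="http://wlaskjfd.sdlkj/sdlk/LKJ_slkdjf_lkdjfslkj.gif">|;

my %imageextension;

# $1 = path prefix, $2 = file name, $3 = three-character extension.
# \s+ (an assumption, see above) also accepts a newline after "img".
while ($content =~ /<img\s+src="(.*?)([a-zA-Z0-9_]+)\.(\w{3})">/g) {
    $imageextension{$2} = $3;
}

# Sort the keys so the output order is predictable.
for my $name (sort keys %imageextension) {
    print "$name: $imageextension{$name}\n";
}

# prints:
# 2_image: jpg
# LKJ_slkdjf_lkdjfslkj: gif
```

If the content really comes from an HTML file read from disk rather than from a literal in the script, no quoting operator is involved at all and the backslashes arrive unchanged.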


Best Wishes,
Andrea
