Birgit Kellner wrote:
>
> I have an html file and would like to extract image file names and
> extensions:
>
> my $content = qq|<img src="e:\somedir\otherdir\2_image.jpg">
> aölkjd oiae lkajf lksjfkjs df<br><img
> src="http://wlaskjfd.sdlkj/sdlk/LKJ_slkdjf_lkdjfslkj.gif">|;
>
> Image file names may contain numers, letters or underscores.
>
> my %imageextension;
> while ($content =~ /<img src="(.*?)([a-zA-Z0-9_]+)\.(\w{3})">/g) {
> print "yep!\n";
> $imageextension{$2} = $3;}
>
> foreach (keys %imageextension) {
> print "$_: $imageextension{$_}\n";}
>
> Why does this code correctly extract "LKJ_slkdjf_lkdjfslkj" for the first
> image, but only "_image", and not "2_image" for the second?
>
> I thought of adding "\\|\/" in the regexp after "(.*?)", but then the first
> image is not extracted at all.
>
I think the problem isn't the regexp; it's in this line:

my $content = qq|<img src="e:\somedir\otherdir\2_image.jpg">

You used the qq| ... | notation, which is equivalent to " ... ", so the string is interpolated and backslash escapes are processed. In particular, \2 is read as an octal escape (the single character with code 2), not as a backslash followed by the digit 2. The "2" therefore never reaches the regexp, and [a-zA-Z0-9_]+ can only match "_image". (\2 as an old-style spelling of $2 only applies inside a regexp or the replacement part of s///, not in an ordinary double-quoted string.)

Try what happens when you use the q| ... | syntax instead of qq| ... |.

Best wishes,
Andrea
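
A minimal sketch of the difference, in case it helps. The variable names $double and $single are only illustrative, and the filler text between the two tags is replaced by plain ASCII; the regexp is the one from the original post, unchanged.

```perl
use strict;
use warnings;

# qq|...| interpolates like "...": \2 is processed as an octal escape.
my $double = qq|\2_image.jpg|;
# q|...| behaves like '...': the backslash and the digit stay as typed.
my $single = q|\2_image.jpg|;

printf "qq: first char code %d, length %d\n", ord($double), length $double;
printf "q : starts with '%s', length %d\n", substr($single, 0, 2), length $single;

# The original content, but quoted with q|...| so no escapes are processed.
my $content = q|<img src="e:\somedir\otherdir\2_image.jpg"> some text<br><img src="http://wlaskjfd.sdlkj/sdlk/LKJ_slkdjf_lkdjfslkj.gif">|;

my %imageextension;
while ($content =~ /<img src="(.*?)([a-zA-Z0-9_]+)\.(\w{3})">/g) {
    $imageextension{$2} = $3;    # file name => extension
}

print "$_: $imageextension{$_}\n" for sort keys %imageextension;
```

With q| ... | the hash should contain the key "2_image" rather than "_image". If the content is read from an actual html file instead of being written as a literal in the script, no escape processing happens and the issue does not arise.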