OK, let me preface this by saying that I know very little about HTML or web
pages, so what I say might not be entirely correct, but I think I can point
you in the right direction.
First, the part that says
($url = $info) =~ m/.../;
does not do what you want it to (I think).
I think you want to match the regex against $info and then assign the result
of the (.*) to $url. What yours does is assign $info to $url and then
perform a match against $url (the (.*) capture ends up in $1). To do the
former, you need to write:
($url) = $info =~ m/.../;
In list context, m// returns a list of the parenthesized captures, of which
you want the first one (and the only one in this case). The parentheses
around $url are what put the match in list context.
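To make the difference concrete, here is a small sketch of the two forms side
by side. The contents of $info are made up for illustration (I don't have your
real data), and $copy is just a name I picked.

```perl
use strict;
use warnings;

# Hypothetical sample input, not the original poster's data.
my $info = '<a href="http://example.com/page.html">link</a>';

# Form 1: copy $info into $copy, then match against $copy.
# The capture lands in $1; $copy still holds the whole string.
(my $copy = $info) =~ m/href="(.*?)"/;

# Form 2: match against $info in list context and assign the
# first capture group to $url.
my ($url) = $info =~ m/href="(.*?)"/;

print "copy: $copy\n";   # the full, unmodified string
print "url:  $url\n";    # only the captured part
```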
The second problem is that (.*) is going to eat up too much. It is greedy:
it matches as far as possible, gobbling up quite a few " characters along
the way, and only stops at the very LAST ". What you want is for it to stop
at the very FIRST " that it finds. An easy (and perhaps too simplistic) way
of doing this is to make (.*) match minimally by adding a ? after the *,
giving (.*?). This tells .* to match as little as possible, so it only goes
as far as the first ". As I said, this might be too simplistic for what you
want, but it works with the sample data you provided.
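A quick way to see the greedy versus minimal behavior is to run both against
a line with more than two quotes in it. The sample string below is my own
invention, not your data.

```perl
use strict;
use warnings;

# Hypothetical sample with several double quotes on one line.
my $info = '<a href="one.html" target="_blank">';

my ($greedy)  = $info =~ m/href="(.*)"/;    # runs on to the LAST "
my ($minimal) = $info =~ m/href="(.*?)"/;   # stops at the FIRST "

print "greedy:  $greedy\n";    # one.html" target="_blank
print "minimal: $minimal\n";   # one.html
```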
Finally, if the __DATA__ section has embedded newlines, you will want to add
the /s flag to the regex so that . also matches newlines and the pattern can
span lines. I couldn't tell whether my mail program added the newlines or
whether they were there already.
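Here is a small sketch of what /s changes, assuming $info holds a tag that has
been split across two lines (again, made-up data).

```perl
use strict;
use warnings;

# Hypothetical input: the tag is broken across a newline.
my $info = "<a\nhref=\"one.html\">";

# Without /s, . will not match the newline, so the match fails
# and $without is left undefined.
my ($without) = $info =~ m/a.*href="(.*?)"/;

# With /s, . matches the newline too, so the match succeeds.
my ($with) = $info =~ m/a.*href="(.*?)"/s;

print "without /s: ", (defined $without ? $without : "(no match)"), "\n";
print "with /s:    $with\n";
```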
So, the final two lines are:
($url) = $info =~ m/a.*href="(.*?)"/s;
($image) = $info =~ m/img.*src="(.*?)"/s;
These lines work with the data you provided. However, as your application
gets more complex, you might want to consider using the /g flag to match
multiple hrefs and imgs. If you do that, you will want to make the first .*
minimal as well (.*?), or else it will gobble up nearly all of your data on
the first match.
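As a rough sketch of the /g idea, assuming two links in one string (sample
data of my own), compare a greedy and a minimal first .* in list context:

```perl
use strict;
use warnings;

# Hypothetical input with two links.
my $info = '<a href="one.html">1</a> <a href="two.html">2</a>';

# Greedy first .*: the first match skips ahead to the last href,
# so only one URL comes back.
my @greedy = $info =~ m/a.*href="(.*?)"/sg;

# Minimal first .*?: each match stops at the nearest href.
my @minimal = $info =~ m/a.*?href="(.*?)"/sg;

print "greedy:  @greedy\n";    # two.html
print "minimal: @minimal\n";   # one.html two.html
```

With /g in list context, m// returns every capture from every match, which is
why an array is used on the left-hand side here.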
Good Luck!
Tanton