Paul Johnson wrote:
On Tue, May 02, 2006 at 04:43:34PM -0500, JupiterHost.Net wrote:
Basically, right now I just need the HTML to Text output, like I explained.
"I want to grab strings between the p tags in this exact block of HTML"
to which I would reply:
my @strings = $html =~ m{<p>(.*)</p>}g;
Yeah. Depending on "this exact block of HTML", that'd probably be
wrong. (It's greedy. And even if it wasn't it might still be wrong.)
Good point; make it non-greedy and it will work as outlined, assuming I
read the OP's mind correctly :)
And why not post an example of your catch to illustrate it for the
benefit of the list?
$ perl -Mstrict -MData::Dumper -wle 'my $html ="<p>foo</p>\n <p>bar</p><p>baz</p>"; print Dumper [$html =~ m{<p>(.*)</p>}g];'
$VAR1 = [
'foo',
'bar</p><p>baz'
];
$ perl -Mstrict -MData::Dumper -wle 'my $html ="<p>foo</p>\n <p>bar</p><p>baz</p>"; print Dumper [$html =~ m{<p>(.*?)</p>}g];'
$VAR1 = [
'foo',
'bar',
'baz'
];
$
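To illustrate the "even if it wasn't it might still be wrong" point, here is a rough sketch of inputs where the non-greedy pattern above still comes up empty. The sample HTML is made up for illustration (an attribute on one tag, a paragraph that spans two lines, an upper-case tag), and the "tolerant" pattern is only one possible adjustment, not a general HTML parser.

```perl
use strict;
use warnings;

# Hypothetical input: one <p> has an attribute, one spans two lines,
# and one is written in upper case.
my $html = qq{<p class="intro">foo</p>\n<p>bar\nbaz</p>\n<P>qux</P>};

# The non-greedy pattern from this thread. "<p>" is matched literally and
# "." does not match a newline without /s, so nothing is captured here.
my @naive = $html =~ m{<p>(.*?)</p>}g;

# A more tolerant variant: \b[^>]* allows attributes (but not <pre>),
# /s lets "." cross newlines, /i matches <P> as well as <p>.
my @tolerant = $html =~ m{<p\b[^>]*>(.*?)</p>}gis;

print scalar(@naive), " naive match(es)\n";
print "tolerant: [$_]\n" for @tolerant;
```

The naive pattern finds nothing in this input, while the tolerant one captures all three paragraphs. Nested markup, comments, or unclosed tags would still trip up either pattern.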
But if you know "this exact block of HTML", how about:
my @strings = ( "string 1", "string 2", ... );
Because most likely the string he is trying to grab will be changing;
why else would he be trying to parse them out?
Or how about a solution involving "links -dump" ?
ATTN casual readers: *That is the worst idea ever*; don't do it!
a) It's not Perl, it's a system command.
b) It's not portable by any means (what if "links" is not in their
path? What if "links" isn't even installed? What if "links" should have
been "lynx"? What if the -dump flag on your OS's links needs to be --grab
on their OS's links, etc.?)
c) How does that help you get the string between the p tags in any
useful form? (I.e., you still have to get that data out of the output of
that command.)
d) Hypothetical unknown behavior: what if it creates a temp file and
is unable to remove it and it gets run a million times? Now you've
potentially filled up the user's quota, potentially filled up a
partition, etc.
And all because you didn't use Perl's most fundamental tool, regexes, or
one of the zillions of HTML parsing modules to get what you want into a
data structure that is native to the script you want to use the data in.
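
As a rough sketch of the parsing-module route, the following uses HTML::TokeParser to collect the text of each p element into a plain Perl array. It assumes the HTML-Parser distribution from CPAN is installed (it is not a core module), and the sample HTML is invented for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser CPAN distribution, not core

# Hypothetical input: the second paragraph has an attribute and a line break.
my $html = qq{<p>foo</p>\n<p class="x">bar\nbaz</p>};

# Passing a scalar reference tells the parser to read from the string.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my @strings;
# get_tag('p') skips ahead to the next <p> start tag, or returns undef
# once the document is exhausted.
while ( $parser->get_tag('p') ) {
    # Collect text up to the matching </p>, with whitespace trimmed
    # and internal runs collapsed to a single space.
    push @strings, $parser->get_trimmed_text('/p');
}

print "[$_]\n" for @strings;
```

The result lands in @strings, a native Perl array, with no external command involved and no dependence on tag case or attributes.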