On Wed, May 03, 2006 at 10:04:26AM -0500, JupiterHost.Net wrote:

> Paul Johnson wrote:
> >On Tue, May 02, 2006 at 04:43:34PM -0500, JupiterHost.Net wrote:
> >
> >>>Basically, right now I just need the HTML to Text output, like I
> >>>explained.
> >
> >>"I want to grab strings between the p tags in this exact block of HTML"
> >>
> >>to which I would reply:
> >>
> >>my @strings = $html =~ m{<p>(.*)</p>}g;
> >
> >Yeah.  Depending on "this exact block of HTML", that'd probably be
> >wrong.  (It's greedy.  And even if it wasn't it might still be wrong.)
>
> Good point, make it non greedy and it will work as outlined, assuming I
> read the OT's mind correctly :)
>
> And why not post an example of your catch to illustrate it for the
> benefit of the list?

Because I was busy and I knew you would do it ;-)

> >But if you know "this exact block of HTML", how about:
> >
> >  my @strings = ( "string 1", "string 2", ... );
>
> Because most likely the string he is trying to grab will be changing,
> why else would he be trying to parse them out?

Right.  So in fact you don't know "this exact block of HTML".

Now, based on what we do know your solution will probably work.  But we
don't really know too much.  What if "this exact block of HTML"
contained

    <p>h<!--</p>-->a</p>

for example?  Yeah, I know, that'll never happen.

> >Or how about a solution involving "links -dump" ?
>
> ATTN casual readers: *That is the worst idea ever* don't do it!

I'm not sure it's quite that bad.  I might have suggested using Java.

>  a) it's not Perl, it's a system command
>  b) it's not portable by any means (what if "links" is not in their
> path? what if "links" isn't even installed, what if "links" should have
> been "lynx", what if the -dump flag on your OS's links needs to be --grab
> on their OS's links, etc etc?)
>  c) how does that help you get the string between the p tags in any
> useful form (i.e. you still have to get that data out of the output of
> that command)
>  d) Hypothetical unknown behavior: what if it creates a temp file and
> is unable to remove it and it gets run a million times, now you've
> potentially filled up the user's quota, potentially filled up a
> partition, etc etc

But what if chucking the output into a file does exactly what you want?

  Slavish adherence to portability concerns shouldn't get in the way of
  your getting your job done.  (perlfaq5)

> And all because you didn't use Perl's most fundamental tool: regexes or
> one of the zillions of HTML parsing modules to get what you want into a
> data structure that is native to the script you want to use the data in.

In general parsing HTML with a regular expression is going to bite you.
You might find situations where it works, and I've even done so myself
(with XML rather than HTML) but I don't think anyone could call it
robust whilst keeping a straight face.

Unless you are talking about Perl 6 that is ...

--
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net
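
To illustrate the greedy catch and the comment case raised above, here is
a minimal sketch.  The two-paragraph $html string and the variable names
are made up for illustration and do not come from the thread; only the
pattern and the <p>h<!--</p>-->a</p> input are taken from it.

```perl
use strict;
use warnings;

# Hypothetical sample input, not from the original thread.
my $html = '<p>one</p><p>two</p>';

# Greedy: .* runs on to the last </p>, so both paragraphs
# come back glued together as a single string.
my @greedy = $html =~ m{<p>(.*)</p>}g;

# Non-greedy: .*? stops at the first </p> after each <p>.
my @lazy = $html =~ m{<p>(.*?)</p>}g;

# The input from the message: a </p> hidden inside a comment.
# The non-greedy pattern stops at the commented-out </p>.
my $tricky = '<p>h<!--</p>-->a</p>';
my @fooled = $tricky =~ m{<p>(.*?)</p>}g;

print "greedy: $_\n" for @greedy;   # one</p><p>two
print "lazy:   $_\n" for @lazy;     # one, then two
print "fooled: $_\n" for @fooled;   # h<!--
```

So the non-greedy version fixes the first input but returns only "h<!--"
for the second, which is the sort of surprise that makes a regex a
fragile way to pull text out of HTML you do not control.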