Hi all, I have to parse a few thousand HTML files, so I'd like to use an HTML parser rather than my own regexes. The HTML I'm parsing is quite complex, so I need your help. First of all, is HTML::Tree a good and fast module?
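As a rough sketch of the usual parse, query, free loop with HTML::TreeBuilder (the parser class in the HTML::Tree distribution): each file becomes an element tree, look_down searches it, and the tree is freed with delete before the next file is parsed. The helper count_spans is hypothetical and only for illustration. Whether the module is fast enough for several thousand files is best measured on your own data; the sketch only keeps memory bounded by holding one tree at a time.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse one HTML string, count its <span> elements,
# and free the tree before returning.
sub count_spans {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    # look_down in list context returns every matching element;
    # the "() =" idiom turns that list into a count.
    my $n = () = $tree->look_down(_tag => 'span');
    $tree->delete;    # release the element tree explicitly
    return $n;
}

# Process each file named on the command line one at a time,
# so only one tree is held in memory at once.
for my $file (@ARGV) {
    my $tree  = HTML::TreeBuilder->new_from_file($file);
    # Narrow the search with both a tag name and an attribute value.
    my @spans = $tree->look_down(_tag => 'span', class => 'tl');
    printf "%s: %d <span class=\"tl\"> elements\n", $file, scalar @spans;
    $tree->delete;
}
```

Calling delete is what the HTML::Element documentation recommends when you are done with a tree; recent releases also use weak references internally, and the explicit call remains harmless there.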
I'm also not sure whether searching for elements like this is slow:

    if ( my $h = $tree->look_down('_tag', 'sometag') ) { ... }

When I dumped the tree with Data::Dumper, a 300 KB HTML file produced 13 MB of dump output.

Now to the actual problem. The HTML looks like this:

    <table width="600%" border="3" align="center" cellspacing="2" cellpadding="2" bgcolor='#eeffff'>
      <tr>
        <td align="left" valign="top" width="20%">
          <span class="tl">TEST: </span></td>
        <td align="left" width="80%">
          <table width="100%" border="0">
            <tr>
              <td width="67%">
                <span class='ra'> Vysoká </span> <span class='ra'> 9 </span><br>
                <span class='ra'> Bratislava </span> <span class='ra'> 810 00 </span><br></td>
              <td width="33%" valign='top'>
                <span class='ra'>something</span></td>
            </tr>
          </table>
          <table width="100%" border="0">
            <tr>
              <td width="67%">
                <span class='ro'> Nám. SNP </span> <span class='ro'> 15 </span><br>
                <span class='ro'> Bratislava </span> <span class='ro'> 810 00 </span><br></td>
              <td width="33%" valign='top'>
                <span class='ro'>something</span></td>
            </tr>
          </table>
          <table width="100%" border="0">
            <tr>
              <td width="67%">
                <span class='ro'> Bratislava </span><br></td>
              <td width="33%" valign='top'>
                <span class='ro'>something</span></td>
            </tr>
          </table></td>
      </tr>
    </table>

(I hope this displays correctly; if not, see http://www.2ge.us/perl/html.txt .)

Nearly the whole file is full of tables like this one. How can I extract the values from them? I need to find the element with class="tl" whose text matches /TEST:/i and, if there is one, get all the values up to the end of that whole (outer) table. Could someone be kind enough to help?

Hint: each table always contains one class='ra' group and zero or more optional class='ro' groups.

Thanks for any help!

--
ICQ: 7552083 | Web: www.2ge.us | IRC: [2ge] | MSN: 2ge!2ge_us
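One possible approach, sketched under the assumption that every block follows the layout shown above: find each span with class "tl" whose text matches /TEST:/i, climb to its nearest enclosing table with look_up, then collect the class="ra" / class="ro" spans from each nested table with look_down. The helper name extract_test_blocks and the returned data shape are hypothetical. The sample HTML is a trimmed copy of the table in the question, with the accent dropped from "Vysoka" to keep the snippet ASCII.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(refaddr);

# Hypothetical helper: returns one array ref per "TEST:" block; each block
# holds one hash per nested table: { class => 'ra'|'ro', values => [...] }.
sub extract_test_blocks {
    my ($tree) = @_;
    my @blocks;
    # Every <span class="tl"> whose text matches /TEST:/i starts a block.
    for my $label ($tree->look_down(
            _tag  => 'span',
            class => 'tl',
            sub { $_[0]->as_text =~ /TEST:/i })) {
        # The nearest enclosing <table> is the outer table of this block.
        my $outer = $label->look_up(_tag => 'table') or next;
        my @groups;
        # look_down also tests $outer itself, so skip it explicitly.
        for my $inner ($outer->look_down(_tag => 'table')) {
            next if refaddr($inner) == refaddr($outer);
            # Assumes each class attribute holds a single class name.
            my @spans = $inner->look_down(_tag => 'span', class => qr/^r[ao]$/);
            next unless @spans;
            push @groups, {
                class  => $spans[0]->attr('class'),
                values => [ map { $_->as_trimmed_text } @spans ],
            };
        }
        push @blocks, \@groups;
    }
    return @blocks;
}

my $html = <<'HTML';
<table width="600%" border="3"><tr>
<td><span class="tl">TEST: </span></td>
<td><table width="100%"><tr>
<td><span class='ra'> Vysoka </span> <span class='ra'> 9 </span><br>
<span class='ra'> Bratislava </span> <span class='ra'> 810 00 </span><br></td>
<td><span class='ra'>something</span></td>
</tr></table><table width="100%"><tr>
<td><span class='ro'> Bratislava </span><br></td>
<td><span class='ro'>something</span></td>
</tr></table></td>
</tr></table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
for my $block (extract_test_blocks($tree)) {
    for my $group (@$block) {
        print join(' | ', $group->{class}, @{ $group->{values} }), "\n";
    }
}
$tree->delete;
```

For real files, build the tree with HTML::TreeBuilder->new_from_file($file) instead of new_from_content. Combining _tag with an attribute value, a qr// pattern or a code reference in look_down keeps each search focused on the elements of interest instead of testing every node by hand.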