On Sunday, March 31, 2002, at 11:50, M z wrote:

> hello,
>
> in conjunction, I was looking into this module HTML to
> take out all the HTML I have in several files.
> Namely, the data I want is between tags
> <tag>data</tag>

I would look at getting the HTML::TreeBuilder module. It sounds
like you need to get a copy of nmake to build it yourself, or find
a ppm that installs these modules where they belong.

As for code illustrations, try:

http://www.wetware.com/drieux/src/unix/perl/OK.UglyCode.txt

It is an illustration of the full-on wackaDoodle code, from when I was
working on an 'all singing, all dancing' CGI and command line tool.

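You would want to look at the parseTreeBack sub:

  sub parseTreeBack {

      ....

      my $tree = HTML::TreeBuilder->new;   # empty tree
      $tree->parse($res->content);         # $res comes from the elided code above
      $tree->eof;                          # tell the parser the document is complete

      # every <title> element in the tree
      my @title = $tree->look_down("_tag", "title");

      my $page = '';

      foreach my $t (@title) {
          foreach my $item_r ( $t->content_refs_list ) {
              next if ref $$item_r;        # skip child elements, keep plain text
              $page .= "$$item_r \n";
          }
          $page .= "\n";
      }

      ....
  }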

That basic structure is how I get the 'content' of the 'title'
out of the HTML.

I repeat that same basic trick to parse out the tables and rows
for other stuff, since I need to parse out of:

" <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML><HEAD><TITLE>List Grovellor Says</TITLE>
</HEAD><BODY><H1 ALIGN="center">List Grovellor Says</H1><hr>
<TABLE WIDTH="60%" ALIGN="center">
<TR VALIGN="TOP"><TH ALIGN="center" COLSPAN="2"> = Frodo found in hobbits =</TH></TR>
<TR VALIGN="TOP"><TD>Frodo Baggins</TD> <TD>[EMAIL PROTECTED]</TD></TR>
</TABLE><br><hr align="center" width="50%"><br></BODY></HTML> "

the fact that I found "Frodo" on the hobbits mailing list, and
that he has the email address [EMAIL PROTECTED].
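
For the row-and-table case, here is a minimal sketch of how the same
look_down trick might be applied. It is not the code from the URL above:
the sub name table_rows is made up for illustration, the sample HTML is a
trimmed copy of the page shown earlier, and frodo@example.com is a
placeholder standing in for the real address.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns one array ref of cell texts per data row.
sub table_rows {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;   # empty tree
    $tree->parse($html);
    $tree->eof;                          # tell the parser the document is complete

    my @rows;
    foreach my $tr ( $tree->look_down( '_tag', 'tr' ) ) {
        # <td> cells only, so header rows made of <th> are skipped
        my @cells = map { $_->as_text } $tr->look_down( '_tag', 'td' );
        next unless @cells;
        push @rows, \@cells;
    }

    $tree->delete;                       # free the tree when done
    return @rows;
}

# Trimmed sample page; the address is a placeholder.
my $html = <<'END_HTML';
<HTML><HEAD><TITLE>List Grovellor Says</TITLE></HEAD><BODY>
<TABLE><TR><TH COLSPAN="2"> = Frodo found in hobbits =</TH></TR>
<TR><TD>Frodo Baggins</TD> <TD>frodo@example.com</TD></TR></TABLE>
</BODY></HTML>
END_HTML

print join( ' | ', @$_ ), "\n" for table_rows($html);
```

Using as_text here instead of walking content_refs_list is a convenience
choice: it flattens any nested markup inside a cell into plain text.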


Which is to say, I found TreeBuilder simpler to use than trying
to work out the HTML::Parser and HTML::FormatText stuff directly.
It provides some 'class extensions', and the specific trick above
is bootlegged from the POD. But it works.

ciao
drieux
