Small modification below: start_tag is only passed 'tagname,attr', so the
unused "my $text = shift;" line is removed.

> -----Original Message-----
> From: Toby Stuart [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 19, 2004 2:37 PM
> To: 'Christian Wattengård'; [EMAIL PROTECTED]
> Subject: RE: Extracting data from html structure.
> 
> 
> > -----Original Message-----
> > From: Christian Wattengård [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, February 18, 2004 4:41 AM
> > To: [EMAIL PROTECTED]
> > Subject: Extracting data from html structure.
> > 
> > 
> > I have the following html structure:
> > ------------------------------------------------------------------------
> 
> [long HTML snipped]
> 
> > ------------------------------------------------------------------------
> > And I want to extract from it the checkbox values and their respective
> > channel names (contained in the link beside the checkbox).
> > I have checked a lot of modules on CPAN, but I haven't found one that
> > does it just the way I want yet. Actually, I haven't found any that I
> > can get to work at all.
> > 
> > Any tips?
> > 
> > Christian...
> > 
> 
> I snipped the HTML you provided because it was so long. Please try to
> trim it down next time.
> Anyhow, I think the code below does what you want.
> 
> 
> use strict;
> use warnings;
> 
> use HTML::Parser;
> 
> 
> my $HTML = <<EOF;
> <table border=0 cellpadding=0 cellspacing=0 width=156>
> <tr>
> <td colspan=2 bgcolor=#CDC9C0><b><font
> face=verdana,arial,helvetica,sans-serif size=-2
> color=#666666>&nbsp;Norske</font></b></td>
> </tr>
> <tr>
> <td width=78 valign=top><font class=link-00-ul-l size=1>
> <input type="checkbox" name=kanal_id[] value=1 CHECKED>
> <a href="index.html?kanal_id=1&dag=0&fra_tid=0&til_tid=24&kategori_id=">NRK 1</a><br>
> <input type="checkbox" name=kanal_id[] value=3 >
> <a href="index.html?kanal_id=3&dag=0&fra_tid=0&til_tid=24&kategori_id=">TV 2</a><br>
> <input type="checkbox" name=kanal_id[] value=5 >
> <a href="index.html?kanal_id=5&dag=0&fra_tid=0&til_tid=24&kategori_id=">TVNorge
> </a><br>
> </font></td>
> </tr>
> </table>
> EOF
> 
> 
> 
> my $current_tag; # i'm not happy with using this.
>                  # is there a better way? anyone?
> 
> my $p = HTML::Parser->new(
>       api_version => 3,
>       start_h     => [ \&start_tag, 'tagname,attr' ],
>       text_h      => [ \&text,      'text'         ]
> );
> 
> $p->parse($HTML);
> $p->eof;
> 
> sub start_tag
> {
>       my $name  = shift;
>       my $attrs = shift;

#       my $text  = shift; # removed

>       
>       $current_tag = $name;
> 
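>       # guard against <input> tags that have no type attribute,
>       # which would otherwise warn under "use warnings"
>       if ($name eq 'input' and defined $attrs->{'type'}
>           and lc($attrs->{'type'}) eq 'checkbox')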
>       {
>               print $attrs->{'value'}, "=";
>       }
> }
> 
> sub text
> {
>       my $text  = shift;
>       # $current_tag is still undef if text arrives before any tag
>       if (defined $current_tag and $current_tag eq 'a')
>       {
>               print "$text\n";
>       }
>       
> }
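
On the "is there a better way?" question about $current_tag: one option
that might work is to add an end_h handler and keep a flag that is only
true between <a> and </a>, collecting the link text while it is set. The
sketch below is untested against the full page and makes some assumptions:
extract_channels, $pending_value, $in_link and $link_text are names I made
up for illustration, the sample markup is a trimmed stand-in for the real
page, and each checkbox is assumed to be followed by its own link.

```perl
use strict;
use warnings;

use HTML::Parser;

# Hypothetical helper (not from the original code): returns a list of
# [ value, channel name ] pairs found in the given HTML string.
sub extract_channels
{
    my $html = shift;

    my @channels;        # collected [ value, name ] pairs
    my $pending_value;   # value of the most recent checkbox, if any
    my $in_link   = 0;   # true only between <a> and </a>
    my $link_text = '';  # text gathered inside the current link

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($name, $attrs) = @_;
            if ($name eq 'input'
                and lc($attrs->{'type'} || '') eq 'checkbox')
            {
                $pending_value = $attrs->{'value'};
            }
            elsif ($name eq 'a')
            {
                $in_link   = 1;
                $link_text = '';
            }
        }, 'tagname,attr' ],
        end_h => [ sub {
            my $name = shift;
            return unless $name eq 'a' and $in_link;
            $in_link = 0;
            if (defined $pending_value)
            {
                $link_text =~ s/\s+/ /g;   # collapse line breaks
                $link_text =~ s/^ | $//g;  # trim the ends
                push @channels, [ $pending_value, $link_text ];
                undef $pending_value;
            }
        }, 'tagname' ],
        # text may arrive in several chunks, so append rather than assign
        text_h => [ sub {
            $link_text .= shift if $in_link;
        }, 'dtext' ],
    );

    $p->parse($html);
    $p->eof;

    return @channels;
}

# Trimmed sample input, assumed to resemble the original page.
my $sample = <<'EOF';
<input type="checkbox" name=kanal_id[] value=1 CHECKED>
<a href="index.html?kanal_id=1">NRK
1</a><br>
<input type="checkbox" name=kanal_id[] value=3 >
<a href="index.html?kanal_id=3">TV 2</a><br>
EOF

# prints "1=NRK 1" and "3=TV 2", one per line
print "$_->[0]=$_->[1]\n" for extract_channels($sample);
```

Returning the pairs instead of printing inside the handlers keeps the
parsing separate from the output, and text outside a link can no longer be
picked up by accident once </a> has been seen.
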
> 
> 
> 
> -- 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> <http://learn.perl.org/> <http://learn.perl.org/first-response>
> 
> 

