Small modification below.

> -----Original Message-----
> From: Toby Stuart [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 19, 2004 2:37 PM
> To: 'Christian Wattengård'; [EMAIL PROTECTED]
> Subject: RE: Extracting data from html structure.
>
> > -----Original Message-----
> > From: Christian Wattengård [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, February 18, 2004 4:41 AM
> > To: [EMAIL PROTECTED]
> > Subject: Extracting data from html structure.
> >
> > I have the following html structure:
> > ------------------------------------------------------------------------
> > [long HTML snipped]
> > -------------------------------------------------------------------------
> > And I want to extract from it the checkbox values and their respective
> > channel names (contained in the link beside the checkbox).
> > I have checked a lot of modules on CPAN but I haven't found one that
> > does it just the way I want it to yet. Actually, I haven't found any
> > that I can get to work at all.
> >
> > Any tips?
> >
> > Christian...
>
> I snipped the HTML you provided because it was so long. Try to trim it
> down next time.
> Anyhow, I think the code below does what you want.
>
> use strict;
> use warnings;
>
> use HTML::Parser;
>
> my $HTML = <<EOF;
> <table border=0 cellpadding=0 cellspacing=0 width=156>
> <tr>
> <td colspan=2 bgcolor=#CDC9C0><b><font
> face=verdana,arial,helvetica,sans-serif size=-2
> color=#666666> Norske</font></b></td>
> </tr>
> <tr>
> <td width=78 valign=top><font class=link-00-ul-l size=1>
> <input type="checkbox" name=kanal_id[] value=1 CHECKED>
> <a href="index.html?kanal_id=1&dag=0&fra_tid=0&til_tid=24&kategori_id=">NRK 1</a><br>
> <input type="checkbox" name=kanal_id[] value=3 >
> <a href="index.html?kanal_id=3&dag=0&fra_tid=0&til_tid=24&kategori_id=">TV 2</a><br>
> <input type="checkbox" name=kanal_id[] value=5 >
> <a href="index.html?kanal_id=5&dag=0&fra_tid=0&til_tid=24&kategori_id=">TVNorge</a><br>
> </font></td>
> </tr>
> </table>
> EOF
>
> my $current_tag; # i'm not happy with using this.
>                  # is there a better way? anyone?
>
> my $p = HTML::Parser->new(
>     api_version => 3,
>     start_h     => [ \&start_tag, 'tagname,attr' ],
>     text_h      => [ \&text, 'text' ]
> );
>
> $p->parse($HTML);
> $p->eof;
>
> sub start_tag
> {
>     my $name  = shift;
>     my $attrs = shift;
# my $text = shift; # removed
>
>     $current_tag = $name;
>
>     if ($name eq 'input' and $attrs->{'type'} eq 'checkbox')
>     {
>         print $attrs->{'value'}, "=";
>     }
> }
>
> sub text
> {
>     my $text = shift;
>     if ($current_tag eq 'a')
>     {
>         print "$text\n";
>     }
> }
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> <http://learn.perl.org/> <http://learn.perl.org/first-response>
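
On the "is there a better way?" question about the $current_tag global: one possible approach, offered only as a rough sketch, is to keep the state in lexical variables that the handlers close over, and to add an end_h handler so the link text is collected between <a> and </a> instead of relying on whichever start tag was seen last. The helper name extract_channels and the trimmed sample HTML are made up for illustration; the handlers assume each checkbox is followed by the link that names it, as in the posted HTML.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the original post): returns a list of
# [checkbox value, link text] pairs found in the given HTML string.
sub extract_channels {
    my ($html) = @_;

    my @channels;       # finished [value, name] pairs
    my $pending;        # value of the last checkbox not yet paired with a link
    my $in_link = 0;    # true while we are between <a> and </a>
    my $name    = '';   # link text collected so far

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'input' and lc($attr->{type} || '') eq 'checkbox') {
                $pending = $attr->{value};
            }
            elsif ($tag eq 'a' and defined $pending) {
                $in_link = 1;
                $name    = '';
            }
        }, 'tagname, attr' ],
        # Text may arrive in several chunks, so append rather than assign.
        text_h => [ sub { $name .= $_[0] if $in_link }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'a' and $in_link;
            $name =~ s/^\s+|\s+$//g;    # trim both ends
            $name =~ s/\s+/ /g;         # collapse line breaks and runs of spaces
            push @channels, [ $pending, $name ];
            ($pending, $in_link) = (undef, 0);
        }, 'tagname' ],
    );

    $p->parse($html);
    $p->eof;
    return @channels;
}

# Trimmed-down sample in the shape of the posted HTML (hrefs shortened here).
my $html = <<'EOF';
<input type="checkbox" name=kanal_id[] value=1 CHECKED>
<a href="index.html?kanal_id=1">NRK
1</a><br>
<input type="checkbox" name=kanal_id[] value=3 >
<a href="index.html?kanal_id=3">TV 2</a><br>
EOF

print "$_->[0]=$_->[1]\n" for extract_channels($html);
```

Returning the pairs instead of printing inside the handlers keeps the parsing separate from the output, so the same list could feed a hash or a report. Whether that is actually "better" than the single global is a matter of taste for a script this small.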