On Mon, 2001-11-12 at 16:31, Steve Tattersall wrote:
> For example I want to extract the line: (see the html code below)
> GB 0152 MSS.126/NUDL
>
> and also the title which is:
>
> National Union of Dock, Riverside and General Workers in Great
> Britain and Ireland
>
> Does anyone know how to go about this please? I would be extremely grateful.

We've had a regexp answer, but for readability I'd use the
HTML::TokeParser module. It'd work like this.

    use HTML::TokeParser;

    # Prep an object. $html contains the HTML to parse.
    my $p = HTML::TokeParser->new( \$html ) or die "$!";

    # Find an <a> tag, and get the text inside it, up to </a>.
    my $token = $p->get_tag("a");
    my $reference = $p->get_trimmed_text("/a");

    # From there, find a </b> tag, and snarf everything up to <br>.
    $token = $p->get_tag("/b");
    my $title = $p->get_trimmed_text("br");

You'll have some small tidying up to do on both, but it's a /much/ more
readable (and maintainable) way of parsing the HTML.

Hope this helps, (from one Manchester perl bod to another ;-)

~C.

--
$a="printf.net"; Chris Ball | chris@void.$a | www.$a | finger: chris@$a
"In the beginning there was nothing, which exploded."