Steve Tattersall wrote:
> 
> Please help. I am trying to extract the line beginning with GB and also the
> title between HTML tags from multiple HTML files.
> 
> For example I want to extract the line: (see the html code below)
>  GB 0152 MSS.126/NUDL
> 
> and also the title which is:
> 
> National Union of Dock, Riverside and General Workers in Great Britain and Ireland
> 
> Does anyone know how to go about this, please? I would be extremely grateful.
> 
> --------------------------------------------------------
> <br><b>Reference</b>:
> <a target = "new" title = "Repository contact details from ARCHON - opens new window"  href = "http://www.hmc.gov.uk/archon/searches/locresult.asp?LR=152">
> GB 0152 MSS.126/NUDL
>    </a>
> <br><b>Title</b>:
> 
> National Union of Dock, Riverside and General Workers in Great Britain and Ireland
> 
> <br><b>Dates of creation</b>:
> -------------------------------------------------------------
> 
I assume the html text is in the variable $html.
Then

my ($repository) = $html =~ /<br><b>Reference</b>:\n<a.*?\>\n(.*?)\n/s;
my ($title) = $html =~ /<br><b>Title</b>:\n\n(.*?)\n/s;

should extract what you need.
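
Since you mention multiple files, here is a rough sketch of how the same idea could be wrapped in a loop over file names given on the command line. The helper name extract_fields is made up for this example, and the patterns assume every page follows the layout in your sample (a "Reference" label followed by a link, then a "Title" label followed by the next <br>). They use \s* rather than a fixed number of newlines, so they are a little more tolerant of whitespace than the two one-liners above. The // operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns
# (reference, title) for one page of HTML. A field is undef when
# its pattern does not match.
sub extract_fields {
    my ($html) = @_;

    # Text inside the first <a ...>...</a> after the "Reference" label.
    my ($reference) =
        $html =~ m{<br><b>Reference</b>:\s*<a\b[^>]*>\s*(.*?)\s*</a>}s;

    # Everything between the "Title" label and the next <br>.
    my ($title) = $html =~ m{<br><b>Title</b>:\s*(.*?)\s*<br>}s;

    # A title may span several lines; collapse whitespace runs to one space.
    $title =~ s/\s+/ /g if defined $title;

    return ($reference, $title);
}

# Read each file named on the command line in one go and report the fields.
for my $file (@ARGV) {
    my $fh;
    unless (open $fh, '<', $file) {
        warn "Cannot open $file: $!\n";
        next;
    }
    my $html = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    my ($reference, $title) = extract_fields($html);
    printf "%s\n  Reference: %s\n  Title: %s\n",
        $file, $reference // '(not found)', $title // '(not found)';
}
```

Saved as, say, extract.pl, it could be run as "perl extract.pl *.html". If the pages turn out to vary more than the sample suggests, a real HTML parser module from CPAN would probably be more robust than regular expressions.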

Best Wishes,
Andrea
