HTML parsing

Daniel Smith Mon, 28 Mar 2005 23:30:15 -0800

Hi all,

I'm brand new to Perl and have only a little programming background. I was 
tasked with parsing a set of .html files to extract the data contained 
in some terribly formatted tables. Here is a sample of what I have:


<tr>
<th align="left" width="10%"><font size="-1">Data to be extracted </font></th>
<td width="30%"><font size="-1">
DATA DATA DATA
</font></td>
<th align="left" width="10%"><font size="-1">Need this too</font></th>
<td colspan="3" valign="top"><font size="-1">More data I need to get 
out</font></td>
</tr>
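One possible way to read a row like this, sketched in Perl with core features only. It assumes every <th> label is immediately followed by its <td> value, as in the sample. It matches each header/data pair with a non-greedy regex, then strips the <font> tags and collapses whitespace. Regexes are fragile on arbitrary HTML, so treat this as a sketch for markup as regular as the sample, not a general parser. The names @pairs and cell_text are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample row from above, embedded so the sketch is self-contained.
my $row = <<'HTML';
<tr>
<th align="left" width="10%"><font size="-1">Data to be extracted </font></th>
<td width="30%"><font size="-1">
DATA DATA DATA
</font></td>
<th align="left" width="10%"><font size="-1">Need this too</font></th>
<td colspan="3" valign="top"><font size="-1">More data I need to get
out</font></td>
</tr>
HTML

# Strip inner tags and tidy whitespace to get the visible text of a cell.
sub cell_text {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;   # drop <font> and any other tags
    $html =~ s/\s+/ /g;      # newlines and runs of spaces -> one space
    $html =~ s/^ | $//g;     # trim the leading/trailing space
    return $html;
}

# Each header cell is followed by its data cell, so match them in pairs.
# /s lets . cross newlines; .*? keeps each capture inside a single cell.
my @pairs;
while ($row =~ m{<th\b[^>]*>(.*?)</th>\s*<td\b[^>]*>(.*?)</td>}gis) {
    push @pairs, [ cell_text($1), cell_text($2) ];
}

for my $pair (@pairs) {
    print "$pair->[0]: $pair->[1]\n";
}
```

Run against the sample, this should print "Data to be extracted: DATA DATA DATA" and "Need this too: More data I need to get out".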

This is one row from the typical four-row table that is returned as a search 
result. There are 25 of these four-row tables per page. Could someone point 
me in the right direction on how to go about this? A colleague 
suggested that I "put the file into an array and use the 'split' 
command". While I vaguely understand the concept, I'm not sure about the 
syntax. Can anyone shed some light?
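The colleague's "array and split" suggestion might look roughly like this sketch. It assumes, hypothetically, that each four-row result sits in its own <table> element, so splitting on </table> yields one chunk per result. Each chunk is then turned into a label => value hash. The page content here is made-up placeholder data. For messier pages, the CPAN modules HTML::TableExtract and HTML::TreeBuilder parse the markup properly and are likely more robust than regexes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for one saved results page: two results, each in its
# own <table> (an assumption; check how the real pages wrap each result).
my $page = <<'HTML';
<table>
<tr>
<th align="left"><font size="-1">Name</font></th>
<td><font size="-1">First result</font></td>
</tr>
</table>
<table>
<tr>
<th align="left"><font size="-1">Name</font></th>
<td><font size="-1">Second result</font></td>
</tr>
</table>
HTML

# Read the "file" into an array of lines, as the colleague suggested.
# Opening a reference to a scalar gives an in-memory filehandle; with a real
# file you would open its path instead, e.g. open(my $fh, '<', $path).
open(my $fh, '<', \$page) or die "cannot open: $!";
my @lines = <$fh>;
close $fh;

# Cells span several lines, so join the lines back into one string
# before splitting on the markup that separates results.
my $html = join '', @lines;

my @records;
for my $table (split m{</table>}i, $html) {
    my %record;
    while ($table =~ m{<th\b[^>]*>(.*?)</th>\s*<td\b[^>]*>(.*?)</td>}gis) {
        my ($label, $value) = ($1, $2);
        for ($label, $value) {   # $_ aliases each variable in turn
            s/<[^>]*>//g;        # drop inner tags such as <font>
            s/\s+/ /g;           # collapse whitespace
            s/^ | $//g;          # trim
        }
        $record{$label} = $value;
    }
    push @records, \%record if %record;   # skip the leftover after the last </table>
}

printf "%d records\n", scalar @records;
print "$_->{Name}\n" for @records;
```

With the placeholder page this should report 2 records. On real pages each hash would instead hold the four rows' labels (such as "Data to be extracted") as keys.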

Thanks in advance,

Dan
