Tips Re Pattern Matching / REGEX

egonslokar Thu, 27 Mar 2008 14:06:49 -0700

Hello Python Community,

I have a large text file (1GB or so) with a structure similar to the
HTML example below.


I need to extract the content (the text between the div and tr tags)
from this file and put it into a spreadsheet or a database. Given my
limited Python knowledge, I was going to try to do this with regex
pattern matching.
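
A minimal sketch of the regex approach. It assumes every element looks
exactly like the sample below (<div class="NAME">...</div> or
<tr tag="NAME">...</tr>, never nested). re.DOTALL lets "." match
newlines, so elements that span several rows are captured, and the
non-greedy ".*?" stops at the first closing tag. The helper names
(extract_elements, clean, write_csv) are illustrative, and CSV is just
one spreadsheet-friendly output format.

```python
import csv
import io
import re

# Assumption: elements have the shapes seen in the sample and are not nested.
# re.DOTALL lets "." cross newlines; ".*?" stops at the first close tag.
ELEMENT_RE = re.compile(
    r'<div class="(?P<dclass>[^"]*)">(?P<dtext>.*?)</div>'
    r'|<tr tag="(?P<tclass>[^"]*)">(?P<ttext>.*?)</tr>',
    re.DOTALL,
)
INNER_TAG_RE = re.compile(r"<[^>]+>")  # crude: removes inline tags like <b>


def clean(text):
    """Strip inline tags and collapse runs of whitespace (incl. newlines)."""
    return " ".join(INNER_TAG_RE.sub("", text).split())


def extract_elements(html):
    """Yield (kind, name, text) for every div/tr element in the string."""
    for m in ELEMENT_RE.finditer(html):
        if m.group("dclass") is not None:
            yield "div", m.group("dclass"), clean(m.group("dtext"))
        else:
            yield "tr", m.group("tclass"), clean(m.group("ttext"))


def write_csv(rows, out):
    """Write a header plus one CSV row per element (opens in any spreadsheet)."""
    writer = csv.writer(out)
    writer.writerow(["kind", "name", "text"])
    writer.writerows(rows)


sample = '''<div class="ItemHead">123</div>
....
<div class="special">Text1: What do I do with these lines
That span several rows? </div>
<tr tag="ItemFoot">Foot
Can span
Over several <b>pages</b></tr>'''

buf = io.StringIO()
write_csv(extract_elements(sample), buf)
print(buf.getvalue())
```

Regexes are brittle on real-world HTML: nested divs, extra attributes
or a different attribute order would all break this pattern. If the
markup is less regular than the sample, the standard library's
html.parser.HTMLParser is a sturdier starting point.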

Could someone give me some pointers on how to approach this? Any code
samples would be greatly appreciated.

Thanks.

Sam



<html>

\\ there are hundreds of thousands of items

\\Item1

<div class="ItemHead">123</div>
....
<div class="special">Text1: What do I do with these lines
That span several rows? </div>
...
<tr tag="ItemFoot">Foot</tr>

\\Item2

<div class="ItemHead">First Line Can go here
But the second line can go here</div>
...
<tr tag="ItemFoot">Foot
Can span
Over several <b>pages</b></tr>


\\Item3

<div class="ItemHead">First Line Can go here
But the second line can go here</div>
...
<div class="special">This can
Span several rows</div>

</html>
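
Because the file is around 1GB, a sketch like the one below reads it in
chunks instead of loading it all at once, carries any half-read element
over to the next chunk, and streams the rows into SQLite with the
standard sqlite3 module. It assumes elements are never nested and that
every item starts with an ItemHead div; the table layout (element with
item, name and text columns) and the function names are hypothetical.

```python
import io
import re
import sqlite3

# Assumed element shapes, taken from the sample; no nesting.
ELEMENT_RE = re.compile(
    r'<div class="([^"]*)">(.*?)</div>|<tr tag="([^"]*)">(.*?)</tr>',
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def iter_elements(fileobj, chunk_size=1 << 20):
    """Yield (name, raw_text) for each element, reading the file in chunks.

    Complete matches are yielded; the text after the last complete match
    (possibly a half-read element) is kept and joined with the next chunk.
    """
    buffer = ""
    while True:
        chunk = fileobj.read(chunk_size)
        buffer += chunk
        last_end = 0
        for m in ELEMENT_RE.finditer(buffer):
            if m.group(1) is not None:
                yield m.group(1), m.group(2)
            else:
                yield m.group(3), m.group(4)
            last_end = m.end()
        buffer = buffer[last_end:]
        if not chunk:  # end of file; any leftover is incomplete markup
            return


def clean(text):
    """Drop inline tags such as <b> and collapse whitespace and newlines."""
    return " ".join(TAG_RE.sub("", text).split())


def numbered_rows(elements):
    """Attach an item number; assumes each item starts with an ItemHead div."""
    item = 0
    for name, raw in elements:
        if name == "ItemHead":
            item += 1
        yield item, name, clean(raw)


def load_items(fileobj, conn, chunk_size=1 << 20):
    """Stream all elements from fileobj into a hypothetical 'element' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS element (item INTEGER, name TEXT, text TEXT)"
    )
    # executemany consumes the generator lazily, so rows are never all in memory.
    conn.executemany(
        "INSERT INTO element (item, name, text) VALUES (?, ?, ?)",
        numbered_rows(iter_elements(fileobj, chunk_size)),
    )
    conn.commit()


sample = """<html>
<div class="ItemHead">123</div>
....
<div class="special">Text1: What do I do with these lines
That span several rows? </div>
<tr tag="ItemFoot">Foot</tr>
<div class="ItemHead">First Line Can go here
But the second line can go here</div>
<tr tag="ItemFoot">Foot
Can span
Over several <b>pages</b></tr>
</html>"""

conn = sqlite3.connect(":memory:")
load_items(io.StringIO(sample), conn, chunk_size=16)  # tiny chunks on purpose
for row in conn.execute("SELECT item, name, text FROM element ORDER BY rowid"):
    print(row)
```

For the real file you would pass an opened text file (the encoding is an
assumption, e.g. open(path, encoding="utf-8")) and connect to a database
file instead of ":memory:". The small chunk size in the example only
exercises the carry-over logic; a megabyte or more is more sensible.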


-- 
http://mail.python.org/mailman/listinfo/python-list
