Hello Python Community,

I have a large text file (1 GB or so) with a structure similar to the HTML example below.
I have to extract the content (the text between the div and tr tags) from this file and put it into a spreadsheet or a database. Given my limited Python knowledge, I was going to try to do this with regex pattern matching. Could someone give me pointers on how to approach this? Any code samples would be greatly appreciated.

Thanks.
Sam

<html>
\\ there are hundreds of thousands of items
\\Item1
<div class="ItemHead">123</div>
....
<div class="special">Text1:
What do I do with these lines
That span several rows?
</div>
...
<tr tag="ItemFoot">Foot</tr>
\\Item2
<div class="ItemHead">First Line Can go here
But the second line can go here</div>
...
<tr tag="ItemFoot">Foot
Can span
Over several <b>pages</b></tr>
\\Item3
<div class="ItemHead">First Line Can go here
But the second line can go here</div>
...
<div class="special">This can
Span several rows</div>
</html>
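
A minimal sketch of one possible approach, offered as a starting point rather than a definitive answer. It swaps regex matching for the standard library's `html.parser.HTMLParser`, which can be fed the file in chunks so the whole 1 GB never has to sit in memory, and it writes one CSV row per div or tr element. The names `ItemExtractor` and `extract_to_csv`, the `label` column (taken from the `class` or `tag` attribute), and the chunk size are all assumptions made for illustration. It also assumes that div and tr elements are never nested inside one another, as in the sample above.

```python
import csv
from html.parser import HTMLParser


class ItemExtractor(HTMLParser):
    """Collect the text inside every <div> and <tr> element (hypothetical helper)."""

    TARGET_TAGS = {"div", "tr"}

    def __init__(self, on_record):
        super().__init__(convert_charrefs=True)
        self.on_record = on_record  # callback receiving (tag, label, text)
        self.current = None         # (tag, label) of the element being read
        self.parts = []             # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        # Assumption: target elements are not nested, so only start a new
        # record when no other record is open.
        if self.current is None and tag in self.TARGET_TAGS:
            attr_map = dict(attrs)
            label = attr_map.get("class") or attr_map.get("tag") or ""
            self.current = (tag, label)
            self.parts = []

    def handle_data(self, data):
        if self.current is not None:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if self.current is not None and tag == self.current[0]:
            # Collapse line breaks and runs of spaces into single spaces.
            text = " ".join("".join(self.parts).split())
            self.on_record(self.current[0], self.current[1], text)
            self.current = None


def extract_to_csv(source, dest, chunk_size=1 << 20):
    """Stream text from `source` and write one CSV row per div/tr to `dest`."""
    writer = csv.writer(dest)
    writer.writerow(["tag", "label", "text"])
    parser = ItemExtractor(lambda tag, label, text: writer.writerow([tag, label, text]))
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # incomplete tags are buffered until the next chunk
    parser.close()
```

To use it, open the input in text mode and the output with `newline=""`, for example `extract_to_csv(open("items.html", encoding="utf-8", errors="replace"), open("items.csv", "w", newline="", encoding="utf-8"))`, where both file names are placeholders. Inline markup such as `<b>pages</b>` is dropped and only its text is kept. The resulting CSV opens directly in a spreadsheet, or can be loaded into a database afterwards.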