[EMAIL PROTECTED] writes:

> Hi All,
>
> Apologies for the newbie question but I've searched and tried all
> sorts for a few days and I'm pulling my hair out ;[
>
> I have a 'reference' HTML file and a 'test' HTML file from which I
> need to pull 10 strings, all of which are contained within <h2> tags,
> e.g.:
> <h2 class=r><a href="http://www.someplace.com/">Go Someplace</a></h2>
>
> Once I've found the 10 I'd like to write them to another 'results'
> html file. Perhaps a 'reference results' and a 'test results' file.
> From where I would then like to 'diff' the results to see if they
> match.
>
> Here's the rub: I cannot find a way to pull those 10 strings so I can
> save them to the results pages.
> Can anyone please suggest how this can be done?
>
> I've tried allsorts but I've been learning Python for 1 week and just
> don't know enough to mod example scripts it seems. don't even get me
> started on python docs.. ayaa ;] Please feel free to teach me to suck
> eggs because it's all new to me :)
>
> Thanks in advance,
>
> Mark.

Take a look at BeautifulSoup. It is easy to use and copes well with much of the malformed HTML you are likely to run into.

-- 
Jorge Godoy <[EMAIL PROTECTED]>

-- 
http://mail.python.org/mailman/listinfo/python-list
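
For the extract-and-compare step described in the question, here is a minimal sketch. Note that it swaps in the standard library's html.parser and difflib in place of BeautifulSoup, purely so it runs without installing anything; it is less forgiving of badly broken markup than BeautifulSoup would be. The names H2TextParser, extract_h2 and diff_headings are made up for illustration, and the sample strings stand in for the contents of the 'reference' and 'test' files.

```python
import difflib
from html.parser import HTMLParser


class H2TextParser(HTMLParser):
    """Collect the text content of every <h2> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []  # finished heading strings, in document order
        self._depth = 0     # greater than 0 while inside an <h2>
        self._chunks = []   # text pieces of the heading being read

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <H2> and <h2> both match.
        if tag == "h2":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "h2" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.headings.append("".join(self._chunks).strip())
                self._chunks = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still reported here.
        if self._depth:
            self._chunks.append(data)


def extract_h2(html):
    """Return the text of each <h2> in the given HTML string."""
    parser = H2TextParser()
    parser.feed(html)
    parser.close()
    return parser.headings


def diff_headings(reference, test):
    """Return unified-diff lines; an empty list means the two lists match."""
    return list(
        difflib.unified_diff(reference, test, "reference", "test", lineterm="")
    )


if __name__ == "__main__":
    # Stand-ins for the contents of the reference and test HTML files.
    reference_html = '<h2 class=r><a href="http://www.someplace.com/">Go Someplace</a></h2>'
    test_html = '<h2 class=r><a href="http://www.someplace.com/">Go Elsewhere</a></h2>'

    for line in diff_headings(extract_h2(reference_html), extract_h2(test_html)):
        print(line)
```

In real use the two strings would be read from the files with open(...).read(), and the extracted lists could be written out one per line to the 'results' files before diffing. With BeautifulSoup installed, the extraction part would shrink to a search for all h2 tags on the parsed document.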