Hi, I've experimented with regular expressions to solve my problems in the past, but I have seen so many comments about HTMLParser and sgmllib that I thought I would try a different approach this time, so I tried using HTMLParser.
I want to search through my SGML file for various strings of text and find out what section they're in. What I have here does this to a certain extent, but I was wondering if I could make handle_data and regular expressions work together to make this work a little better. For instance, when I search for "above" as I am here, I just get something like this:

'174.114[1]':'above'

but this isn't very useful because I want to know the context of "above" (i.e., the information on either side of it) and maybe even use a regular expression to filter the search a little more. Any ideas? As always, I'd appreciate feedback on my efforts.

Thanks,
Greg

###
from HTMLParser import HTMLParser
import os


class PartFinder(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # Per-instance state; a class-level dict would be shared by every parser.
        self._main = self._subone = self._subtwo = self._full = None
        self._secDict = {}

    def found(self):
        return self._secDict

    def handle_starttag(self, tag, attrs):
        # Track the label of the section being parsed, e.g. 174.114[1][2].
        if tag == "sec-main":
            self._main = dict(attrs).get('no')
            self._full = self._main
        elif tag == "sec-sub1":
            self._subone = dict(attrs).get('no')
            self._full = self._main + '[' + self._subone + ']'
        elif tag == "sec-sub2":
            self._subtwo = dict(attrs).get('no')
            self._full = self._main + '[' + self._subone + ']' + '[' + self._subtwo + ']'

    def handle_data(self, data):
        if "Pt" in data:
            # Key on the full section label and keep every matching chunk
            # instead of testing one key (_main) and storing under another (_full).
            self._secDict.setdefault(self._full, []).append(data)
            print self._secDict


if __name__ == "__main__":
    root = raw_input("Enter the path where the program should run: ")
    fname = raw_input("Enter name of the file: ")
    given, ext = os.path.splitext(fname)

    inputFile = open(os.path.join(root, fname), 'r')
    data = inputFile.read()
    inputFile.close()

    parser = PartFinder()
    parser.feed(data)
    parser.close()
    x = parser.found()

    output_part = given + '.parts'
    outputFile = open(os.path.join(root, output_part), 'w')
    outputFile.write(str(x))
    outputFile.close()
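
One way handle_data and a regular expression might be combined to keep the text on either side of a match is sketched below. This is only a sketch under several assumptions: it is written for Python 3 (`html.parser`) rather than the Python 2 `HTMLParser` module used above, the names `ContextFinder`, `width`, and `hits` are made up for illustration, and the tag and attribute names (`sec-main`, `sec-sub1`, `sec-sub2`, `no`) are taken from the script above. Context is limited to a single text chunk, so a match whose surroundings are split by an inline tag would get truncated context.

```python
import re
from html.parser import HTMLParser


class ContextFinder(HTMLParser):
    """Collect regex matches with their section label and nearby text."""

    def __init__(self, pattern, width=30):
        super().__init__()
        self._regex = re.compile(pattern)
        self._width = width   # characters of context kept on each side
        self._path = []       # current section numbers, outermost first
        self.hits = {}        # section label -> list of context strings

    def handle_starttag(self, tag, attrs):
        number = dict(attrs).get("no")
        # Entering a section truncates the path to its parent level.
        if tag == "sec-main":
            self._path = [number]
        elif tag == "sec-sub1":
            self._path = self._path[:1] + [number]
        elif tag == "sec-sub2":
            self._path = self._path[:2] + [number]

    def _label(self):
        # Build a label such as 174.114[1][2]; None before any section starts.
        if not self._path:
            return None
        head, rest = self._path[0], self._path[1:]
        return "%s%s" % (head, "".join("[%s]" % n for n in rest))

    def handle_data(self, data):
        # Slice a window around every match instead of storing the bare hit.
        for match in self._regex.finditer(data):
            start = max(0, match.start() - self._width)
            end = match.end() + self._width
            self.hits.setdefault(self._label(), []).append(data[start:end])


if __name__ == "__main__":
    sample = (
        '<sec-main no="174.114">'
        '<sec-sub1 no="1">See the table above for the limits.</sec-sub1>'
        '</sec-main>'
    )
    finder = ContextFinder(r"\babove\b")
    finder.feed(sample)
    finder.close()
    print(finder.hits)
```

Because the search term is a regular expression, the filter can be tightened without touching the parser, for example `r"\babove\s+\$?\d+"` to keep only occurrences followed by a number.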