Hello, I need help using sgmllib's SGMLParser to parse an HTML file and keep track of the number of times each tag is used.
At the end of the program I need to print the number of times each tag was seen (presumably any type of tag can occur) and the linked text. I need help getting past the first steps. I already have this basic program that returns hyperlinks, but I can't work out how to parse arbitrary tags and keep track of them so the counts can be printed later. I'm very frustrated, and help is appreciated!

--------------------------------------------------------------------------
import sgmllib, urllib

class HtmParser(sgmllib.SGMLParser):
    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."
        sgmllib.SGMLParser.__init__(self, verbose)
        self.hyperlinks = []
        self.descriptions = []
        self.inside_a_element = 0

    def start_a(self, attributes):
        "Process a hyperlink and its 'attributes'."
        for name, value in attributes:
            if name == "href":
                self.hyperlinks.append(value)

    def get_hyperlinks(self):
        "Return the list of hyperlinks."
        return self.hyperlinks

parser = HtmParser()
inptAdrs = raw_input('Please input the absolute path to the url\n')
print 'you entered: ', inptAdrs
content = urllib.urlopen(inptAdrs)
bufff = content.read()
print 'Statistics for ', inptAdrs
print 'There are', len(bufff), 'characters in the web page'
parser.feed(bufff)
# close() flushes any buffered input, so call it before reading results
parser.close()
print parser.get_hyperlinks()
--------------------------------------------------------------------------

Any help is much appreciated.

-- 
http://mail.python.org/mailman/listinfo/python-list
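
One possible direction for the tag counting, as a rough sketch only. It is not written against sgmllib: that module exists only in Python 2, so this version swaps in html.parser.HTMLParser from the Python 3 standard library, where a single handle_starttag() method receives every start tag. In sgmllib the nearest hook is unknown_starttag(tag, attributes), which is called only for tags that have no start_xxx or do_xxx method of their own. The class name TagCounter, the SAMPLE string, and the attribute names tag_counts and links are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count every start tag and collect (href, linked text) pairs."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()  # tag name -> number of start tags seen
        self.links = []              # list of (href, linked text) tuples
        self._href = None            # href of the <a> currently open, if any
        self._text = []              # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Called for every start tag; tag names arrive lowercased.
        self.tag_counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears while a hyperlink is open.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical input; in the real program this would be the downloaded page.
SAMPLE = (
    '<html><body><p>See <a href="http://example.com">the docs</a>.</p>'
    "<p>Bye</p></body></html>"
)

counter = TagCounter()
counter.feed(SAMPLE)
counter.close()

for tag, count in sorted(counter.tag_counts.items()):
    print(tag, count)
for href, text in counter.links:
    print(href, "->", text)
```

Keeping the counts in a dictionary keyed by tag name means no per-tag method is needed, and the linked text is gathered by remembering whether an <a> element is currently open, which is what the unused inside_a_element flag in the program above appears to be intended for.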