Dennis Lee Bieber wrote:
> On Tue, 05 Feb 2008 04:03:04 GMT, Odysseus
> <[EMAIL PROTECTED]> declaimed the following in
> comp.lang.python:
>
>> Sorry, translation problem: I am acquainted with Python's "for" -- if
>> far from fluent with it, so to speak -- but the PS operator that's most
>> similar (traversing a compound object, element by element, without any
>> explicit indexing or counting) is called "forall". PS's "for" loop is
>> similar to BASIC's (and ISTR Fortran's):
>>
>>     start_value increment end_value {procedure} for
>>
>> I don't know the proper generic term -- "indexed loop"? -- but at any
>> rate it provides a counter, unlike Python's command of the same name.
>>
> The convention in Python is to use range() (or xrange() ) to
> generate a sequence of "index" values for the for statement to loop
> over:
>
>     for i in range([start], end, [step]):
>
> with the caveat that "end" will not be one of the values, start defaults
> to 0, so if you supply range(4) the values become 0, 1, 2, 3 [ie, 4
> values starting at 0].
>
If you have a sequence of values s and you want to associate each with its
index value as you loop over the sequence, the easiest way to do this is
the enumerate built-in function:

    >>> for x in enumerate(['this', 'is', 'a', 'list']):
    ...     print x
    ...
    (0, 'this')
    (1, 'is')
    (2, 'a')
    (3, 'list')

It's usually (though not always) much more convenient to bind the index and
the value to separate names, as in

    >>> for i, v in enumerate(['this', 'is', 'a', 'list']):
    ...     print i, v
    ...
    0 this
    1 is
    2 a
    3 list

[...]
> The whole idea behind the SGML parser is that YOU add methods to
> handle each tag type you need... Also, FYI, there IS an HTML parser (in
> module htmllib) that is already derived from sgmllib.
>
> class PageParser(SGMLParser):
>     def __init__(self):
>         #need to call the parent __init__, and then
>         #initialize any needed attributes -- like someplace to collect
>         #the parsed out cell data
>         self.row = {}
>         self.all_data = []
>
>     def start_table(self, attrs):
>         self.inTable = True
>         .....
>
>     def end_table(self):
>         self.inTable = False
>         .....
>
>     def start_tr(self, attrs):
>         if self.inRow:
>             #unclosed row!
>             self.end_tr()
>         self.inRow = True
>         self.cellCount = 0
>         ...
>
>     def end_tr(self):
>         self.inRow = False
>         # add/append collected row data to master stuff
>         self.all_data.append(self.row)
>         ...
>
>     def start_td(self, attrs):
>         if self.inCell:
>             self.end_td()
>         self.inCell = True
>         ...
>
>     def end_td(self):
>         self.cellCount = self.cellCount + 1
>         ...
>
>     def handle_data(self, text):
>         if self.inTable and self.inRow and self.inCell:
>             if self.cellCount == 0:
>                 #first column stuff
>                 self.row["Epoch1"] = convert_if_needed(text)
>             elif self.cellCount == 1:
>                 #second column stuff
>                 ...
>
>
> Hope you don't have nested tables -- it could get ugly as this style
> of parser requires the start_tag()/end_tag() methods to set instance
> attributes for the purpose of tracking state needed in later methods
> (notice the complexity of the handle_data() method just to ensure that
> the text is from a table cell, and not some random text).
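For anyone who wants something runnable to experiment with, here is a minimal sketch of the same flag-tracking idea as the quoted outline. It is not the quoted code: sgmllib is gone from Python 3, so this assumes the standard html.parser.HTMLParser instead, which dispatches every tag through handle_starttag()/handle_endtag() rather than per-tag start_xxx() methods. The names TableParser and parse_table, and the choice to store each row as a plain list of cell strings, are made up for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td> cell, row by row, using state flags."""

    def __init__(self):
        super().__init__()      # the parent __init__ must run first
        self.in_row = False
        self.in_cell = False
        self.row = []           # cells of the row currently being read
        self.all_data = []      # finished rows

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            if self.in_row:     # unclosed row: finish it before starting anew
                self.handle_endtag("tr")
            self.in_row = True
            self.row = []
        elif tag == "td" and self.in_row:
            if self.in_cell:    # unclosed cell
                self.handle_endtag("td")
            self.in_cell = True
            self.row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.in_row:
            self.in_cell = False
            self.in_row = False
            self.all_data.append(self.row)

    def handle_data(self, text):
        # Keep only text that sits inside a table cell.
        if self.in_row and self.in_cell:
            self.row[-1] += text.strip()


def parse_table(html):
    """Hypothetical helper: feed the page and return the collected rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.all_data
```

Like the outline it is modelled on, this keeps only one set of flags, so it has no way to tell an inner table's rows from an outer table's.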
>
There is, of course, nothing to stop you building a recursive data
structure, so that encountering a new opening tag such as <table> adds
another level to some stack-like object, and the corresponding closing tag
pops it off again, but this *does* add to the complexity somewhat.

It seems natural that more complex input possibilities lead to more
complex parsers.

> And somewhere before you close the parser, get a handle on the
> collected data...
>
>
>     parsed_data = parser.all_data
>     parser.close()
>     return parsed_data
>
>
>> Why wouldn't one use a dictionary for that?
>>
> The overhead may not be needed... Tuples can also be used as the
> keys /in/ a dictionary.
>
regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
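
The stack idea mentioned above (push a level when <table> opens, pop it when the matching </table> arrives) can be sketched roughly as follows. This is only one possible arrangement, written against Python 3's html.parser.HTMLParser rather than sgmllib; NestedTableParser, parse_nested and the frame layout (a dict with "rows" and "in_cell") are hypothetical names chosen for the example. Each cell is a list holding text fragments and, where present, nested tables.

```python
from html.parser import HTMLParser


class NestedTableParser(HTMLParser):
    """Parse possibly nested tables into nested lists.

    table -> list of rows, row -> list of cells,
    cell  -> list of strings and/or nested tables.
    """

    def __init__(self):
        super().__init__()
        self.stack = []     # one frame per open <table>, innermost last
        self.tables = []    # finished top-level tables

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Push a new level for the table that just opened.
            self.stack.append({"rows": [], "in_cell": False})
        elif self.stack:
            frame = self.stack[-1]
            if tag == "tr":
                frame["rows"].append([])
                frame["in_cell"] = False
            elif tag == "td" and frame["rows"]:
                frame["rows"][-1].append([])
                frame["in_cell"] = True

    def handle_endtag(self, tag):
        if not self.stack:
            return
        if tag in ("td", "tr"):
            self.stack[-1]["in_cell"] = False
        elif tag == "table":
            # Pop the finished level and hand it to whatever encloses it.
            finished = self.stack.pop()["rows"]
            parent = self.stack[-1] if self.stack else None
            if parent and parent["in_cell"]:
                parent["rows"][-1][-1].append(finished)
            else:
                self.tables.append(finished)

    def handle_data(self, text):
        text = text.strip()
        if text and self.stack and self.stack[-1]["in_cell"]:
            self.stack[-1]["rows"][-1][-1].append(text)


def parse_nested(html):
    """Hypothetical helper: return the list of top-level tables."""
    parser = NestedTableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

The per-table state now lives in the frame on top of the stack instead of in instance flags, which is where the extra complexity goes.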