"Khoa Nguyen" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] I am trying to come up with a grammar that describes the following:

    record = f1,f2,...,fn END_RECORD

All the f(i) have to be in that order.
Any f(i) can be absent (e.g. f1,,f3,f4,,f6 END_RECORD).
The number of f(i)'s can vary. For example, the following are allowed:

    f1,f2 END_RECORD
    f1,f2,,f4,,f6 END_RECORD

Any suggestions?

Thanks,
Khoa

--------

pyparsing includes a built-in expression, commaSeparatedList, for just such a case. Here is a simple pyparsing program to crack your input text:

    data = """f1,f2,f3,f4,f5,f6 END_RECORD
    f1,f2 END_RECORD
    f1,f2,,f4,,f6 END_RECORD"""

    from pyparsing import commaSeparatedList

    for tokens, start, end in commaSeparatedList.scanString(data):
        print(tokens)

This returns:

    ['f1', 'f2', 'f3', 'f4', 'f5', 'f6 END_RECORD']
    ['f1', 'f2 END_RECORD']
    ['f1', 'f2', '', 'f4', '', 'f6 END_RECORD']

Note that consecutive commas in the input return empty strings at the corresponding places in the results.

Unfortunately, commaSeparatedList embeds its own definition of what is allowed between commas, so END_RECORD always ends up attached to the last field. We could copy the definition of commaSeparatedList and exclude this, but it is simpler just to add a parse action to commaSeparatedList that removes END_RECORD from the last list element:

    # Parse action arguments: s = original string, l = match location,
    # t = the matched tokens
    def stripEND_RECORD(s, l, t):
        last = t[-1]
        if last.endswith("END_RECORD"):
            # return a copy of t with last element trimmed of "END_RECORD"
            return t[:-1] + [last[:-(len("END_RECORD"))].rstrip()]
        # otherwise return None, which leaves the tokens unchanged

    commaSeparatedList.setParseAction(stripEND_RECORD)

    for tokens, start, end in commaSeparatedList.scanString(data):
        print(tokens)

This returns:

    ['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
    ['f1', 'f2']
    ['f1', 'f2', '', 'f4', '', 'f6']

As one of my wife's 3rd graders concluded on a science report - "wah-lah!"

Python also includes a csv module if this example doesn't work for you, but you asked for a pyparsing solution, so there it is.

-- Paul

-- http://mail.python.org/mailman/listinfo/python-list
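
For comparison, here is a minimal sketch of the csv-module route mentioned above. It assumes one record per line, with END_RECORD as the last thing on the line and separated from the final field by whitespace. The names parse_records and TERMINATOR are made up for this illustration; only csv.reader comes from the standard library.

```python
import csv

DATA = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD"""

# Hypothetical name for the record terminator used in the question.
TERMINATOR = "END_RECORD"


def parse_records(text):
    """Return one list of fields for each line that ends in TERMINATOR."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(TERMINATOR):
            continue  # assumption: lines without the terminator are ignored
        # Drop the terminator and the whitespace that precedes it.
        body = line[:-len(TERMINATOR)].rstrip()
        # csv.reader expects an iterable of lines, so wrap the single line
        # in a list and take the one row it produces.
        records.append(next(csv.reader([body])))
    return records


records = parse_records(DATA)
for fields in records:
    print(fields)
```

Empty fields come back as empty strings, just as with commaSeparatedList, so the field positions are preserved. This sketch does not check that the fields appear in the required order; that would still need a separate validation step.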