> record = f1,f2,...,fn END_RECORD
> All the f(i) have to be in that order.
> Any f(i) can be absent (e.g. f1,,f3,f4,,f6 END_RECORD)
> The number of f(i)'s can vary. For example, the following are allowed:
> f1,f2 END_RECORD
> f1,f2,,f4,,f6 END_RECORD
>
> Any suggestions?
>
>
> --------
> pyparsing includes a built-in expression, commaSeparatedList, for just such
> a case. Here is a simple pyparsing program to crack your input text:
>
>
> data = """f1,f2,f3,f4,f5,f6 END_RECORD
> f1,f2 END_RECORD
> f1,f2,,f4,,f6 END_RECORD"""
>
> from pyparsing import commaSeparatedList
>
> for tokens,start,end in commaSeparatedList.scanString(data):
>     print(tokens)
>
>
> This returns:
> ['f1', 'f2', 'f3', 'f4', 'f5', 'f6 END_RECORD']
> ['f1', 'f2 END_RECORD']
> ['f1', 'f2', '', 'f4', '', 'f6 END_RECORD']
>
> Note that consecutive commas in the input return empty strings at the
> corresponding places in the results.
>
> Unfortunately, commaSeparatedList embeds its own definition of what is
> allowed between commas, so the last field looks like it always has
> END_RECORD added to the end. We could copy the definition of
> commaSeparatedList and exclude this, but it is simpler just to add a parse
> action to commaSeparatedList, to remove END_RECORD from the -1'th list
> element:
>
> def stripEND_RECORD(s,l,t):
>     last = t[-1]
>     if last.endswith("END_RECORD"):
>         # return a copy of t with last element trimmed of "END_RECORD"
>         return t[:-1] + [last[:-(len("END_RECORD"))].rstrip()]
>
> commaSeparatedList.setParseAction(stripEND_RECORD)
>
>
> for tokens,start,end in commaSeparatedList.scanString(data):
>     print(tokens)
>
>
> This returns:
>
> ['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
> ['f1', 'f2']
> ['f1', 'f2', '', 'f4', '', 'f6']
>

Thanks for your reply. This looks promising, but I have a few more questions:

1. If f(i) is a non-terminal (e.g. f(i) is another grammar expression), how
   would I adapt your idea in a more generic way?
2. The field delimiter is not always ',' in my case. So I guess I'll have to
   use delimitedList instead?

Thanks again,
Khoa
--
http://mail.python.org/mailman/listinfo/python-list
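
On the two follow-up questions, one possible direction is pyparsing's delimitedList, which accepts an arbitrary expression as the list item and a delim argument for the separator. The following is only a sketch under assumptions, not an answer taken from the thread: record_parser, kv_pair, and the sample inputs are hypothetical names made up for illustration, and the code is written for Python 3. Optional(..., default="") stands in for absent fields, and the negative lookahead ~end keeps a field expression from consuming END_RECORD.

```python
from pyparsing import (Group, Keyword, Optional, Suppress, Word,
                       alphanums, alphas, delimitedList, nums)

# Record terminator; Suppress keeps it out of the parsed results.
end = Suppress(Keyword("END_RECORD"))

def record_parser(field, delim=","):
    """Hypothetical helper: parser for 'f1<delim>f2<delim>... END_RECORD'."""
    # An absent field yields "" so that field positions are preserved.
    # ~end prevents a field expression from matching the terminator itself.
    maybe_field = Optional(~end + field, default="")
    return delimitedList(maybe_field, delim=delim) + end

# Terminal fields, comma-delimited (the case from the original question).
simple = record_parser(Word(alphanums))
simple_result = simple.parseString("f1,f2,,f4,,f6 END_RECORD").asList()
print(simple_result)   # ['f1', 'f2', '', 'f4', '', 'f6']

# Non-terminal fields (assumed key=value pairs) with a different delimiter.
kv_pair = Group(Word(alphas) + Suppress("=") + Word(nums))
nested = record_parser(kv_pair, delim="|")
nested_result = nested.parseString("a=1||c=3 END_RECORD").asList()
print(nested_result)   # [['a', '1'], '', ['c', '3']]
```

Wrapping the non-terminal in Group keeps each field's tokens together, so an absent field still occupies exactly one slot in the result. Recent pyparsing releases also provide a DelimitedList class intended to replace delimitedList; the function form is used here because it is the name mentioned in the thread.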