"Khoa Nguyen" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
>
> for tokens,start,end in commaSeparatedList.scanString(data):
>     print tokens
>
>
> This returns:
>
> ['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
> ['f1', 'f2']
> ['f1', 'f2', '', 'f4', '', 'f6']
>
<snip>
> On 2nd thought, I don't think this will check for the correct order of
> the fields. For example, the following would be incorrectly accepted:
>
> f1,f5,f2 END_RECORD
>
> Thanks,
> Khoa

Well, what are the rules for the comma-separated entries? Are they
distinguished by type, or are they in ascending lexical or arithmetic order,
or by ascending length?

Two approaches you can take:

- if at parse time you can determine if f5 is out of position because it is
  a specific type, then you can define your grammar like:

    Optional(f1SpecificFormat) + "," + Optional(f2SpecificFormat) + "," + ...

  and so on. Then f5 would only match if in the fifth position. Or, if even
  the commas are optional (as in f2,f5 END_RECORD), then you would need a
  grammar such as:

    Optional(f1SpecificFormat) + Optional(Optional(",") + f2SpecificFormat) + ... + "END_RECORD"

- if f5 is out of order because it is followed by f2, but would have been ok
  if followed only by f6-fN values, then you'll need to read everything in,
  and then test for validity, most easily in a parse action. If the
  validation rule fails, then have the parse action raise a ParseException,
  so that the match would be rejected.

-- Paul
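
P.S. For the second approach, here is a minimal sketch. It assumes
(hypothetically, since I don't know your real rules) that every field looks
like "f" followed by a number and that "in order" means strictly ascending
numbers. The names field, record, check_ascending and is_valid are made up
for illustration, and empty fields are not handled here.

```python
from pyparsing import ParseException, Regex, Suppress, ZeroOrMore

# Assumed field format: "f" followed by digits, e.g. f1, f5.
field = Regex(r"f\d+")

# Read in every comma-separated field first, then the record terminator.
record = field + ZeroOrMore(Suppress(",") + field) + Suppress("END_RECORD")


def check_ascending(s, loc, toks):
    """Parse action: reject the match if the field numbers are not ascending."""
    numbers = [int(tok[1:]) for tok in toks]
    for previous, current in zip(numbers, numbers[1:]):
        if current <= previous:
            # Raising ParseException inside a parse action makes the match fail.
            raise ParseException(s, loc, "fields are out of order")


record.setParseAction(check_ascending)


def is_valid(line):
    """Return True if the line parses as a record with fields in order."""
    try:
        record.parseString(line)
        return True
    except ParseException:
        return False
```

With this in place, is_valid("f1,f5,f2 END_RECORD") should come back False,
while is_valid("f2,f5 END_RECORD") is accepted. Swap the comparison inside
check_ascending for whatever your actual ordering rule turns out to be.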