Mark - Let me weigh in with a pyparsing entry to your puzzle. It won't be blazingly fast, but at least it will give you another data point in your comparison of approaches. Note that the parser can do the string-to-int conversion for you during the parsing pass.
If @rv@ and @pv@ are record type markers, then you can use pyparsing to create more of a parser than just a simple tokenizer, and parse out the individual record fields into result attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

test1 = "@hello@@world@@[EMAIL PROTECTED]"
test2 = """@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @"""

from pyparsing import *

AT = Literal("@")
atQuotedString = AT.suppress() + Combine(OneOrMore(
        (~AT + SkipTo(AT)) |
        (AT + AT).setParseAction(replaceWith("@")) )) + AT.suppress()

# extract any @-quoted strings
for test in (test1,test2):
    for toks,s,e in atQuotedString.scanString(test):
        print toks
    print

# parse all tokens (assume either a positive integer or @-quoted string)
def makeInt(s,l,toks):
    return int(toks[0])
entry = OneOrMore( Word(nums).setParseAction(makeInt) | atQuotedString )

for t in test2.split("\n"):
    print entry.parseString(t)

Prints out:
['[EMAIL PROTECTED]@foo']

['rv']
['db.locks']
['//depot/hello.txt']
['mh']
['mh']
['pv']
['db.changex']
['mh']
['mh']
[':@: :@@: ']

['rv', 2, 'db.locks', '//depot/hello.txt', 'mh', 'mh', 1, 1, 44]
['pv', 0, 'db.changex', 44, 44, 'mh', 'mh', 1118875308, 0, ':@: :@@: ']
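For the comparison of approaches, here is a rough standard-library baseline that applies the same two assumptions as the grammar above: a token is either a positive integer or an @-quoted string, and a doubled "@@" inside the quotes stands for a single "@". This is not pyparsing and not anyone's actual solution from the thread; the names TOKEN and tokenize are made up for illustration, and it is written for current Python rather than the interpreter used above.

```python
import re

# Assumption taken from the sample data: a token is either a run of digits
# or an @-quoted string, where "@@" inside the quotes is an escaped "@".
TOKEN = re.compile(r"(?P<int>\d+)|@(?P<quoted>(?:[^@]|@@)*)@")


def tokenize(line):
    """Return the ints and unescaped @-quoted strings in line, in order."""
    tokens = []
    for match in TOKEN.finditer(line):
        if match.group("int") is not None:
            # Do the string-to-int conversion during the scanning pass.
            tokens.append(int(match.group("int")))
        else:
            # Collapse each escaped "@@" back to a single "@".
            tokens.append(match.group("quoted").replace("@@", "@"))
    return tokens


if __name__ == "__main__":
    print(tokenize("@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44"))
```

One difference to keep in mind when comparing: this sketch keeps whitespace inside the quotes exactly as written, so the last field of the @pv@ line comes back with its leading space. It also silently skips anything that is neither an integer nor a well-formed quoted string, whereas a real parser would report the error.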