Within a larger pyparsing grammar, I have something that looks like::

    wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::

    >>> foo.parseString('wsj/00/wsj_0003.mrg')
    (['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

How do I go about this? I was using something like::

    >>> digits = pp.Word(pp.nums)
    >>> alphas = pp.Word(pp.alphas)
    >>> wsj_name = pp.Combine(alphas + '_' + digits)
    >>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
    ...                       '.mrg')

But of course then all I get back is the full path::

    >>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
    (['wsj/00/wsj_0003.mrg'], {})

I could leave off the final Combine and add a parse action::

    >>> wsj_path = alphas + '/' + digits + '/' + wsj_name + '.mrg'
    >>> def parse_wsj_path(string, index, tokens):
    ...     wsj_name = tokens[4]
    ...     return ''.join(tokens), wsj_name
    ...
    >>> wsj_path.setParseAction(parse_wsj_path)
    >>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
    ([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})

But that then allows whitespace between the pieces of the path, which
there shouldn't be::

    >>> wsj_path.parseString('wsj / 00 / wsj_0003.mrg')
    ([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})

How do I make sure no whitespace intervenes, and still have access to
the sub-expression?

Thanks,

STeVe
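One possible direction, offered as a sketch rather than a definitive answer: keep the uncombined sequence and the parse action from the second attempt, but call pyparsing's ``leaveWhitespace()`` on the sequence so that whitespace is not skipped between its pieces. The sketch assumes the ``pp`` alias stands for ``import pyparsing as pp``, and it returns a list instead of a tuple from the parse action so that the two strings come back as separate tokens.

```python
import pyparsing as pp  # assumed meaning of the "pp" alias used in the post

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)
wsj_name = pp.Combine(alphas + '_' + digits)

# leaveWhitespace() turns off whitespace skipping for the sequence and its
# sub-expressions, so the pieces have to be adjacent in the input.
wsj_path = (alphas + '/' + digits + '/' + wsj_name + '.mrg').leaveWhitespace()


def parse_wsj_path(string, index, tokens):
    # For the sample path the tokens are
    # 'wsj', '/', '00', '/', 'wsj_0003', '.mrg', so tokens[4] is the name.
    # Returning a list (not a tuple) produces two separate result tokens.
    return [''.join(tokens), tokens[4]]


wsj_path.setParseAction(parse_wsj_path)

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.asList())

# The spaced-out variant should now be rejected instead of silently accepted.
try:
    wsj_path.parseString('wsj / 00 / wsj_0003.mrg')
except pp.ParseException as exc:
    print('rejected:', exc)
```

One caveat with this choice: because whitespace skipping is disabled for the whole sequence, leading whitespace in front of the path is no longer skipped either, which may matter when the expression is embedded in the larger grammar.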