On Oct 22, 4:18 am, "Just Another Victim of the Ambient Morality"
<[EMAIL PROTECTED]> wrote:
> I'm trying to parse with pyparsing but the grammar I'm using is somewhat
> unorthodox. I need to be able to parse something like the following:
>
> UPPER CASE WORDS And Title Like Words
>
> ...into two sentences:
>
> UPPER CASE WORDS
> And Title Like Words
>
> I'm finding this surprisingly hard to do. The problem is that pyparsing
> implicitly assumes whitespace are ignorable characters and is (perhaps
> necessarily) greedy with its term matching. All attempts to do the
> described parsing either fails to parse or incorrectly parses so:
>
> UPPER CASE WORDS A
> nd Title Like Words
>
> Frankly, I'm stuck. I don't know how to parse this grammar with
> pyparsing.
> Does anyone know how to accomplish what I'm trying to do?
> Thank you...
Yes, whitespace skipping does get in the way sometimes. In your case, you need to make sure each word is only matched as a whole word, so that the 'A' at the start of 'And' cannot be taken as an all-caps word on its own. See the options and comments in the code below:

    from pyparsing import *

    data = "UPPER CASE WORDS And Title Like Words"

    # Pick ONE of the three options below; as written, each pair of
    # assignments overwrites the previous one.

    # Option 1 - qualify Word instance with asKeyword=True
    upperCaseWord = Word(alphas.upper(), asKeyword=True)
    titleLikeWord = Word(alphas.upper(), alphas.lower(), asKeyword=True)

    # Option 2 - explicitly state that each all-caps word must be followed by
    # whitespace (no lookahead on the title words: the last one ends the
    # string, so FollowedBy(White()) would make it fail to match)
    upperCaseWord = Word(alphas.upper()) + FollowedBy(White())
    titleLikeWord = Word(alphas.upper(), alphas.lower())

    # Option 3 - use regexes - note, still have to use lookahead to avoid
    # matching 'A' in 'And'
    upperCaseWord = Regex(r"[A-Z]+(?=\s)")
    titleLikeWord = Regex(r"[A-Z][a-z]*")

    # create grammar, with some friendly results names
    grammar = (OneOrMore(upperCaseWord)("allCaps") +
               OneOrMore(titleLikeWord)("title"))

    # dump out the parsed results
    print(grammar.parseString(data).dump())

All three options print out:

    ['UPPER', 'CASE', 'WORDS', 'And', 'Title', 'Like', 'Words']
    - allCaps: ['UPPER', 'CASE', 'WORDS']
    - title: ['And', 'Title', 'Like', 'Words']

Once you have this, you can rejoin the words with " ".join, or however you like.

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list
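The same lookahead trick used in Option 3 can be sketched with only Python's standard re module, returning the two sentences as plain strings. This is a sketch under stated assumptions, not pyparsing itself: the helper names split_caps_and_title and _match_run are hypothetical, words are assumed to be ASCII letters, and a leading \s* in each pattern stands in for pyparsing's default whitespace skipping.

```python
import re

# Hypothetical helper names; a stdlib-only sketch of the lookahead idea from
# Option 3, using the re module instead of pyparsing. Assumes ASCII words.

# \s* skips leading whitespace (as pyparsing does by default); the lookahead
# (?=\s) rejects an all-caps run that is not followed by whitespace, so the
# 'A' at the start of 'And' is never taken as an all-caps word.
UPPER_WORD = re.compile(r"\s*([A-Z]+)(?=\s)")
TITLE_WORD = re.compile(r"\s*([A-Z][a-z]*)")


def _match_run(pattern, text, pos):
    """Apply pattern repeatedly from pos (like OneOrMore); return (words, pos)."""
    words = []
    while True:
        m = pattern.match(text, pos)
        if m is None:
            return words, pos
        words.append(m.group(1))
        pos = m.end()


def split_caps_and_title(text):
    """Split 'CAPS WORDS Title Words' into two space-joined sentences."""
    caps, pos = _match_run(UPPER_WORD, text, 0)
    title, pos = _match_run(TITLE_WORD, text, pos)
    # Unlike parseString's default, refuse to silently ignore leftover text.
    if not caps or not title or text[pos:].strip():
        raise ValueError("expected all-caps words followed by title-like words")
    return " ".join(caps), " ".join(title)


print(split_caps_and_title("UPPER CASE WORDS And Title Like Words"))
# -> ('UPPER CASE WORDS', 'And Title Like Words')
```

As with the pyparsing versions, an all-caps word at the very end of the input is not matched, since the lookahead requires whitespace after it.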