I really enjoyed your article. I will try to understand this. Will you be doing more of this in the future with more complicated examples?
Paul McGuire wrote:
> "Dave" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> OK, I'm stumped.
>>
>> I'm trying to find newline characters (\n, specifically) that are NOT
>> in comments.
>>
>> So, for example (where "<-" = a newline character):
>> ==========================================
>> 1: <-
>> 2: /*<-
>> 3: ----------------------<-
>> 4: comment<-
>> 5: ----------------------<-
>> 6: */<-
>> 7: <-
>> 8: CODE CODE CODE<-
>> 9: <-
>> ==========================================
>>
>> I want to return the newline characters at lines 1, 6, 7, 8, and 9 but
>> NOT the others.
>>
>
> Dave -
>
> Pyparsing has built-in support for detecting line breaks and comments, and
> the syntax is pretty simple, I think. Here's a pyparsing program that
> gives your desired results:
>
> ===============================
> from pyparsing import lineEnd, cStyleComment, lineno
>
> testsource = """
> /*
> ----------------------
> comment
> ----------------------
> */
>
> CODE CODE CODE
>
> """
>
> # define the expression you want to search for
> eol = lineEnd
>
> # specify that you don't want to match within C-style comments
> eol.ignore(cStyleComment.leaveWhitespace())
>
> # loop through all the occurrences returned by scanString
> # and print the line number of that location within the original string
> for toks, startloc, endloc in eol.scanString(testsource):
>     print(lineno(startloc, testsource))
> ===============================
>
> The expression you are searching for is pretty basic, just a plain
> end-of-line, or pyparsing's built-in expression, lineEnd. The curve you
> are throwing is that you *don't* want eol's inside of C-style comments.
> Pyparsing allows you to designate an "ignore" expression to skip
> undesirable content, and fortunately, ignoring comments happens so often
> during parsing that pyparsing includes common comment expressions for C,
> C++, Java, Python, and HTML. Next, pyparsing's version of re.search is
> scanString.
> scanString returns a generator that gives the matching tokens, start
> location, and end location of every occurrence of the given parse
> expression, in your case, eol. Finally, in the body of our for loop, we
> use pyparsing's lineno function to give us the line number of a string
> location within the original string.
>
> About the only real wart on all this is that pyparsing implicitly skips
> over leading whitespace, even when looking for expressions to be ignored.
> In order not to lose eols that are just before a comment (like your line
> 1), we have to modify cStyleComment to leave leading whitespace.
>
> Download pyparsing at http://pyparsing.sourceforge.net.
>
> -- Paul
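
For comparison, here is a minimal sketch of the same idea using only the standard library's re module instead of pyparsing. It is not from Paul's message: the names COMMENT_OR_EOL and newline_lines are made up for illustration, and it assumes comments are well-formed /* ... */ blocks that never appear inside string literals.

```python
import re

# Hypothetical names; a plain-re sketch, not how pyparsing works internally.
# Alternation order matters: a comment is tried before a bare newline, so
# newlines inside /* ... */ are consumed as part of the comment match.
COMMENT_OR_EOL = re.compile(r"/\*.*?\*/|\n", re.DOTALL)


def newline_lines(source):
    """Return 1-based line numbers of newlines outside C-style comments."""
    lines = []
    for match in COMMENT_OR_EOL.finditer(source):
        if match.group() == "\n":
            # Line number = newlines before this position, plus one.
            lines.append(source.count("\n", 0, match.start()) + 1)
    return lines


testsource = """
/*
----------------------
comment
----------------------
*/

CODE CODE CODE

"""

print(newline_lines(testsource))
```

With the test string from the quoted message this prints [1, 6, 7, 8, 9], the lines Dave asked for. An unterminated comment would not be skipped by this sketch, so treat it as an illustration of the skip-then-match idea rather than a replacement for a real parser.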