On Tue, Mar 10, 2015 at 1:34 AM, Antoon Pardon <antoon.par...@rece.vub.ac.be> wrote:
>> There is str.isidentifier, which returns True if something is a valid
>> identifier name:
>>
>> >>> '℮'.isidentifier()
>> True
>
> Which is not very useful in a context of lexical analysis. I don't need to
> know if a particular string is useful as an identifier, I want to know
> which parts of a text are identifiers.

If you're doing lexical analysis, you probably want a lexer. For Python, I
would recommend parsing to AST and doing your analysis on that; I've had
pretty good success doing that, and then using the line/column info to go
back to the original text if I need it. A regex is probably not going to
be sufficient for that kind of work.

What exactly are you trying to accomplish here? More info would guide the
recommendations, obviously.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
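
For illustration, here is a minimal sketch of the parse-to-AST approach mentioned above, assuming Python 3.8 or later (for end_col_offset). The function name identifier_spans and the sample source are hypothetical, made up for this example. It only reports ast.Name nodes, so names that appear as attributes, function or class definitions, parameters, or imports would need extra handling. Note that the column offsets the ast module reports are UTF-8 byte offsets, so the line is encoded before slicing.

```python
import ast


def identifier_spans(source):
    """Return (text, line, column) for every ast.Name node in source.

    The text is sliced from the original source using the node's
    position, so it is what was actually written, not the normalized
    name stored in the AST. Columns are UTF-8 byte offsets.
    """
    lines = source.splitlines()
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # col_offset/end_col_offset count bytes, not characters.
            raw = lines[node.lineno - 1].encode("utf-8")
            text = raw[node.col_offset:node.end_col_offset].decode("utf-8")
            spans.append((text, node.lineno, node.col_offset))
    # ast.walk does not promise source order, so sort by position.
    spans.sort(key=lambda span: (span[1], span[2]))
    return spans


if __name__ == "__main__":
    sample = "total = price * qty\nprint(total)\n"
    for text, line, col in identifier_spans(sample):
        print(f"{line}:{col} {text}")
```

Whether this is enough depends on what you are trying to find; if you need every token rather than just names in expressions, the standard tokenize module may be a better starting point.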