On 09-03-15 at 15:39, Chris Angelico wrote:
> On Tue, Mar 10, 2015 at 1:34 AM, Antoon Pardon
> <antoon.par...@rece.vub.ac.be> wrote:
>>> There is str.isidentifier, which returns True if something is a valid
>>> identifier name:
>>>
>>> >>> '℮'.isidentifier()
>>> True
>> Which is not very useful in a context of lexical analysis. I don't need to
>> know if a particular string is useful as an identifier, I want to know
>> which parts of a text are identifiers.
> If you're doing lexical analysis, you probably want a lexer. For
> Python, I would recommend parsing to AST and doing your analysis on
> that; I've had pretty good success doing that, and then using the
> line/column info to go back to the original text if I need it. A regex
> is probably not going to be sufficient for that kind of work.

Maybe I am behind the times, but so far the lexers I have used require a regular expression for each kind of token you want to recognize. At least PLY still seems to work like that. So if an identifier is one such kind of token, I need a regular expression that matches what an identifier is.

-- 
Antoon Pardon
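
For illustration, here is a minimal sketch of one way to approximate such a regular expression with the standard `re` module. The pattern `[^\W\d]\w*` is an assumption, not the language's actual definition of an identifier: `\w` follows Unicode "word character" rules rather than the XID_Start/XID_Continue properties Python uses, so each candidate is re-checked with `str.isidentifier`. The names `IDENT_RE` and `find_identifiers` are made up for this example.

```python
import re

# Approximation only: one word character that is not a digit,
# followed by any number of word characters.
IDENT_RE = re.compile(r"[^\W\d]\w*")


def find_identifiers(text):
    """Return (start, end, name) for each candidate that str.isidentifier accepts."""
    return [
        (m.start(), m.end(), m.group())
        for m in IDENT_RE.finditer(text)
        if m.group().isidentifier()  # drop candidates Python itself would reject
    ]


found = find_identifiers("größe = x1 + 2 * _tmp")
names = [name for _, _, name in found]
print(names)
```

The re-check only removes false positives. Characters that are valid in identifiers but are not word characters for `re` (the '℮' from the quoted example is probably one of them) would still be missed, and keywords such as `if` are reported as identifiers too. A PLY-style token rule could start from the same pattern, with the same caveats.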