On Sep 21, 10:59 am, Nobody <nob...@nowhere.com> wrote: > I have a similar question. > > What I want: a tokeniser generator which can take a lex-style grammar (not > necessarily lex syntax, but a set of token specifications defined by > REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of > bytes. It must allow the syntax to be defined at run-time. > > What I don't want: anything written by someone who doesn't understand the > field (i.e. anything which doesn't use a DFA).
lepl will do this, but it's integrated with the rest of the parser (which is recursive descent). for example: float = Token(Float()) word = Token(Word(Lower()) punctuation = ~Token(r'[\.,]') line = (float | word)[:, punctuation] parser = line.string_parser() will generate a lexer with three tokens. here two are specified using lepl's matchers and one using a regexp, but in all three cases they are converted to dfas internally. then a parser is generated that will match a sequence of floats and words, separated by punctuation. spaces are discarded by the lexer by default, but that can be changed through the configuration (which would be passed to the string_parser method). it's also possible to specify everything using matchers and then get lepl to compile "as much as possible" of the matcher graph to nfas before matching (nfas rather than dfas because they are implemented with a stack to preserve the backtracking abilities of the recursive descent parser they replace). the problem here is that not all matchers can be converted (matchers can contain arbitrary python functions, while my nfa+dfa implementations cannot, and also my "compiler" isn't very smart), while using tokens explicitly gives you an error if the automatic compilation fails (in which case the simple fix is to just give the regexp). (also, you say "sequence of bytes" rather than strings - lepl will parse the byte[] type in python3 and even has support for matching binary values). disclaimer: newish library, python 2.6+ only, and while i have quite a few users (or, at least, downloads), i doubt that many use these more advanced features, and everything is pure python with little performance tuning so far. andrew -- http://mail.python.org/mailman/listinfo/python-list