Jack wrote:
> Thanks for all the replies!
>
> SPARK looks promising. Its doc doesn't say if it handles unicode
> (CJK in particular) encoding though.
>
> Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/
>
> There's also PyGgy http://lava.net/~newsham/pyggy/
>
> I may also give Antlr a try.
>
> If anyone has experiences using any of the parser generators with CJK
> languages, I'd be very interested in hearing that.
I'm going to echo Tommy's reply. If you want to parse natural language, conventional parsers are going to be worse than useless (because you'll keep thinking, "Just one more tweak and this time it'll work for sure!").

Instead, go look at what the interactive fiction community uses. Their parsers analyse the statement in multiple passes, first picking out the verbs, then the noun phrases. Some of them can do on-the-fly, domain-specific spelling correction, among other things, and all of them can ask the user for clarification. (I'm currently cobbling together something similar for pre-teen users.)
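To make the multi-pass idea concrete, here is a minimal sketch of an interactive-fiction-style command parser. It is not taken from any particular IF system; the vocabulary, the (adjective, noun) room objects, and the rule that the verb is the first word are all hypothetical assumptions for illustration. Spelling correction uses the standard library's difflib.get_close_matches, and ambiguity produces a question back to the user rather than a parse error. It also assumes space-separated English words; CJK text, which is normally written without spaces, would need a word-segmentation pass before any of this.

```python
import difflib

# Hypothetical toy vocabulary; a real game would build these from its world model.
VERBS = {"take", "drop", "open", "look"}
NOUNS = {"lamp", "door", "key", "box"}
ADJECTIVES = {"brass", "red", "wooden", "small"}
STOP_WORDS = {"the", "a", "an", "at"}


def correct(word, vocabulary):
    """Domain-specific spelling correction: snap to the closest known word, or None."""
    if word in vocabulary:
        return word
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None


def parse(command, objects_in_room):
    """Return ("do", verb, obj) or ("ask", question) for a command string.

    objects_in_room is a hypothetical list of (adjective, noun) tuples.
    """
    words = [w for w in command.lower().split() if w not in STOP_WORDS]
    if not words:
        return ("ask", "What do you want to do?")

    # Pass 1: pick out the verb (assumed here to be the first word).
    verb = correct(words[0], VERBS)
    if verb is None:
        return ("ask", f"I don't know the verb '{words[0]}'.")

    # Pass 2: collect the noun phrase; unrecognised words are silently skipped.
    adjectives, noun = [], None
    for w in words[1:]:
        fixed = correct(w, ADJECTIVES | NOUNS)
        if fixed in ADJECTIVES:
            adjectives.append(fixed)
        elif fixed in NOUNS:
            noun = fixed
    if noun is None:
        return ("ask", f"What do you want to {verb}?")

    # Pass 3: resolve against what is actually present; ask if ambiguous.
    candidates = [
        (adj, n) for adj, n in objects_in_room
        if n == noun and all(a == adj for a in adjectives)
    ]
    if not candidates:
        return ("ask", f"I don't see any {noun} here.")
    if len(candidates) > 1:
        options = " or ".join(f"the {adj} {n}" for adj, n in candidates)
        return ("ask", f"Which do you mean, {options}?")
    return ("do", verb, candidates[0])


if __name__ == "__main__":
    room = [("brass", "key"), ("red", "key"), ("wooden", "door")]
    for cmd in ["tak the brss key", "take the key", "take", "open the lamp"]:
        print(cmd, "->", parse(cmd, room))
```

Keeping the passes separate is what makes clarification easy: each pass knows exactly what it failed to find, so it can ask a targeted question instead of rejecting the whole sentence the way a grammar-driven parser would.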