On 2 Mar., 23:14, Clarendon <jine...@hotmail.com> wrote:
> Thank you, Lie and Andrew, for your help.
>
> I have studied NLTK quite closely, but its parsers seem to be for
> demo purposes only. It has a very limited grammar set, and even a
> parser that is supposed to be "large" does not have enough grammar
> to cover common words like "I".
>
> I need to parse a large amount of text collected from the web
> (around a couple hundred sentences at a time) very quickly, so I
> need a parser with a broad enough scope of grammar to cover all
> these texts. This is what I mean by 'random'.
>
> An advanced programmer has advised me that Python is rather slow at
> processing large amounts of data, and that there are therefore not
> many parsers written in Python. He recommends that I use Jython to
> run parsers written in Java. What are your views on this?
>
> Thank you very much.
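For anyone following along, the coverage problem looks roughly like
this. A minimal sketch: the toy grammar below is my own illustration,
not a grammar NLTK actually ships, but it fails in the same way.

    import nltk

    # A toy grammar in the style of NLTK's demo grammars (my own
    # illustration, not one shipped with NLTK).
    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> Det N
        VP  -> V NP
        Det -> 'the' | 'a'
        N   -> 'dog' | 'cat'
        V   -> 'saw' | 'chased'
    """)

    parser = nltk.ChartParser(grammar)

    # Fine: every word is covered by some production.
    for tree in parser.parse("the dog saw a cat".split()):
        print(tree)

    # Raises ValueError: 'I' is not covered by any production,
    # which is exactly the coverage gap described above.
    for tree in parser.parse("I saw a cat".split()):
        print(tree)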
You'll most likely need a GLR parser.

There is Pyggy (http://www.lava.net/~newsham/pyggy/), which I tried
once and found to be broken. Then there is the Spark toolkit
(http://pages.cpsc.ucalgary.ca/~aycock/spark/); I checked it out years
ago and found it very slow. Finally, there is Bison, which can be used
with a %glr-parser declaration, together with the PyBison bindings
(http://www.freenet.org.nz/python/pybison/). Bison itself should be
solid and fast, but I can't say anything about the quality of the
bindings.
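The reason GLR comes up for this kind of job: natural-language
grammars are ambiguous, so the parser has to carry several analyses
forward at once and return all of them, where a plain LALR parser
would just report a conflict. GLR does this by forking the parse
stack at each ambiguity. A chart parser gives the same all-parses
behaviour, so here is a minimal sketch using NLTK again (the toy
grammar is my own and purely illustrative):

    import nltk

    # A deliberately ambiguous toy grammar: the PP "with the telescope"
    # can attach either to the verb phrase or to the object noun phrase.
    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> 'I' | Det N | Det N PP
        VP  -> V NP | V NP PP
        PP  -> P NP
        Det -> 'the'
        N   -> 'man' | 'telescope'
        V   -> 'saw'
        P   -> 'with'
    """)

    parser = nltk.ChartParser(grammar)

    # Prints two trees, one per attachment. Getting every parse back,
    # rather than a single forced choice, is the behaviour you need
    # from a GLR (or Earley/chart) parser on natural language.
    for tree in parser.parse("I saw the man with the telescope".split()):
        print(tree)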