NLTK, the Natural Language Toolkit, is a suite of open-source Python modules, data sets, and tutorials supporting research and development in natural language processing. It comes with 50k lines of code, 300 MB of data sets, and a 360-page book that teaches both Python and natural language processing. NLTK has been adopted in at least 40 university courses. NLTK is hosted on SourceForge, where it is ranked among the top 200 projects.
http://nltk.sourceforge.net/

Quotes -- what users have said about NLTK:

"... the quite remarkable Natural Language Toolkit (NLTK), a wonderful tool for teaching, and working in, computational linguistics using Python."
http://www.ibm.com/developerworks/linux/library/l-cpnltk.html

"Natural Language Toolkit (nltk) is an amazing library to play with natural language."
http://www.biais.org/blog/index.php/2007/01/31/25-spelling-correction-using-the-python-natural-language-toolkit-nltk

"... a wonderful lightweight framework that provides a wealth of NLP tools."
http://harnly.net/2007/blog/geek/lang/ruby/nltks-ing-words-variations/

"A good place to start for those learning about NLP for the first time, this has been used in many academic situations. It is extremely well documented, with tutorials which not only explain the tool, but also give an overview of the subject (e.g. document clustering). I was able to go from downloading it for the first time, to creating and training a 2004 Task 1A system (bigram gene name tagger) in about an hour."
http://compbio.uchsc.edu/corpora/bcresources.html

"Students with no previous programming experience will be able to spend more of their time thinking about the logical steps involved in getting the computer to process language data, and less time mastering and using the arcana involved in getting the computer to do anything at all."
http://linguistlist.org/issues/14/14-3165.html

Steven Bird
http://www.csse.unimelb.edu.au/~sb/

--
http://mail.python.org/mailman/listinfo/python-list