> does anyone know of a library which permits to summarise text?
> i've been looking at nltk but haven't found anything yet. any
> help would be very welcome.

Well, summarizing text is one of those things that generally takes a brain cell or two. Automating it means doing it either smartly (some sort of neural-net/NLP/Markov-chain approach, which is a non-trivial task, the sort of thing one might brave in the 3rd or 4th year of a university computer-science program) or fairly dumbly.

As an example of a "dumb" solution, you can use a regexp to keep the first few words and the last few words, drop everything in between, and call that a "summary":

>>> import re
>>> r = re.compile(r'^(.{8}.*?\b)\s.*\s(\b.{8}.*?)', re.DOTALL)
>>> s = """This is the first line
... and it has a second line
... and a third line
... and the last line is the fourth line."""
>>> result = r.sub(r"\1...\2", s.strip())
>>> result
'This is the...fourth line.'

You can adjust the two "{8}" values for more or less context: the first sets the minimum number of leading characters kept, the second the minimum number of trailing characters. The regexp might need a bit of tweaking for somewhat short strings (as written, a string too short to match comes back unchanged), but if they're that short, one might not need to summarize them ;)

-tkc
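
P.S. If you want to play with the context sizes without retyping the pattern, the same regexp can be wrapped in a small function. This is only a sketch of the "dumb" approach above, not a real summarizer; the names crude_summary, lead, tail and joiner are made up for illustration and don't come from any library.

```python
import re


def crude_summary(text, lead=8, tail=8, joiner="..."):
    """Keep at least `lead` leading and `tail` trailing characters of `text`,
    extending the leading part to the end of its word and starting the
    trailing part at the beginning of a word. Everything between the two
    is replaced by `joiner`.
    """
    # Same pattern as in the interactive session, with the two {8}
    # counts turned into parameters (hypothetical names).
    pattern = re.compile(
        r'^(.{%d}.*?\b)\s.*\s(\b.{%d}.*?)' % (lead, tail),
        re.DOTALL,  # let "." match newlines so multi-line text works
    )
    # A function replacement avoids backslash surprises in `joiner`.
    # If the text is too short for the pattern to match, sub() simply
    # returns it unchanged.
    return pattern.sub(
        lambda m: m.group(1) + joiner + m.group(2),
        text.strip(),
    )


s = """This is the first line
and it has a second line
and a third line
and the last line is the fourth line."""

print(crude_summary(s))                  # 'This is the...fourth line.'
print(crude_summary(s, lead=4, tail=4))  # 'This...line.'
```

Note that the trailing group only captures the first `tail` characters of the ending; whatever follows is outside the match, so sub() leaves it in place, which is why the whole last word survives.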