>> http://lab.arc90.com/experiments/readability/
>>
>> Readability is a javascript bookmarklet that "makes reading on the
>> Web more enjoyable by removing the clutter around what you're
>> reading."
>>
>> Does anyone know of something similar in Python?
>
> Well, that sounds like a browser tool.

Yes, it's a bookmarklet: a small piece of JavaScript that, when clicked,
runs on the document currently open in the browser.

> Could you be a bit more specific about what kind of "similar"
> functionality you would expect from a "similar" Python tool?
> How would you tell it "what you're reading", for example?

I'm not sure I understand your question correctly, but anyway: what I
need is a package that, given an arbitrary HTML document (a page from
any website), extracts the meaningful content and filters out the junk
(advertisements, non-content elements, other UI and so on).

Readability seems to do some heuristic manipulation of the DOM, but I'm
not that good at reading and understanding its source code. Of course it
can't be 100% correct, but it's good enough in many cases.

http://code.google.com/p/arc90labs-readability/source/browse/trunk/js/readability.js

--
дамјан ((( http://damjan.softver.org.mk/ )))

war is peace
freedom is slavery
restrictions are enablement
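
P.S. To make "extract the meaningful content and filter the junk" a bit more concrete, here is a rough sketch of the kind of thing I mean. It is not how Readability works (I haven't followed its source closely enough to say); it is just a naive filter that keeps blocks of text that are long and contain few links. It uses only html.parser from the standard library. The names ContentExtractor and extract_content, the tag sets and the two thresholds are all made up for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are assumed to never be article text (a guess, not a rule).
SKIP = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Tags treated as block boundaries: text is collected between them.
BLOCKS = {"p", "div", "td", "li", "pre", "blockquote", "article", "section"}


class ContentExtractor(HTMLParser):
    """Keep text blocks that are long enough and not dominated by links."""

    def __init__(self, min_chars=80, max_link_density=0.3):
        super().__init__(convert_charrefs=True)
        self.min_chars = min_chars
        self.max_link_density = max_link_density
        self.skip_depth = 0   # > 0 while inside a SKIP element
        self.link_depth = 0   # > 0 while inside an <a> element
        self.text = []        # text pieces of the current block
        self.link_chars = 0   # characters of the current block inside links
        self.blocks = []      # accepted blocks, in document order

    def _flush(self):
        # Close the current block and decide whether it looks like content.
        text = " ".join("".join(self.text).split())
        link_chars = self.link_chars
        self.text = []
        self.link_chars = 0
        if not text or len(text) < self.min_chars:
            return
        if link_chars / len(text) <= self.max_link_density:
            self.blocks.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCKS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCKS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth:
            return
        self.text.append(data)
        if self.link_depth:
            self.link_chars += len(" ".join(data.split()))

    def close(self):
        super().close()
        self._flush()  # pick up any trailing text after the last block tag


def extract_content(html, **options):
    """Return the list of text blocks that pass the heuristic."""
    parser = ContentExtractor(**options)
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling extract_content(page_source) on a page gives back a list of strings, one per surviving block. Menus and link lists mostly fall out because they are short or almost entirely link text; real pages with badly nested or unclosed tags will certainly fool it, so something that works on a proper DOM would still be preferable.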