Package: wnpp
Severity: wishlist
Owner: Emmanuel Bourg <ebo...@apache.org>

* Package name    : boilerpipe
  Version         : 1.2.0
  Upstream Author : Christian Kohlschütter <christ...@kohlschutter.com>
* URL             : http://code.google.com/p/boilerpipe
* License         : Apache-2.0
  Programming Lang: Java
  Description     : Boilerplate removal and fulltext extraction from HTML pages

The boilerpipe library provides algorithms to detect and remove the surplus
"clutter" (boilerplate, templates) around the main textual content of a web
page.

The library already provides specific strategies for common tasks (for
example: news article extraction) and may also be easily extended for
individual problem settings.

Extracting content is very fast (milliseconds), just needs the input document
(no global or site-level information required) and is usually quite accurate.