Hi Novelli
Do you insist on using the HtmlParser in Nutch?
Alternatives are available; you could try the htmlparser project
hosted on sf.net:
http://htmlparser.sourceforge.net/
Regards
/Jack
On 7/29/05, Giovanni Novelli <[EMAIL PROTECTED]> wrote:
> Hello,
> I'm working to the development of a multi-agen
I have tried both HtmlParser v1.5 and NekoHTML. With the former, my
implementation doesn't work: for example, it also extracts text from
JavaScript. I followed the hint from
http://htmlparser.sourceforge.net/javadoc/org/htmlparser/visitors/TextExtractingVisitor.html
The following is my NOT working implement
Hi Giovanni
We are using the Neko HTML parser. Some simple example code can be
found in the "Lucene in Action" book.
For more information:
http://www.manning.com/books/hatcher2
http://www.apache.org/~andyc/neko/doc/html/
Patrick
On 29/07/05, Giovanni Novelli <[EMAIL PROTECTED]> wrote:
> Hello,
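
The problem described earlier in this thread (a text extractor that also returns the contents of script elements) can be illustrated with a minimal sketch. It does not use HtmlParser or NekoHTML and is not the poster's implementation; it assumes Python's standard html.parser module purely to show the idea of ignoring text nodes while inside script or style elements. The names VisibleTextExtractor and extract_text are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>.

    Hypothetical helper for illustration only; it is unrelated to the
    HtmlParser and NekoHTML Java libraries discussed in the thread.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []     # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style and not pure whitespace.
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self._chunks.append(stripped)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML string without script/style content."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><head><title>Demo</title>"
        "<script>var x = 'hidden';</script></head>"
        "<body><p>Hello <b>world</b></p>"
        "<style>p { color: red; }</style></body></html>"
    )
    print(extract_text(sample))
```

The same approach (tracking whether the parser is currently inside an element whose text should be ignored) should carry over to a visitor or SAX-style callback in a Java parser, though the exact hooks depend on the library used.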
Hello,
I'm working on the development of multi-agent software that
involves some information indexing, information retrieval and
information categorization tasks. I want to build the training set for
categorization using a set of HTML pages fetched from DMOZ RDF dumps.
I have tried the HtmlParse