Hi,

I am indexing a set of HTML websites using Lucene (IndexHtml). The indexer
works fine and I can find the indexed terms, but the problem is that this class
(IndexHtml) indexes all the text in each HTML page, even the advertisements. I
am interested only in the body text, not in the advertisements or the text of
side links.

Any suggestions on how to solve this problem? Am I using the class incorrectly?



-- 
View this message in context: 
http://www.nabble.com/Index-html-sites-using-IndexHtml-tp24666110p24666110.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

