Hi, I am indexing a set of HTML websites using Lucene (IndexHtml). The indexer works fine and I can find the indexed terms, but the problem is that this class (IndexHtml) indexes all the text in each HTML page, even the advertisements. I am interested only in the body text, not in the advertisements or the text of the side links.
Any help on how to solve this problem? Am I using the class incorrectly?