e things, so you need to dig a little deeper into the page to get those. If you are
not crawling the wiki pages and are using the XML dump instead, take any MediaWiki
parser that will give you the HTML, and then you can use the above code, though it
will be a duplication of effort.
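In case it helps, here is a rough, untested sketch of pulling pages out of a MediaWiki
XML dump with plain StAX; the element names (<page>, <title>, <text>) follow the
standard export format, and you would still hand the wikitext to whichever MediaWiki
parser you choose before applying the HTML extraction code above. The class name
WikiDumpReader is just for illustration:

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class WikiDumpReader {
  public static void main(String[] args) throws Exception {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader reader =
        factory.createXMLStreamReader(new FileInputStream(args[0]));

    String currentElement = "";
    while (reader.hasNext()) {
      int event = reader.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        currentElement = reader.getLocalName();
      } else if (event == XMLStreamConstants.CHARACTERS) {
        // note: getText() may arrive in several chunks for large elements
        if ("title".equals(currentElement)) {
          System.out.println("PAGE: " + reader.getText());
        } else if ("text".equals(currentElement)) {
          // hand reader.getText() (wikitext) to a MediaWiki parser here
        }
      } else if (event == XMLStreamConstants.END_ELEMENT) {
        currentElement = "";
      }
    }
    reader.close();
  }
}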
--Thanks and Regards
Vaijanath N. Rao
-
Why not remove that content from every doc during indexing?

Or, if that's too harsh, you could massively reduce the score for hits
in that section, e.g. during indexing store payloads on the term
occurrences falling within the common section, and then use
BoostingTermQuery to down-weight those hits.
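A rough, untested sketch of the indexing half, assuming a recent attribute-based
TokenStream API (older Lucene versions use the Payload class instead of BytesRef);
the class name BoilerplatePayloadFilter is made up for illustration:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

// Wrap the analyzer's TokenStream with this filter only while the
// common/template section of the page is being fed to the IndexWriter.
public final class BoilerplatePayloadFilter extends TokenFilter {
  private static final BytesRef TEMPLATE_FLAG = new BytesRef(new byte[] { 1 });
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

  public BoilerplatePayloadFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // Mark this occurrence as template text so a payload-aware query
    // (BoostingTermQuery or its successors) can score it lower.
    payloadAtt.setPayload(TEMPLATE_FLAG);
    return true;
  }
}

At query time you would then override the Similarity's scorePayload() to return a
small factor whenever the payload flags template text, so those hits contribute far
less to the score.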
Hi,
New to this group.
Question:
Generally, sites like Wikipedia have a template and every page follows it.
These templates contain words that occur on every page.
For example, the Wikipedia template has the list of languages in the left panel.
Now these words get indexed every time