Re: REPOST from another list: Question related to improving search results

2009-05-02 Thread Vaijanathrao
e things, so you need to dig a little more in the page to get those. If you are not crawling the wiki page and are using the XML dump, take any MediaWiki parser that produces HTML and you can use the above code, but yes, it will be duplicated effort. -- Thanks and Regards, Vaijanath N. Rao -

Re: REPOST from another list: Question related to improving search results

2009-05-02 Thread Michael McCandless
Why not remove that content from every doc during indexing? Or, if that's too harsh, you could massively reduce the score for hits in that section: e.g., during indexing, store payloads on the term occurrences that fall within the common section, and then use BoostingTermQuery to down-weight those hit
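
A minimal sketch of the first suggestion (dropping the shared template text before indexing), written in Python rather than against the Lucene API. It assumes each page has already been reduced to plain text with one block per line, and it treats any line that appears in at least 90% of the documents as template content. The function name `strip_common_lines` and the `threshold` parameter are hypothetical, chosen only for illustration; this is not code from the thread.

```python
from collections import Counter


def strip_common_lines(docs, threshold=0.9):
    """Return docs with lines removed that occur in >= threshold of all docs.

    Assumption: template text (e.g. a language sidebar) shows up as
    identical lines across pages, so document frequency can identify it.
    """
    if not docs:
        return []

    # Count, for every distinct non-empty line, how many documents contain it.
    doc_freq = Counter()
    for text in docs:
        unique_lines = {line.strip() for line in text.splitlines() if line.strip()}
        doc_freq.update(unique_lines)

    cutoff = threshold * len(docs)
    common = {line for line, count in doc_freq.items() if count >= cutoff}

    # Keep only the lines that are not part of the shared template.
    cleaned = []
    for text in docs:
        kept = [
            line
            for line in text.splitlines()
            if line.strip() and line.strip() not in common
        ]
        cleaned.append("\n".join(kept))
    return cleaned


if __name__ == "__main__":
    pages = [
        "English\nDeutsch\nLucene is a search library",
        "English\nDeutsch\nSolr builds on Lucene",
        "English\nDeutsch\nPayloads store per-term data",
    ]
    for page in strip_common_lines(pages):
        print(page)
```

The cleaned text would then be passed to the indexer in place of the raw page. The payload approach keeps the template terms searchable but scored lower, so it is the gentler option if some queries legitimately target that text.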

REPOST from another list: Question related to improving search results

2009-05-02 Thread Aditya
Hi, New to this group. Question: Generally, sites like Wikipedia have a template and every page follows it. These templates contain words that occur on every page. For example, the Wikipedia template has the list of languages in the left panel. Now these words get indexed every tim