Re: Query Scoring

Harini Raghavan Sun, 01 Jan 2006 23:29:57 -0800

Yes, I was referring to how IDF is used in the Highlighter code to find out how to prioritize fragments of the documents.

My requirement is to show the relevant fragments of the news article for each company along with the search results. But the highlighter API sometimes picks up fragments which are not so relevant to the news article/company. I would like to know if there is any way that I can modify the scoring/ranking of these fragments so that news items with the company name & keywords in the headline get assigned a very strong relevancy ranking, closely followed by a company name mention in the first paragraph and multiple mentions within the entire story. Something like headline = 5 points, first paragraph = 4, etc.
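As a rough illustration of the weighting described above, here is a minimal Java sketch that scores one article for one company name by zone. It does not use the Lucene Highlighter API; the class and method names are hypothetical, and the body weight and cap are assumptions, since the message only gives values for the headline (5) and the first paragraph (4).

```java
import java.util.Locale;

final class ZoneScore {
    // Weights taken from the message: headline = 5, first paragraph = 4.
    static final int HEADLINE_WEIGHT = 5;
    static final int FIRST_PARAGRAPH_WEIGHT = 4;
    // Assumption: the message only says "etc." for the rest of the story,
    // so 1 point per body mention, capped at 3 mentions, is made up here.
    static final int BODY_MENTION_WEIGHT = 1;
    static final int BODY_MENTION_CAP = 3;

    // Counts case-insensitive, non-overlapping occurrences of term in text.
    static int countOccurrences(String text, String term) {
        if (term.isEmpty()) {
            return 0;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Adds the zone weights: a fixed bonus for a headline hit, a fixed bonus
    // for a first-paragraph hit, and a capped per-mention bonus for the body.
    static int score(String headline, String firstParagraph, String body, String term) {
        int total = 0;
        if (countOccurrences(headline, term) > 0) {
            total += HEADLINE_WEIGHT;
        }
        if (countOccurrences(firstParagraph, term) > 0) {
            total += FIRST_PARAGRAPH_WEIGHT;
        }
        int bodyMentions = Math.min(countOccurrences(body, term), BODY_MENTION_CAP);
        total += BODY_MENTION_WEIGHT * bodyMentions;
        return total;
    }

    public static void main(String[] args) {
        int s = score(
                "Acme to outsource payroll",
                "Acme said on Monday it will cut costs.",
                "Analysts expect Acme to downsize. Acme declined to comment.",
                "Acme");
        System.out.println("zone score = " + s);
    }
}
```

A score like this could be used to order news items (or the fragments taken from each zone) before display; how it would be combined with the search engine's own relevance score is left open here.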


Thanks,
Harini

markharw00d wrote:

Sorry to contradict, Erik, but the Highlighter's QueryScorer will make use of IDF, given a reader, in order to better prioritise which are the "best" bits of a document.
However, in the particular example given, the criteria include several non-text fields which are not useful for IDF and general scoring purposes - these are perhaps better expressed using a filter of some form. Otherwise, why should the scarcity of a particular date in the given range boost one matching document above others? These numeric-type fields are simply mandatory boolean "hygiene factors" and should ideally play no part in highlight selection or results ordering in general based on their IDF or TF.
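The separation suggested here can be sketched in plain Java, without the Lucene API: the DocumentType, CompanyId and FilingDate clauses become a yes/no filter that contributes nothing to the score, and only Content terms are used to rank fragments. All names below are hypothetical, the term weights are placeholder values rather than real IDF figures, and only single-word terms are handled (phrases such as "cost saving" are left out for brevity).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class HygieneFilterSketch {
    // Mandatory "hygiene" clauses: a document either passes or it does not.
    // Nothing here influences ranking or fragment selection.
    static boolean passesFilter(String documentType, int companyId, int filingDate,
                                Set<Integer> companyIds, int fromDate, int toDate) {
        return "news".equals(documentType)
                && companyIds.contains(companyId)
                && filingDate >= fromDate
                && filingDate <= toDate;
    }

    // Fragment score: sum of the weights of the distinct Content terms that
    // occur as whole words in the fragment.
    static double scoreFragment(String fragment, Map<String, Double> termWeights) {
        String[] words = fragment.toLowerCase(Locale.ROOT).split("[^a-z]+");
        Set<String> tokens = new HashSet<>(Arrays.asList(words));
        double score = 0.0;
        for (Map.Entry<String, Double> entry : termWeights.entrySet()) {
            if (tokens.contains(entry.getKey())) {
                score += entry.getValue();
            }
        }
        return score;
    }

    public static void main(String[] args) {
        Set<Integer> companies = Set.of(10, 20, 30, 40);
        // Placeholder weights, assumed for illustration only.
        Map<String, Double> weights = Map.of("outsource", 2.0, "downsize", 3.0, "restructure", 4.0);

        boolean keep = passesFilter("news", 20, 20050315, companies, 20041201, 20051201);
        double score = scoreFragment("Acme plans to outsource and downsize", weights);
        System.out.println("passes filter: " + keep + ", fragment score: " + score);
    }
}
```

With this split, a rare filing date or company id can never push one fragment above another; only the text terms do.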
Cheers,
Mark


Erik Hatcher wrote:
Harini,
I'm not sure I understand what you're asking. IDF doesn't factor into highlighting.
IDF calculations are useful in scoring documents during a search, such that the most relevant documents are returned, but again this is unrelated to highlighting.
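To make the role of IDF in document scoring concrete, here is a minimal Java sketch. It assumes the classic form idf = 1 + ln(numDocs / (docFreq + 1)), which is how Lucene's default similarity of that era is generally documented; the index size and document frequencies are invented for illustration and are not taken from any real index.

```java
final class IdfSketch {
    // Assumed classic IDF form: terms found in fewer documents get a larger value.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 100_000; // hypothetical index size

        // Hypothetical document frequencies for two Content terms of the query.
        int restructuringDocs = 9_000; // common term
        int downsizesDocs = 90;        // rare term

        System.out.printf("Content:restructuring idf = %.3f%n", idf(restructuringDocs, numDocs));
        System.out.printf("Content:downsizes     idf = %.3f%n", idf(downsizesDocs, numDocs));
    }
}
```

Under these assumed figures the rarer term gets the larger IDF, so a document matching it is pushed higher than one matching only the common term.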
Could you elaborate on what you're after?

    Erik

On Dec 30, 2005, at 12:02 PM, Harini Raghavan wrote:
Hi,

I have a requirement to highlight search keywords in the results and
display the matching fragment of the text with the results. I am using
the Hits highlighting mentioned in Lucene in Action.
Here is the search query (BooleanQuery) I am passing to the IndexSearcher
and QueryScorer:
+DocumentType:news
+(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
+FilingDate:[20041201 TO 20051201]
+(Content:"cost saving" Content:"cost savings" Content:outsource
Content:outsources Content:downsize Content:downsizes
Content:restructuring Content:restructure)
I do not quite understand how the query scoring actually works & how Inverse Document Frequency (IDF) calculations are useful. Can
someone shed some light on this using the given query as an example?

Thanks,
Harini


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
