Re: product based term combination for BooleanQuery?

Chris Hostetter Sun, 08 Jul 2007 18:36:42 -0700

: At index time, I used a per document boost (over all fields) and a per
: field boost (over all documents). I can certainly factor out the first
: into a query boost, but I was under the impression that if I ever wanted
: to combine fields (eg to index all "name" "alias" and "title" data in a
: single "head" field) then I had to pre-boost the data prior to combining


whoa, whoa, WHOA! ... not at ALL ... I'm not sure how you got that
impression, but when you combine different pieces of source data into a
single field, Lucene has no idea where those different pieces came from --
boosting a "title" field has no impact whatsoever on a "head" field just
because you happen to put the same piece of text in both "title" and
"head".

furthermore, field boosts apply to the entire field value: if you are
making a "head" field containing some text you think of as "title" and some
text you think of as "name", you can't set a boost on just the "title" part
of the "head" field.
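To make the "one boost per field value" point concrete, here is a rough sketch in plain Java (no Lucene dependency; the class and method names are hypothetical). It assumes Lucene 2.x index-time behavior as I understand it: the stored norm for a field is the document boost, times the boosts of every Field instance added under that name, times DefaultSimilarity's lengthNorm (1 / sqrt(number of terms)) over the combined term count. Real Lucene then squashes that value into a single byte, which this sketch skips.

```java
// Hypothetical sketch of the index-time field norm in Lucene 2.x (plain Java, no
// Lucene dependency): one value per (document, field name), shared by every term.
final class FieldNormSketch {
    // DefaultSimilarity-style lengthNorm: 1 / sqrt(total terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumed behavior: docBoost times the boost of every Field instance added under
    // the same name, times lengthNorm of the combined term count. (Lucene then encodes
    // this into one byte, losing precision; that step is skipped here.)
    static float fieldNorm(float docBoost, float[] instanceBoosts, int totalTerms) {
        float norm = docBoost;
        for (float b : instanceBoosts) {
            norm *= b;
        }
        return norm * lengthNorm(totalTerms);
    }

    public static void main(String[] args) {
        // A "head" field built from a 2-term title (boost 3) and a 6-term name (boost 1).
        float head = fieldNorm(1.0f, new float[] {3.0f, 1.0f}, 8);
        // The "title" words and the "name" words both end up with this same norm.
        System.out.println("norm for every term in head: " + head);
    }
}
```

The boosts collapse into a single multiplier, so the title words and the name words inside "head" end up weighted identically.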

as i said -- lose those field boosts and you should see a *big*
improvement ... in general, i would advise against any attempt to combine
different ideas into a single field for the purpose of improving relevancy
... the only reason i would ever take something like a "title" and an
"author" and combine them into a single field is to make the querying
simpler/faster, not in an attempt to improve relevancy ... query lots of
separate fields using unique query-time boosts.

: it. I tend to believe that these (short) fields contain more relevant
: information than (long) wikipedia articles or other documents.

: Should idf and tf take care of that short/long quality distinction? It
: sounds like you feel they should.

tf/idf will take care of recognizing that the word "John" is really
common, so it's not as significant to the query as "Bush" ... the
lengthNorm function of Similarity is what will help shorter fields score
better than longer fields.
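A rough sketch of those Similarity pieces in plain Java, assuming the Lucene 2.x DefaultSimilarity formulas tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)) and lengthNorm = 1 / sqrt(numTerms). The class name and the document counts for "John" and "Bush" are made up for illustration, and the per-term score ignores queryNorm, coord and boosts.

```java
// Hypothetical sketch of Lucene 2.x DefaultSimilarity formulas (plain Java, no
// Lucene dependency). Collection statistics used in main() are invented.
final class SimilaritySketch {
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Rare terms get a larger idf than common ones.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Shorter fields get a larger norm.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Rough per-term contribution for one field: tf * idf^2 * lengthNorm.
    // idf appears twice (query weight and field weight); queryNorm, coord and
    // boosts are ignored.
    static float termScore(float freq, int docFreq, int numDocs, int fieldTerms) {
        float i = idf(docFreq, numDocs);
        return tf(freq) * i * i * lengthNorm(fieldTerms);
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        System.out.printf("idf(John) = %.2f%n", idf(50_000, numDocs)); // common word
        System.out.printf("idf(Bush) = %.2f%n", idf(2_000, numDocs));  // rarer word
        // One occurrence of "Bush": 4-term name field vs. 2,000-term article body.
        System.out.printf("short field: %.3f%n", termScore(1, 2_000, numDocs, 4));
        System.out.printf("long field:  %.3f%n", termScore(1, 2_000, numDocs, 2_000));
    }
}
```

The same single occurrence of "Bush" contributes more in the 4-term field than in the 2,000-term body purely because of lengthNorm, with no boosts involved.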

: I'll build an index without the per field boost and see if that produces
: improved results.

try the DisjunctionMaxQuery too ... particularly if you have multiword
queries. the DisMaxQueryParser in solr that i mentioned before can be
very handy.
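A rough sketch, in plain Java with hypothetical names and invented subscores, of why DisjunctionMaxQuery helps with multiword queries. It assumes DisjunctionMaxQuery scores a document as its best subscore plus tieBreakerMultiplier times the sum of the other subscores, and that the Solr dismax approach builds one DisjunctionMaxQuery per query word (across fields such as name, alias and title) and adds the per-word results together. Query-time field boosts are assumed to be already folded into the subscores; queryNorm is ignored.

```java
// Hypothetical sketch: flat BooleanQuery-style sum vs. per-word DisjunctionMaxQuery.
// Rows are query words, columns are fields (name, alias, title); values are invented
// subscores with query-time field boosts already applied. queryNorm is ignored.
final class DisMaxSketch {
    // One SHOULD clause per word/field pair: sum of matches times coord (matched / total).
    static float flatBoolean(float[][] wordByField) {
        float sum = 0f;
        int matched = 0, total = 0;
        for (float[] fields : wordByField) {
            for (float s : fields) {
                total++;
                if (s > 0f) {
                    sum += s;
                    matched++;
                }
            }
        }
        return total == 0 ? 0f : sum * matched / total;
    }

    // DisjunctionMaxQuery: best subscore plus tieBreaker times the rest.
    static float disMax(float[] subScores, float tieBreaker) {
        float max = 0f, sum = 0f;
        for (float s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    // One DisjunctionMaxQuery per word across fields, summed over words (outer coord ignored).
    static float perWordDisMax(float[][] wordByField, float tieBreaker) {
        float total = 0f;
        for (float[] fields : wordByField) {
            total += disMax(fields, tieBreaker);
        }
        return total;
    }

    public static void main(String[] args) {
        // Query "george bush". Doc A: "bush" in every field, "george" nowhere.
        float[][] docA = { {0f, 0f, 0f}, {0.6f, 0.6f, 0.6f} };
        // Doc B: "george" in name, "bush" in title.
        float[][] docB = { {0.5f, 0f, 0f}, {0f, 0f, 0.5f} };
        System.out.println("flat   A=" + flatBoolean(docA) + " B=" + flatBoolean(docB));
        System.out.println("dismax A=" + perWordDisMax(docA, 0.1f) + " B=" + perWordDisMax(docB, 0.1f));
    }
}
```

With these invented numbers the flat query prefers doc A, which only matched "bush" (over and over), while the per-word dismax prefers doc B, which matched both words. The tieBreaker of 0.1 is an arbitrary choice for the sketch.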



-Hoss

