Re: Multi-valued field and numTerms

Michael McCandless Thu, 15 Jan 2015 01:44:07 -0800

Normally Lucene will count your d1 as having length=2.

However, if "la" was added as a synonym for "los angeles", so that it
"overlaps" the position of an existing token, then the default
similarity (discountOverlaps is on by default) does not count the
overlapping token and will count the field as length=1.


But for that to work, the position of the 2nd token must be the same
as the previous token, i.e. it must be indexed with a position
increment of 0.
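
A minimal sketch of the counting described above, in plain Java with no
Lucene dependency. The class and method names (NormSketch, fieldLength,
norm) are made up for illustration and are not Lucene APIs; the sketch
assumes a field is represented only by the position increments of its
tokens, and it only mimics the shape of the computeNorm formula quoted
in the question below.

```java
// Standalone illustration; NormSketch is a hypothetical name, not a Lucene class.
final class NormSketch {

    // Counts the tokens of one field, given each token's position increment.
    // An increment of 0 means the token shares the previous token's position
    // (an "overlap", as with an injected synonym). When discountOverlaps is
    // true, such tokens are left out of the length.
    static int fieldLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        int overlaps = 0;
        for (int increment : positionIncrements) {
            length++;
            if (increment == 0) {
                overlaps++;
            }
        }
        return discountOverlaps ? length - overlaps : length;
    }

    // Same shape as the formula quoted in the question: boost / sqrt(numTerms).
    static float norm(float boost, int numTerms) {
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        int[] separate = {1, 1};    // two tokens at consecutive positions
        int[] overlapping = {1, 0}; // second token stacked on the first

        int plainLength = fieldLength(separate, true);
        int stackedLength = fieldLength(overlapping, true);

        System.out.println("separate:    length=" + plainLength
                + " norm=" + norm(1.0f, plainLength));
        System.out.println("overlapping: length=" + stackedLength
                + " norm=" + norm(1.0f, stackedLength));
    }
}
```

The shorter the counted length, the larger the norm, which is why a
stacked synonym does not penalize the document the way an extra token
at a new position would.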

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 15, 2015 at 4:34 AM, rama44ster <rama44s...@gmail.com> wrote:
> Hi,
> I am using Lucene to index documents that have a multivalued text field
> named ‘city’.
> Each document might have multiple values for this field, like la, los
> angeles etc.
>
> Assuming
> document d1 contains city = la ; city = los angeles
> document d2 contains city = la mirada
> document d3 contains city = la quinta
>
> Now when I search for 'la', I would prefer getting d1 as it has the exact
> match, i.e., a match that doesn't have any extra terms beyond what is in
> the query. I read that Lucene already prefers documents with fewer terms,
> as DefaultSimilarity.computeNorm does
>
> return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
>
> The problem I have is, I am not sure how numTerms is calculated for a
> multivalued field like city. Here would numTerms for d1 be 1 or 3? Would
> the numTerms be the sum of all the numTerms for each field value?
>
> Any idea on how to make the document d1 rank higher than d2 and d3?
>
> Thanks in advance,
> Prasad.

