Normally Lucene will count your d1 as having length=2. However, if "la" were added as a synonym for "los angeles", such that it "overlaps" its position, then the default similarity discounts the overlapping token and will count d1 as length=1.
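To make the overlap discount concrete, here is a minimal sketch in plain Java. It is not Lucene code: NormSketch, Token and countedLength are hypothetical names made up for illustration, and the only line taken from the thread is the norm formula quoted in the question. It assumes each city value is analyzed as a single token (one reading that is consistent with the count of 2 above) and that a token "overlaps" when its position increment is 0.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only: this is NOT Lucene's implementation.
// It mimics the length norm quoted in the question plus an overlap discount.
final class NormSketch {

    // Hypothetical stand-in for an analyzed token: text plus position increment.
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Counts the tokens of a field. When discountOverlaps is true, a token
    // whose position increment is 0 (same position as the previous token)
    // is not counted.
    static int countedLength(List<Token> tokens, boolean discountOverlaps) {
        int length = 0;
        for (Token token : tokens) {
            if (discountOverlaps && token.positionIncrement == 0) {
                continue; // overlapping token, e.g. an injected synonym
            }
            length++;
        }
        return length;
    }

    // Same arithmetic as the line quoted from DefaultSimilarity.computeNorm.
    static float norm(float boost, int numTerms) {
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        // Assumption: each city value of d1 becomes exactly one token.
        List<Token> separate = Arrays.asList(
                new Token("los angeles", 1),
                new Token("la", 1));      // own position: counted
        List<Token> overlapping = Arrays.asList(
                new Token("los angeles", 1),
                new Token("la", 0));      // same position: discounted

        int plainLength = countedLength(separate, true);
        int synonymLength = countedLength(overlapping, true);

        System.out.println("separate positions: length=" + plainLength
                + " norm=" + norm(1.0f, plainLength));
        System.out.println("overlapping synonym: length=" + synonymLength
                + " norm=" + norm(1.0f, synonymLength));
    }
}
```

In this model the shorter counted length gives the larger norm, which is why an overlapping "la" would help d1 against d2 and d3.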
But for that to work, the position of the 2nd token must be the same as the previous token.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jan 15, 2015 at 4:34 AM, rama44ster <rama44s...@gmail.com> wrote:
> Hi,
> I am using lucene to index documents that have a multivalued text field
> named ‘city’.
> Each document might have multiple values for this field, like la, los
> angeles etc.
>
> Assuming
> document d1 contains city = la ; city = los angeles
> document d2 contains city = la mirada
> document d3 contains city = la quinta
>
> Now when I search for 'la', I would prefer getting d1 as it has the exact
> match ie., a match that doesn't have any extra terms than what is in the
> query. I read lucene already prefers documents with fewer terms as
> DefaultSimilarity.computeNorm does
>
> return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
>
> The problem I have is, I am not sure how numTerms is calculated for a
> multivalued field like city. Here would numTerms for d1 be 1 or 3? Would
> the numTerms be the sum of all the numTerms for each field value?
>
> Any idea on how to make the document d1 rank higher than d2 and d3?
>
> Thanks in advance,
> Prasad.