Hi, I am using Lucene to index documents that have a multivalued text field named 'city'. Each document may have several values for this field, such as 'la' and 'los angeles'.
Assume the following documents:
  d1 contains city = la and city = los angeles
  d2 contains city = la mirada
  d3 contains city = la quinta

When I search for 'la', I would prefer to get d1 first, because it has an exact match, i.e., a match with no terms beyond those in the query.

I read that Lucene already prefers documents with fewer terms, since DefaultSimilarity.computeNorm returns:
  state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));

The problem is that I am not sure how numTerms is calculated for a multivalued field like city. Here, would numTerms for d1 be 1 or 3? In other words, is numTerms the sum of the term counts of all the field's values?

Any idea how to make d1 rank higher than d2 and d3?

Thanks in advance,
Prasad.
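To make the two possible readings of numTerms concrete, here is a small sketch that just evaluates the quoted norm expression for d1, d2 and d3. It does not use Lucene and does not settle which reading Lucene actually applies. Assumptions: the boost is 1.0, the analyzer splits "los angeles", "la mirada" and "la quinta" into two tokens each, and any precision loss from how Lucene stores norms is ignored. The class and method names (NormSketch, lengthNorm) are hypothetical.

```java
// Sketch only: evaluates boost * 1/sqrt(numTerms), the expression quoted above,
// for the example documents. Not Lucene code; names are hypothetical.
// Assumptions: boost = 1.0, whitespace-style tokenization, norm storage
// precision ignored.
public class NormSketch {

    // Same arithmetic as the quoted computeNorm return expression.
    public static float lengthNorm(float boost, int numTerms) {
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        float boost = 1.0f; // assumed: no index-time boost

        // Reading A: only the matching value "la" counts for d1 -> 1 term
        float d1PerValue = lengthNorm(boost, 1);
        // Reading B: all d1 values summed, "la" (1) + "los angeles" (2) -> 3 terms
        float d1Summed = lengthNorm(boost, 1 + 2);
        // d2 "la mirada" and d3 "la quinta" have 2 terms each
        float d2 = lengthNorm(boost, 2);
        float d3 = lengthNorm(boost, 2);

        System.out.printf("d1, numTerms = 1: %.4f%n", d1PerValue); // 1.0000
        System.out.printf("d1, numTerms = 3: %.4f%n", d1Summed);   // 0.5774
        System.out.printf("d2, numTerms = 2: %.4f%n", d2);         // 0.7071
        System.out.printf("d3, numTerms = 2: %.4f%n", d3);         // 0.7071
    }
}
```

If the values are summed, d1's norm (about 0.577) is lower than d2's and d3's (about 0.707), so length normalization alone would push d1 below them; if only one value counted, d1 would get the highest norm (1.0).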