FuzzyQuery scoring was changed in Lucene 5.3: https://issues.apache.org/jira/browse/LUCENE-329
Maybe look at the result of IndexSearcher.explain to understand why the "Boston" doc got a lower score than your "Basti Bosan" doc?

On Thu, Apr 21, 2016 at 15:39, Jeremy Glesner <jer...@bericotechnologies.com> wrote:

> Hello,
>
> I'm witnessing a change in behavior between Lucene 4.9 and 5.4.1 that I
> don't quite understand.
> I'd like to track down what's happening under the hood. I'm working to
> update the dependencies of an open source geospatial resolution tool
> (https://github.com/Berico-Technologies/CLAVIN), which uses Lucene. I've
> indexed the geonames.org database using both Lucene 4.9 and 5.4.1. We
> index on the Population of each city for later sorting on query.
>
> When running a fuzzy query "bostn~" with Occur.MUST in 4.9, we get the
> expected result of Boston, where 6793534 is a boosted population. Here is
> the scoreDoc.toString():
>
>     Boston: doc=19586055 score=NaN shardIndex=-1 fields=[2.971942, 6793534]
>
> However, using 5.4.1, the fuzzy match with Occur.MUST returns "Basti Bosan"
> and "Boston Basin", both of which have a population of zero, before
> returning Boston.
>
>     Basti Bosan: doc=11707183 score=NaN shardIndex=0 fields=[1.5721874, 0]
>     Boston Basin: doc=12728320 score=NaN shardIndex=0 fields=[1.5721874, 0]
>     Boston: doc=17515475 score=NaN shardIndex=0 fields=[1.4374285, 6793534]
>
> I'm wondering if something with the FIELD_SCORE calculation changed between
> 4.9 and 5.4.1, or perhaps I've done something incorrect in building the
> index, etc.
>
> It's worth mentioning that for this test I have built an index w/ both 4.9
> and 5.4.1 using the same geonames database to ensure consistency.
> Also, sort is set up with both versions in the same way:
>
>     private static final Sort POPULATION_SORT = new Sort(new SortField[] {
>         SortField.FIELD_SCORE,
>         new SortedNumericSortField(SORT_POP.key(), SortField.Type.LONG, true)
>     });
>
> With regard to building the index, in 4.9, we added the population sort
> field to the index like so:
>
>     doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
>         Field.Store.YES));
>
> Because you can't sort on docValue = NONE anymore, in 5.4.1, we now add it
> like this:
>
>     doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
>         LONG_FIELD_TYPE_STORED_SORTED));
>
> where LONG_FIELD_TYPE_STORED_SORTED is:
>
>     private static final FieldType LONG_FIELD_TYPE_STORED_SORTED = new FieldType();
>
>     static {
>         LONG_FIELD_TYPE_STORED_SORTED.setTokenized(false);
>         LONG_FIELD_TYPE_STORED_SORTED.setOmitNorms(true);
>         LONG_FIELD_TYPE_STORED_SORTED.setIndexOptions(IndexOptions.DOCS);
>         LONG_FIELD_TYPE_STORED_SORTED.setNumericType(FieldType.NumericType.LONG);
>         LONG_FIELD_TYPE_STORED_SORTED.setStored(true);
>         LONG_FIELD_TYPE_STORED_SORTED.setDocValuesType(DocValuesType.NUMERIC);
>         LONG_FIELD_TYPE_STORED_SORTED.freeze();
>     }
>
> I would greatly appreciate any insights here; and I'm happy to answer
> questions to unravel this a bit more. Thank you for your time!
>
> V/r,
> Jeremy
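
Before reading the explain output, it may help to check which indexed terms "bostn~" can match at all. The sketch below is plain Java with no Lucene dependency, and it rests on two assumptions: that the name field is tokenized into lowercase terms (the thread does not show the analyzer), and that the default FuzzyQuery limit of 2 edits is in effect. It uses a textbook Levenshtein distance, not Lucene's automaton-based matcher, and the class name EditDistanceSketch is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class EditDistanceSketch {

    // Textbook dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    // Illustration only: Lucene's FuzzyQuery uses Levenshtein automata instead.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "bostn";
        // Assumed terms produced from "Boston", "Basti Bosan" and "Boston Basin".
        List<String> terms = Arrays.asList("boston", "bosan", "basin", "basti");
        for (String term : terms) {
            System.out.println(term + " -> " + levenshtein(query, term));
        }
    }
}
```

Under these assumptions, "boston" and "bosan" are both one edit away from "bostn", while "basin" and "basti" are two edits away, so all four terms fall inside the 2-edit limit. Edit distance alone therefore does not put "Boston" ahead, and the two-word names may be matching on more than one term. Whether that is what actually drives the 5.4.1 scores is something only the IndexSearcher.explain output can confirm.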