First, I'd see if omitting term frequencies, positions, and norms does what you need; these are all things you can disable out of the box.
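To make the effect concrete, here is a rough plain-Java sketch (not Lucene code) of what scoring reduces to once tf, idf, and norms no longer vary: each matching field contributes only its own weight. It assumes one term per field, as in the address index described below, and it ignores coord and query normalization. The class name, method name, and weights map are all made up for illustration.

```java
import java.util.Map;

// Plain-Java illustration only; this is not Lucene code and uses no Lucene API.
// FlatScoreSketch, score, and the weights map are hypothetical names.
final class FlatScoreSketch {

    // Sums the weight of every query field whose single term equals the
    // document's term for that field. A field with no entry in `weights`
    // counts as 1.0, so the unweighted score is just the number of matches.
    static double score(Map<String, String> query,
                        Map<String, String> doc,
                        Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, String> clause : query.entrySet()) {
            String docTerm = doc.get(clause.getKey());
            if (docTerm != null && docTerm.equals(clause.getValue())) {
                total += weights.getOrDefault(clause.getKey(), 1.0);
            }
        }
        return total;
    }
}
```

With flat scoring like this, the exact constant matters less than the relative weights you give the fields, since every matching clause is scaled by the same amount.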
Best,
Erick

On Mon, Nov 5, 2012 at 5:26 AM, Damian Birchler <damian.birch...@bsiag.com> wrote:

> Hi everyone
>
> We are using Lucene to search for possible duplicates in an address
> database. We create an index with a document for each person in the
> database. Each document has a field with one term for the first name, a
> field with one term for the last name and so on. I think in this setting it
> doesn't make sense to let term frequency, inverse document frequency and
> friends influence the document score (or does it?). For this reason I'm
> thinking of overriding DefaultSimilarity to not take tf/idf into account
> when scoring.
>
> Do you think that's a reasonable thing to do? If so, how should I proceed
> (I'm looking for implementation details here; should I, e.g., override the
> method that calculates the term frequency to just return a constant
> [although, off the top of my head, I wouldn't know what would be a sensible
> constant]).
>
> Thanks a lot,
>
> Damian