Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-13 Thread danield
Hi all, I have found, much to my dismay, that the documentation on Lucene’s default similarity formula is dangerously misleading. See it here: http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html#formula_tf Term Frequency (TF) counts are expecte

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-13 Thread danield
Corrections: document2={field1:”term1”, field2:”term1”} Coord(query1,document2)= 1/1 = 1 (Doesn't affect the problem/observation)
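
For readers following the arithmetic in the correction above, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the concern in this thread is that term frequency is counted per field rather than per document. The `coord` and `tf` bodies mirror the definitions used by Lucene 4.x `DefaultSimilarity` (overlap / maxOverlap and sqrt(freq)). The class name `FieldTfSketch`, the helpers `freqInField` and `freqAcrossFields`, and the whitespace tokenization are hypothetical and exist only for illustration; this is not the poster's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldTfSketch {

    /** coord = matched query terms / total query terms, as in DefaultSimilarity. */
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** tf = sqrt(freq), as in DefaultSimilarity. */
    public static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    /** Hypothetical helper: occurrences of a term in one field value (whitespace tokens). */
    public static int freqInField(String fieldValue, String term) {
        int count = 0;
        for (String token : fieldValue.trim().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Hypothetical helper: field-agnostic frequency, summed over every field. */
    public static int freqAcrossFields(Map<String, String> doc, String term) {
        int total = 0;
        for (String value : doc.values()) {
            total += freqInField(value, term);
        }
        return total;
    }

    public static void main(String[] args) {
        // document2 from the correction: the same term appears in two fields.
        Map<String, String> document2 = new LinkedHashMap<>();
        document2.put("field1", "term1");
        document2.put("field2", "term1");

        // query1 is assumed to contain the single term "term1", so overlap = maxOverlap = 1.
        System.out.println("coord = " + coord(1, 1));

        // Per-field view: each field sees the term once.
        for (Map.Entry<String, String> field : document2.entrySet()) {
            int freq = freqInField(field.getValue(), "term1");
            System.out.println(field.getKey() + ": freq=" + freq + " tf=" + tf(freq));
        }

        // Field-agnostic view: the document as a whole contains the term twice.
        int total = freqAcrossFields(document2, "term1");
        System.out.println("all fields: freq=" + total + " tf=" + tf(total));
    }
}
```

The two views disagree on the frequency (1 per field versus 2 for the whole document), which is the kind of gap a field-agnostic query would have to close.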

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-15 Thread danield
Hi Mike, Thank you for your reply. Yes, I had thought of this, but it does not solve my problem: the Term Frequency, and therefore the results, will still be wrong, because prepending or appending a string to the term still makes it a different term. Similarly, I could use

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-15 Thread danield
Oh thanks Mike, it did say somewhere. I guess it wouldn't hurt to make that explanation more prominent, as I clearly missed it. Never mind, I am working on my own solution for this, through subclassing QueryParser, BooleanQuery, BooleanScorer, Similarity and a bunch of other classes. Cheers, Dani

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-19 Thread danield
> just mere wordsmithing. Better yet, submit a patch since that's Javadoc, > although the exact form of the doc fix might be debatable, so a general > description of the problem should be sufficient, unless you feel > motivated. > > -- Jack Krupansky > > On Thu, Jan 15, 20

BulkScorer and .explain() compute scores separately?

2015-02-10 Thread danield
I have subclassed BooleanQuery and changed the BooleanWeight constructor to change the way the /coord/ and /idf/ components of the similarity formula are computed, and my changes work as expected when calling IndexSearcher.explain(). However, I now find that when just calling IndexSearcher.se