Hi all,
I have found, much to my dismay, that the documentation on Lucene’s default
similarity formula is dangerously misleading. See it here:
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html#formula_tf
Term Frequency (TF) counts are expecte
Corrections:
document2={field1:"term1", field2:"term1"}
Coord(query1, document2) = 1/1 = 1
(Doesn't affect the problem/observation)
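
For reference, the coord factor in the 4.9 DefaultSimilarity is the number of query clauses the document matched divided by the total number of clauses. Below is a minimal plain-Java sketch of that arithmetic only, with no Lucene dependency. The class name CoordSketch is made up for illustration, and the single-clause query is an assumption about query1, not something taken from the index.

```java
// Hypothetical helper: mirrors only the overlap / maxOverlap arithmetic.
final class CoordSketch {
    // overlap:    number of query clauses the document matched
    // maxOverlap: total number of scoring clauses in the query
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // Assumed case: query1 has one clause and document2 matches it.
        System.out.println(coord(1, 1)); // prints 1.0
        // For contrast: a two-clause query where only one clause matches.
        System.out.println(coord(1, 2)); // prints 0.5
    }
}
```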
Hi Mike,
Thank you for your reply. Yes, I had thought of this, but it does not solve
my problem: the Term Frequency, and therefore the results, will still be
wrong, because prepending or appending a string to a term still makes it a
different term.
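
To illustrate what I mean, here is a small sketch in plain Java (no Lucene involved; the "field1_" style prefix and the class name are made up for the example). It counts term frequencies over a token list, once with bare terms and once with field-prefixed terms. With the prefix, the two occurrences of term1 become two distinct terms with a frequency of 1 each, instead of one term with a frequency of 2.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Lucene code.
final class PrefixedTermFrequency {
    // Counts how often each distinct token occurs in the list.
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Bare terms: both occurrences count towards the same term.
        List<String> plain = Arrays.asList("term1", "term1");
        // Prefixed terms (assumed naming scheme): each is a different term.
        List<String> prefixed = Arrays.asList("field1_term1", "field2_term1");

        System.out.println(termFrequencies(plain));
        System.out.println(termFrequencies(prefixed));
    }
}
```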
Similarly, I could use
Oh thanks Mike, it did say somewhere. I guess it wouldn't hurt to make that
explanation more prominent, as I clearly missed it.
Never mind, I am working on my own solution for this, through subclassing
QueryParser, BooleanQuery, BooleanScorer, Similarity and a bunch of other
classes.
Cheers,
Dani
> just mere wordsmithing. Better yet, submit a patch since that's Javadoc,
> although the exact form of the doc fix might be debatable, so a general
> description of the problem should be sufficient, unless you feel
> motivated.
>
> -- Jack Krupansky
>
> On Thu, Jan 15, 20
I have subclassed BooleanQuery and changed the BooleanWeight constructor
to change the way the /coord/ and /idf/ components of the similarity
formula are computed, and my changes work as expected when calling
IndexSearcher.explain().
However, I now find that when just calling IndexSearcher.se