Pulkitg64 commented on code in PR #13308:
URL: https://github.com/apache/lucene/pull/13308#discussion_r1567435158
##########
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##########
@@ -61,24 +60,6 @@ public float compare(byte[] v1, byte[] v2) {
}
},
- /**
- * Cosine similarity. NOTE: the preferred way to perform cosine similarity
is to normalize all
- * vectors to unit length, and instead use {@link
VectorSimilarityFunction#DOT_PRODUCT}. You
- * should only use this function if you need to preserve the original
vectors and cannot normalize
- * them in advance. The similarity score is normalised to assure it is
positive.
- */
- COSINE {
- @Override
- public float compare(float[] v1, float[] v2) {
- return Math.max((1 + cosine(v1, v2)) / 2, 0);
- }
-
- @Override
- public float compare(byte[] v1, byte[] v2) {
- return (1 + cosine(v1, v2)) / 2;
- }
- },
Review Comment:
Thanks @benwtrent for commenting!
> These enumerations have default ordinals
Yeah, I guess that's why some tests are failing. For Lucene9.x the ordinal
is getting mapped to `MAXIMUM_INNER_PRODUCT` which was earlier mapping to
`COSINE`.
> We need to either detach these ordinal IDs from the implementation or
Switch from an enumerator to an id-implementation idea.
I am not sure how to do this but let me try to give it a shot.
> Even if deprecated, Lucene needs to be able to read indices from 9x -> 10.
So, cosine cannot go away on read.
Oh okay, that means we need to remove support for COSINE just from indexing
side not from searching side?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]