[ https://issues.apache.org/jira/browse/SOLR-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824857#comment-17824857 ]
Chris M. Hostetter commented on SOLR-17164: ------------------------------------------- Sanjay: I REALLY like your patch, thank you for working on this. Some feedback... * By the time {{handle4ArgsVariant()}} is called, we already know what {{arg1Str}} and {{arg2Str}} are... ** so we should call them by more descriptive names in that method: {{private ValueSource handle4ArgsVariant(FunctionQParser fp, String vecEncStr, String vecSimStr)}} ** Likewise for the argument names in {{handle2ArgsVariant()}} * Passing {{constVec}} to {{handle2ArgsVariant()}} seems redundant and a tad confusing? ** By the time we call {{handle2ArgsVariant()}} if {{constVec}} is true, then {{arg2Str}} – aka {{field2Name}} – must be null *** If {{arg2Str}} is null but, {{constVec}} was false, then we're in a weird error situation ** So i think the method sig should be: {{private ValueSource handle2ArgsVariant(FunctionQParser fp, String field1Name, String field2Name)}} *** And if {{field2Name}} is null, then we know we should ask {{fp}} to parse a constant vec * The error messages thrown by {{requireVectorType}} should really include the field _name_ ** I would suggest refactoring the method to take in the {{SchemaField}} and return the {{DenseVectorField}} – using the {{SchemaField.getName()}} for the error if the type is wrong and the cast can't be done... {code:java} private DenseVectorField requireVectorType(SchemaField field) throws SyntaxError { FieldType fieldType = field.getType() if ( fieldType instanceof DenseVectorField ) { return (DenseVectorField) field.getType(); } throw new SolrException( BAD_REQUEST, String.format( Locale.ROOT, "Type mismatch: Expected [%s], but found a different field type for field: [%s]", DenseVectorField.class.getSimpleName(), field.getName())); } {code} * {{handle2ArgsVariant()}} should check that the vector encoding & similarity from {{field1Type}} match those from {{field2Type}} ** and we should have a test that we get an error if someone tries to mix and match ** Either that, or ... *_MAYBE_* leave the code alone, but the refguide should make it clear that when two fields are specified, the encoding & similarity are taken from the first field *** But i haven't thought this idea through all the way ... not sure it's a good idea to let people get different results for {{vectorSimilarity(fieldX, fieldY)}} then they get from {{vectorSimilarity(fieldY, fieldX)}} *** If folks want to mix and match encodings & similarity functions, then it doesn't seem weird to force them to use the 4 arg version * I like the mock test you've added, but the pattern repeated in each of the test methods (with a {{new FunctionQParser}} wrapped around the output of {{truncatePrefixFunctionQName}} is awkward and hard to make sense of at first. ** Other then the input string, the args to the {{FunctionQParser}} constructor are all consistent across all test methods (and if any test needs to tweak them they are already mocked) as is the (static) {{VectorSimilaritySourceParser}} ** So why not refactor all of this logic into a helper method... {code:java} protected ValueSource parseWithMocks(final String input) throws SyntaxError { FunctionQParser fqp = new FunctionQParser(input.substring("vectorSimilarity(".length()), localParams, params, request); return vecSimilarity.parse(fqp) } {code} * Your error test coverage of the 0 args, 1 arg, and 2 args (when either arg aren't vectors) bad input situations seem good – can you please add some tests of theerror message return when 3 args are used? .... Looking at the code I'm honestly not sure what happens if... ** first arg is a vector field, second arg is a constant vec, but there is a third useless arg ** first two args are an encoding & sim function, but only one vector argument is provided (either field name or constant) * Can you please add some tests where a constant vector is provided using a request param reference ** ie: {{vectorSimilarity(fieldName,$vec_param)}} ** (I'm not sure off the top of my head if param refs are resolved by the time we'll be making the {{peek()}} call) * Can you please update {{QueryEqualityTest.testFuncKnnVector()}} to check that given (consistent inputs) the queries returned by the 4 arg version are equal to the queries returned by the 2 arg version ** There a variant of the {{assertFuncEquals()}} method that takes a {{SolrQueryRequest}} as it's first argument which can be useful for checking the request param references are working at the same time... {code:java} SolrQueryRequest req = req("someVec", "[1,2,3,4]"); try { assertFuncEquals(req, "vectorSimilarity(FLOAT32,COSINE,float_vec_field,[1,2,3,4])", "vectorSimilarity(float_vec_field,[1,2,3,4])", "vectorSimilarity(FLOAT32,COSINE,float_vec_field,$someVec)", "vectorSimilarity(float_vec_field,$someVec)", ) } finally { req.close(); } {code} * In the ref-guide additions you made, you say {{The first argument must be a Knn vector, and the second argument can be either a Knn vector or a constant vector.}} but that's kind of missleading/confusing? ** what is "Knn vector" in this context? ** It should be clear that it requires a "DenseVectorField name" > Add 2 arg variant of vectorSimilarity() function > ------------------------------------------------ > > Key: SOLR-17164 > URL: https://issues.apache.org/jira/browse/SOLR-17164 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Chris M. Hostetter > Priority: Major > Labels: vector-based-search > Time Spent: 10m > Remaining Estimate: 0h > > Solr's current 4 argument > {{vectorySimilarity(vectorEncoding,similarityFunction,vec1,vec2)}} function > is really awkward to use for the (seemingly common) situation where you just > want to know the similarity between a field and a constant vector, or > (probably less common) between two fields of the same type. > The first two (currently) mandatory arguments to {{vectorySimilarity()}} > ({{{}vectorEncoding{}}} and {{{}similarityFunction{}}}) are already mandatory > properties of {{{}DenseVectorField{}}}. IIUC the only reason these arguments > are required is in the (seemingly uncommon?) situation where you might want > to compute the similarity of two vector constants, so the function needs to > know what {{vectorEncoding}} and {{similarityFunction}} to use. > > ---- > > It would be really nice to support a simplified 2 argument variant of > {{vectorySimilarity()}} such that: > * the first argument must be the name of a {{DenseVectorField}} field > * the second argument must be either: > ** A vector constant > *** in which case the {{vectorEncoding}} use to parse the constant is > infered from the fieldType properties of the first argument > ** Or the name of a second {{DenseVectorField}} field > *** in which case the {{vectorEncoding}} and {{similarityFunction}} of the > two fields must match > * The ValueSource returned should be based on the configured > {{vectorEncoding}} & {{similarityFunction}} of the field(s) > Examples... > {noformat} > vectorySimilarity(title_float_vec_dim4, [1.0,2.0,3.0,4.0]) > ...or... > vectorySimilarity(title_float_vec_dim4, body_float_vec_dim4) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org