[ 
https://issues.apache.org/jira/browse/SOLR-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824857#comment-17824857
 ] 

Chris M. Hostetter commented on SOLR-17164:
-------------------------------------------

Sanjay: I REALLY like your patch, thank you for working on this.

 

Some feedback...
 * By the time {{handle4ArgsVariant()}} is called, we already know what 
{{arg1Str}} and {{arg2Str}} are...
 ** so we should call them by more descriptive names in that method: {{private 
ValueSource handle4ArgsVariant(FunctionQParser fp, String vecEncStr, String 
vecSimStr)}}
 ** Likewise for the argument names in {{handle2ArgsVariant()}}
 * Passing {{constVec}} to {{handle2ArgsVariant()}} seems redundant and a tad 
confusing?
 ** By the time we call {{handle2ArgsVariant()}} if {{constVec}} is true, then 
{{arg2Str}} – aka {{field2Name}} – must be null
 *** If {{arg2Str}} is null but, {{constVec}} was false, then we're in a weird 
error situation
 ** So i think the method sig should be: {{private ValueSource 
handle2ArgsVariant(FunctionQParser fp, String field1Name, String field2Name)}}
 *** And if {{field2Name}} is null, then we know we should ask {{fp}} to parse 
a constant vec
 * The error messages thrown by {{requireVectorType}} should really include the 
field _name_
 ** I would suggest refactoring the method to take in the {{SchemaField}} and 
return the {{DenseVectorField}} – using the {{SchemaField.getName()}} for the 
error if the type is wrong and the cast can't be done...
{code:java}
private DenseVectorField requireVectorType(SchemaField field) throws 
SyntaxError {
    FieldType fieldType = field.getType()
    if ( fieldType instanceof DenseVectorField ) {
      return (DenseVectorField) field.getType();
    }
    throw new SolrException(
        BAD_REQUEST,
        String.format(
            Locale.ROOT,
            "Type mismatch: Expected [%s], but found a different field type for 
field: [%s]",
            DenseVectorField.class.getSimpleName(), field.getName()));
  }
{code}

 * {{handle2ArgsVariant()}} should check that the vector encoding & similarity 
from {{field1Type}} match those from {{field2Type}}
 ** and we should have a test that we get an error if someone tries to mix and 
match
 ** Either that, or ... *_MAYBE_* leave the code alone, but the refguide should 
make it clear that when two fields are specified, the encoding & similarity are 
taken from the first field
 *** But i haven't thought this idea through all the way ... not sure it's a 
good idea to let people get different results for {{vectorSimilarity(fieldX, 
fieldY)}} then they get from {{vectorSimilarity(fieldY, fieldX)}}
 *** If folks want to mix and match encodings & similarity functions, then it 
doesn't seem weird to force them to use the 4 arg version
 * I like the mock test you've added, but the pattern repeated in each of the 
test methods (with a {{new FunctionQParser}} wrapped around the output of 
{{truncatePrefixFunctionQName}} is awkward and hard to make sense of at first.
 ** Other then the input string, the args to the {{FunctionQParser}} 
constructor are all consistent across all test methods (and if any test needs 
to tweak them they are already mocked) as is the (static) 
{{VectorSimilaritySourceParser}}
 ** So why not refactor all of this logic into a helper method...
{code:java}
protected ValueSource parseWithMocks(final String input) throws SyntaxError {
  FunctionQParser fqp = new 
FunctionQParser(input.substring("vectorSimilarity(".length()), localParams, 
params, request);
  return vecSimilarity.parse(fqp)
}
{code}

 * Your error test coverage of the 0 args, 1 arg, and 2 args (when either arg 
aren't vectors) bad input situations seem good – can you please add some tests 
of theerror message return when 3 args are used? .... Looking at the code I'm 
honestly not sure what happens if...
 ** first arg is a vector field, second arg is a constant vec, but there is a 
third useless arg
 ** first two args are an encoding & sim function, but only one vector argument 
is provided (either field name or constant)
 * Can you please add some tests where a constant vector is provided using a 
request param reference
 ** ie: {{vectorSimilarity(fieldName,$vec_param)}}
 ** (I'm not sure off the top of my head if param refs are resolved by the time 
we'll be making the {{peek()}} call)
 * Can you please update {{QueryEqualityTest.testFuncKnnVector()}} to check 
that given (consistent inputs) the queries returned by the 4 arg version are 
equal to the queries returned by the 2 arg version
 ** There a variant of the {{assertFuncEquals()}} method that takes a 
{{SolrQueryRequest}} as it's first argument which can be useful for checking 
the request param references are working at the same time...
{code:java}
    SolrQueryRequest req = req("someVec", "[1,2,3,4]");
    try {
      assertFuncEquals(req,
        "vectorSimilarity(FLOAT32,COSINE,float_vec_field,[1,2,3,4])",
        "vectorSimilarity(float_vec_field,[1,2,3,4])",
        "vectorSimilarity(FLOAT32,COSINE,float_vec_field,$someVec)",
        "vectorSimilarity(float_vec_field,$someVec)",
      )
    } finally {
      req.close();
    }
{code}

 * In the ref-guide additions you made, you say {{The first argument must be a 
Knn vector, and the second argument can be either a Knn vector or a constant 
vector.}} but that's kind of missleading/confusing?
 ** what is "Knn vector" in this context?
 ** It should be clear that it requires a "DenseVectorField name"

> Add 2 arg variant of vectorSimilarity() function
> ------------------------------------------------
>
>                 Key: SOLR-17164
>                 URL: https://issues.apache.org/jira/browse/SOLR-17164
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>              Labels: vector-based-search
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's current 4 argument 
> {{vectorySimilarity(vectorEncoding,similarityFunction,vec1,vec2)}} function 
> is really awkward to use for the (seemingly common) situation where you just 
> want to know the similarity between a field and a constant vector, or 
> (probably less common) between two fields of the same type.
> The first two (currently) mandatory arguments to {{vectorySimilarity()}} 
> ({{{}vectorEncoding{}}} and {{{}similarityFunction{}}}) are already mandatory 
> properties of {{{}DenseVectorField{}}}. IIUC the only reason these arguments 
> are required is in the (seemingly uncommon?) situation where you might want 
> to compute the similarity of two vector constants, so the function needs to 
> know what {{vectorEncoding}} and {{similarityFunction}} to use.
>  
> ----
>  
> It would be really nice to support a simplified 2 argument variant of 
> {{vectorySimilarity()}} such that:
>  * the first argument must be the name of a {{DenseVectorField}} field
>  * the second argument must be either:
>  ** A vector constant
>  *** in which case the {{vectorEncoding}} use to parse the constant is 
> infered from the fieldType properties of the first argument
>  ** Or the name of a second {{DenseVectorField}} field
>  *** in which case the {{vectorEncoding}} and {{similarityFunction}} of the 
> two fields must match
>  * The ValueSource returned should be based on the configured 
> {{vectorEncoding}} & {{similarityFunction}} of the field(s)
> Examples...
> {noformat}
> vectorySimilarity(title_float_vec_dim4, [1.0,2.0,3.0,4.0])
>    ...or...
> vectorySimilarity(title_float_vec_dim4, body_float_vec_dim4)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to