Hello everyone,

I am trying to build a system that computes distances between documents
based on precomputed numerical feature vectors.

I set up Solr (cloud) 8.7 and I am using streaming expressions. My
documents look like this, where the feature field is a pfloat with
multiValued set to true:

      {
        "id":"1",
        "feature":[
          0.1,
          0.5,
          0.6,
          1.7],
      },
      {
        "id":"2",
        "feature":[
          0.5,
          0.1,
          0.7,
          0.9],
      },
      {
        "id":"3",
        "feature":[
         -0.5,
          0.9,
          1.5,
          0.2],
      },

I want to create a matrix so I can then use the distance() function to
compute the distances for the columns of a matrix. The documentation
provides an example of what I am interested in, by defining the vectors on
the fly:

let(a=array(20, 30, 40),
    b=array(21, 29, 41),
    c=array(31, 40, 50),
    d=matrix(a, b, c),
    c=distance(d))

By transposing the matrix I can just as easily compute the distances
between its rows, which is what I want.
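
To make the goal concrete, here is a minimal Python sketch (plain Python, not a streaming expression) of the row-wise distance matrix I am after, using the three sample documents above. It assumes Euclidean distance, which I understand to be the default of distance(); the helper name pairwise_euclidean is made up for illustration.

```python
import math


def pairwise_euclidean(rows):
    """Return the square matrix of Euclidean distances between rows.

    Hypothetical helper: it only illustrates the result expected from
    distance() applied to the transposed matrix, one row per document.
    """
    n = len(rows)
    return [
        [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Feature vectors of documents 1, 2 and 3 from the sample above.
features = [
    [0.1, 0.5, 0.6, 1.7],
    [0.5, 0.1, 0.7, 0.9],
    [-0.5, 0.9, 1.5, 0.2],
]

distances = pairwise_euclidean(features)
for row in distances:
    print(row)
```

The result is symmetric with zeros on the diagonal; entry [i][j] is the distance between document i+1 and document j+1.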

However, now I want to extract the numerical features from a feature field
indexed in Solr. The documentation explains how to create a matrix from
numerical values stored in some fields:

let(
    a=random(collection1, q="market:A", rows="5000", fl="price_f"),
    b=random(collection1, q="market:B", rows="5000", fl="price_f"),
    c=random(collection1, q="market:C", rows="5000", fl="price_f"),
    d=random(collection1, q="market:D", rows="5000", fl="price_f"),
    e=col(a, price_f),
    f=col(b, price_f),
    g=col(c, price_f),
    h=col(d, price_f),
    i=matrix(e, f, g, h),
    j=sumRows(i))

However, in my case, each document already holds an array of float
values. So I tried the following:

let(
    s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
    s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
    s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
    m=matrix(f1,f2,f3)
)

But I get this error:

{
  "result-set": {
    "docs": [
      {
        "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) -
Numeric value expected but found type java.util.ArrayList for value
[0.1,0.5,0.6,1.7]",
        "EOF": true,
        "RESPONSE_TIME": 5
      }
    ]
  }
}

When I inspect f3, I see that it is an array of arrays, which is probably
why the matrix creation fails. I have searched a lot for a way to create a
matrix from float vectors stored in a field of my documents, and I still
cannot find a solution. What I could do is extract the vectors in a first
request and then define the arrays and the matrix on the fly, but I would
like to be able to do it in one request. Moreover, I find it curious that
I cannot directly create the matrix from the results of a normal search.
For instance, I would prefer to do something like this:

let(s=search(test,q="*",fl="feature,id"), m=col(s,feature))

which returns:

{
  "result-set": {
    "docs": [
      {
        "m": [
          [
            0.1,
            0.5,
            0.6,
            1.7
          ],
          [
            0.5,
            0.1,
            0.7,
            0.9
          ],
          [
            -0.5,
            0.9,
            1.5,
            0.2
          ]
        ]
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 3
      }
    ]
  }
}

and be able to use the matrix I obtain here. But again, I was not able to
perform matrix operations on "m".
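
For reference, the two-request fallback mentioned earlier could be sketched as follows in Python. This is only an illustration under assumptions: the vectors are taken to be already fetched from Solr by a first request, the helper name build_distance_expression is hypothetical, and the generated expression relies on array(), matrix(), transpose() and distance() behaving as described in the documentation. It is not the single-request solution I am looking for.

```python
def build_distance_expression(vectors):
    """Build a let(...) streaming expression from already-fetched vectors.

    Hypothetical helper: each vector becomes an array(), the arrays are
    stacked as matrix rows, and distance() is applied to the transpose so
    that distances are computed between documents rather than dimensions.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    names = [f"v{i}" for i in range(len(vectors))]
    lines = [
        f"{name}=array({', '.join(repr(float(x)) for x in vec)})"
        for name, vec in zip(names, vectors)
    ]
    lines.append(f"m=matrix({', '.join(names)})")
    lines.append("d=distance(transpose(m))")
    return "let(" + ",\n    ".join(lines) + ")"


# Vectors as they would come back from a first search request.
fetched = [
    [0.1, 0.5, 0.6, 1.7],
    [0.5, 0.1, 0.7, 0.9],
    [-0.5, 0.9, 1.5, 0.2],
]

expression = build_distance_expression(fetched)
print(expression)
```

The printed string would then be sent to the stream handler as a second request, which is exactly the extra round trip I would like to avoid.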

Does anyone know any elegant way to create a matrix from my numerical
vectors stored in my feature field?


Thank you.
-- 
Xavier Favory
Music Technology Group
Universitat Pompeu Fabra
