If you store it as a multi-value double or float, you would in theory just
get the array. It may be the way you are indexing the data rather than
defining the field that is creating the outer array.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 29, 2021 at 11:06 AM FAVORY , XAVIER <xavier.fav...@upf.edu>
wrote:

> Well, I actually index an array in my field.
> But when I use f1=col(s1, feature), it extracts it as a multi-valued field.
> I understand that col() is used to extract a field value from multiple
> retrieved instances, so it kind of puts it into an array, forming a
> multidimensional array.
>
> Could it be possible that I am not using the most adequate field type to
> store my features? I just want to store some arrays (for instance one
> 128-dim feature vector for each document).
> Also, as it is now, I need to perform an extra request to know the number
> of results I get from the query. This way I can then create the right
> streaming expression, with the right number of "fn" variables.
>
>
>
>
> On Thu, 29 Apr 2021 at 16:58, Joel Bernstein <joels...@gmail.com> wrote:
>
> > I agree this is very verbose. I didn't even realize you could index a
> > multidimensional array into a multi-value field until now. Knowing this,
> > it makes sense to support matrix creation directly from multi-value
> > arrays. I'll add this when I get some time.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Apr 29, 2021 at 10:46 AM FAVORY , XAVIER <xavier.fav...@upf.edu>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > Thank you for pointing me to that part of the documentation. valueAt()
> > > is exactly what I needed here.
> > > However, as you point out, there seems to be no way to directly get the
> > > matrix from a multidimensional array.
> > > As a consequence, my streaming expression is very verbose and quite long
> > > for my purpose (I perform this over a thousand documents), but it
> > > actually works that way (and it saves me the extra queries to get the
> > > ids from a text search, for instance):
> > >
> > > let(
> > >     s=search(test,q="*",fl="feature"),
> > >     f1=valueAt(col(s, feature ),0),
> > >     f2=valueAt(col(s, feature ),1),
> > >     f3=valueAt(col(s, feature ),2),
> > >     m=transpose(matrix(f1,f2,f3)),
> > >     d=distance(m,cosine())
> > > )
> > >
> > >
> > > Thank you again,
> > > Best,
> > >
> > > Xavier
> > >
> > > On Thu, 29 Apr 2021 at 16:04, Joel Bernstein <joels...@gmail.com>
> > > wrote:
> > >
> > > > That's interesting, it seems like you've indexed a matrix into a field.
> > > >
> > > > If that's the case I think you'll need to access the arrays using the
> > > > index as described here:
> > > >
> > > > https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
> > > >
> > > > Then you can create a matrix from the arrays.
> > > >
> > > > I guess we need to add a way to materialize the matrix directly from a
> > > > multidimensional array.
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > >
> > > > On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER <xavier.fav...@upf.edu>
> > > > wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > I am currently trying to create a system for computing distances
> > > > > between documents based on pre-computed numerical feature vectors.
> > > > >
> > > > > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > > > > documents like the following, with the feature field being pfloat with
> > > > > multiValued set to true:
> > > > >
> > > > >       {
> > > > >         "id":"1",
> > > > >         "feature":[
> > > > >           0.1,
> > > > >           0.5,
> > > > >           0.6,
> > > > >           1.7]
> > > > >       },
> > > > >       {
> > > > >         "id":"2",
> > > > >         "feature":[
> > > > >           0.5,
> > > > >           0.1,
> > > > >           0.7,
> > > > >           0.9]
> > > > >       },
> > > > >       {
> > > > >         "id":"3",
> > > > >         "feature":[
> > > > >          -0.5,
> > > > >           0.9,
> > > > >           1.5,
> > > > >           0.2]
> > > > >       },
> > > > >
> > > > > I want to create a matrix so I can then use the distance() function to
> > > > > compute the distances between the columns of a matrix. The
> > > > > documentation provides an example of what I am interested in, by
> > > > > defining the vectors on the fly:
> > > > >
> > > > > let(a=array(20, 30, 40),
> > > > >     b=array(21, 29, 41),
> > > > >     c=array(31, 40, 50),
> > > > >     d=matrix(a, b, c),
> > > > >     c=distance(d))
> > > > >
> > > > > By transposing the matrix I can easily compute the distances between
> > > > > the rows, so I can get what I want.
> > > > >
> > > > > However, now I want to extract the numerical features from a feature
> > > > > field indexed in Solr. The documentation explains how to create a
> > > > > matrix from numerical values stored in some fields:
> > > > >
> > > > > let(
> > > > >     a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> > > > >     b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> > > > >     c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> > > > >     d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> > > > >     e=col(a, price_f),
> > > > >     f=col(b, price_f),
> > > > >     g=col(c, price_f),
> > > > >     h=col(d, price_f),
> > > > >     i=matrix(e, f, g, h),
> > > > >     j=sumRows(i))
> > > > >
> > > > > However, in my case, I already have an array of float values for each
> > > > > document. So I tried to do it this way:
> > > > >
> > > > > let(
> > > > >     s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> > > > >     s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> > > > >     s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> > > > >     m=matrix(f1,f2,f3)
> > > > > )
> > > > >
> > > > > But I get this error:
> > > > >
> > > > > {
> > > > >   "result-set": {
> > > > >     "docs": [
> > > > >       {
> > > > >         "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) - Numeric value expected but found type java.util.ArrayList for value [0.1,0.5,0.6,1.7]",
> > > > >         "EOF": true,
> > > > >         "RESPONSE_TIME": 5
> > > > >       }
> > > > >     ]
> > > > >   }
> > > > > }
> > > > >
> > > > > When I inspect what I get as f3, I see that I have an array of arrays,
> > > > > which is why I think it fails to create the matrix here. I've been
> > > > > searching a lot for how to create a matrix from float vectors stored in
> > > > > a field of my documents, and I still cannot find a solution. What I
> > > > > could do is extract the vectors and then construct the vectors and
> > > > > matrix on the fly, but I would like to be able to do it in one request.
> > > > > Moreover, I find it really curious that I cannot directly create the
> > > > > matrix from the results of a normal search. For instance, I would
> > > > > prefer to do something like this:
> > > > >
> > > > > let(s=search(test,q="*",fl="feature,id"), m=col(s,feature))
> > > > >
> > > > > which returns:
> > > > >
> > > > > {
> > > > >   "result-set": {
> > > > >     "docs": [
> > > > >       {
> > > > >         "m": [
> > > > >           [
> > > > >             0.1,
> > > > >             0.5,
> > > > >             0.6,
> > > > >             1.7
> > > > >           ],
> > > > >           [
> > > > >             0.5,
> > > > >             0.1,
> > > > >             0.7,
> > > > >             0.9
> > > > >           ],
> > > > >           [
> > > > >             -0.5,
> > > > >             0.9,
> > > > >             1.5,
> > > > >             0.2
> > > > >           ]
> > > > >         ]
> > > > >       },
> > > > >       {
> > > > >         "EOF": true,
> > > > >         "RESPONSE_TIME": 3
> > > > >       }
> > > > >     ]
> > > > >   }
> > > > > }
> > > > >
> > > > > and be able to use the matrix I obtain here. But again, I was not able
> > > > > to perform matrix operations on "m".
> > > > >
> > > > > Does anyone know an elegant way to create a matrix from the numerical
> > > > > vectors stored in my feature field?
> > > > >
> > > > >
> > > > > Thank you.
> > > > > --
> > > > > Xavier Favory
> > > > > Music Technology Group
> > > > > Universitat Pompeu Fabra
> > > > >
> > > >
> > >
> > >
> > > --
> > > Xavier Favory
> > > Music Technology Group
> > > Universitat Pompeu Fabra
> > >
> >
>
>
> --
> Xavier Favory
> Music Technology Group
> Universitat Pompeu Fabra
>
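
The workaround quoted above needs one valueAt(col(s, feature), i) line per
returned document, so the expression has to be rebuilt whenever the number
of results changes. A minimal Python sketch of generating that text on the
client side follows. The function name build_distance_expression and its
parameters are hypothetical and not part of Solr; the sketch only
reproduces the shape of the expression posted in the thread, and it assumes
num_docs equals the number of tuples the search actually returns.

```python
def build_distance_expression(collection, field, num_docs, query="*"):
    """Return a let() expression with one valueAt() line per document.

    Hypothetical helper: it only builds the expression text and does not
    talk to Solr. The output mirrors the expression posted in the thread.
    """
    if num_docs < 1:
        raise ValueError("num_docs must be at least 1")

    # Same search as in the thread: fetch only the feature field.
    lines = [f'    s=search({collection},q="{query}",fl="{field}"),']

    # One variable per document: f1 holds the array of document 0, etc.
    names = []
    for i in range(num_docs):
        name = f"f{i + 1}"
        names.append(name)
        lines.append(f"    {name}=valueAt(col(s, {field}),{i}),")

    # Stack the arrays as rows, then transpose so that distance()
    # compares documents column by column, as in the thread.
    lines.append(f"    m=transpose(matrix({','.join(names)})),")
    lines.append("    d=distance(m,cosine())")
    return "let(\n" + "\n".join(lines) + "\n)"


if __name__ == "__main__":
    print(build_distance_expression("test", "feature", 3))
```

Called with ("test", "feature", 3), this yields the same three-document
expression as in the thread, apart from whitespace inside col(). The extra
request to count the results is still needed; the sketch only removes the
manual editing of the "fn" variables.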
