This would, in theory, return the average of the vectors:

let(a=select(search(...), film_vector),
    b=col(a, film_vector),
    m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
    av=scalarDivide(3, sumColumns(m)))

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Oct 14, 2023 at 2:50 PM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:

> The main thing which converts search result fields to arrays is the “col”
> function
> https://solr.apache.org/guide/8_4/vectorization.html#creating-a-vector-with-the-col-function
>
> You may also need “let” to use variables etc. Rest is just employing
> available math functions.
>
> But they don’t play well with multivalued fields, it’s hard to work with
> them. They look like arrays but are not exactly arrays. It’s just a bunch
> of values sticking together. For example afaik there’s no way to refer to
> 1st, 2nd element of a multivalued field. When you enable docValues and use
> the export handler, those values would be returned in ascending order,
> losing position information.
>
> For example if the ratings were from different movie raters, such as imdb,
> rottentomatoes etc and every rating were in a different field, it would be
> much easier to work with, as Solr expects to build arrays and matrices from
> such formatted documents.
>
> I’d be happy to learn if someone more knowledgeable has a better answer.
>
> Sent from Mail for Windows
>
> From: Eric Pugh
> Sent: Saturday, October 14, 2023 8:05 PM
> To: users@solr.apache.org
> Subject: Re: Vector math with Streaming Expressions?
>
> By average them, I mean the first version. So at the end, I get a set of
> numbers that represents the average vector.
>
> Here is an example of the vector..
> https://github.com/apache/solr/blob/main/solr/example/films/films.json#L8365
>
> In the existing docs on searching vectors, we make a statement that we
> have the average vector of three movies:
> https://github.com/apache/solr/blob/main/solr/example/films/README.md?plain=1#L154
>
> I’d actually like to figure out how to calculate that vector from data we
> have in Solr already.
>
> On Oct 14, 2023, at 12:50 PM, ufuk yılmaz <uyil...@vivaldi.net.INVALID> wrote:
> >
> > By “average them” do you mean to calculate the simple arithmetic average,
> > element by element, of all the returned film ratings? E.g. sum the first
> > element of all arrays and divide by the number of arrays, do it again for
> > the second element etc.
> >
> > Or find the average of the array for each movie, producing a single
> > number for each movie?
> >
> > ~ufuk
> >
> > --
> >
> >> On 14 Oct 2023, at 19:19, Eric Pugh <ep...@opensourceconnections.com> wrote:
> >>
> >> I’m trying to average three arrays of floats and not quite making the
> >> conceptual jump from “I defined an array of numbers” in the way that the
> >> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/vector-math.adoc#element-by-element-vector-math
> >> example expects to “I made a query and got back an array of numbers”.
> >>
> >> I’m using the films example, so: bin/solr start -c -e films
> >>
> >> Then, I want to get the vectors for three films and average them.
> >>
> >> The streaming expression grabs the three vectors, but I can’t figure
> >> out how to wrap it in something to average them.
> >>
> >> select(
> >>   search(films,
> >>     qt="/select",
> >>     q="name:\"Finding Nemo\" OR name:\"Bee Movie\" OR name:\"Harry Potter and the Chamber of Secrets\"",
> >>     fl="id,name,film_vector"),
> >>   film_vector
> >> )
> >>
> >> produces:
> >>
> >> {
> >>   "result-set": {
> >>     "docs": [
> >>       {
> >>         "film_vector": [
> >>           "-0.2758314",
> >>           "-0.14416906",
> >>           "-0.11316811",
> >>           "0.2745105",
> >>           "0.040616427",
> >>           "-4.2628963E-4",
> >>           "-0.120363355",
> >>           "0.07888852",
> >>           "0.036417373",
> >>           "-0.29541242"
> >>         ]
> >>       },
> >>       {
> >>         "film_vector": [
> >>           "-0.11665395",
> >>           "0.04247921",
> >>           "-0.13233364",
> >>           "0.52578413",
> >>           "-0.1739291",
> >>           "-0.01880563",
> >>           "-0.06670809",
> >>           "-0.11242808",
> >>           "0.09724514",
> >>           "-0.11909142"
> >>         ]
> >>       },
> >>       {
> >>         "film_vector": [
> >>           "-0.14272659",
> >>           "0.13051921",
> >>           "-0.19087574",
> >>           "0.44983688",
> >>           "-0.21098459",
> >>           "0.0033124345",
> >>           "-0.008155139",
> >>           "-0.09109363",
> >>           "0.12401622",
> >>           "-0.12211737"
> >>         ]
> >>       },
> >>       {
> >>         "EOF": true,
> >>         "RESPONSE_TIME": 24
> >>       }
> >>     ]
> >>   }
> >> }
> >>
> >> Great, now how do I average across them and get the final vector that I
> >> expect, which should be similar to:
> >>
> >> [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
> >>
> >> Thanks!
> >>
> >> Eric
> >>
> >> _______________________
> >> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> >> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> >> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >> This e-mail and all contents, including attachments, is considered to
> >> be Company Confidential unless explicitly stated otherwise, regardless of
> >> whether attachments are marked as such.
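
For anyone who wants to check the arithmetic outside Solr, here is a minimal plain-Python sketch of the element-by-element average discussed in this thread. It is not Solr code and not part of the films example: the function name average_vectors is made up for illustration, and the input is simply the three film_vector arrays copied from the result set quoted above. The column-wise sum followed by a division by the number of vectors is meant to mirror the sumColumns and scalarDivide steps in the let expression.

```python
# Sketch only: a plain-Python cross-check, not Solr streaming-expression code.
# The vectors are copied from the result set quoted in this thread. Solr
# returned the values as strings, so they are parsed with float().
film_vectors = [
    ["-0.2758314", "-0.14416906", "-0.11316811", "0.2745105", "0.040616427",
     "-4.2628963E-4", "-0.120363355", "0.07888852", "0.036417373", "-0.29541242"],
    ["-0.11665395", "0.04247921", "-0.13233364", "0.52578413", "-0.1739291",
     "-0.01880563", "-0.06670809", "-0.11242808", "0.09724514", "-0.11909142"],
    ["-0.14272659", "0.13051921", "-0.19087574", "0.44983688", "-0.21098459",
     "0.0033124345", "-0.008155139", "-0.09109363", "0.12401622", "-0.12211737"],
]


def average_vectors(vectors):
    """Return the element-by-element arithmetic mean of equal-length vectors.

    average_vectors is a hypothetical helper name used only in this sketch.
    """
    rows = [[float(value) for value in vector] for vector in vectors]
    if not rows:
        raise ValueError("need at least one vector")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all vectors must have the same length")
    # zip(*rows) walks the matrix column by column: summing each column is
    # the sumColumns step, dividing by the row count is the scalarDivide step.
    return [sum(column) / len(rows) for column in zip(*rows)]


average = average_vectors(film_vectors)
print([round(value, 4) for value in average])
```

Rounded to four decimal places, the printed list should agree with the vector Eric says he expects. Note that this relies on the values keeping their original positions, which is the concern ufuk raises about multivalued fields and docValues.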