Nice! The next thing to do is have the 'matrix' function accept a list of vectors. Then you could just do this:
let(
  a=select(
    search(films,
      qt="/select",
      q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
      fl="id,name,film_vector"),
    film_vector),
  b=col(a, film_vector),
  m=matrix(b),
  average=scalarDivide(length(b), sumColumns(m))
)

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 7, 2023 at 10:42 AM Eric Pugh <ep...@opensourceconnections.com> wrote:

> Just got to give this a try and it worked GREAT! Here is the working
> example (that will be in the upcoming “How to use Vectors” tutorial):
>
> let(
>   a=select(
>     search(films,
>       qt="/select",
>       q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
>       fl="id,name,film_vector"),
>     film_vector),
>   b=col(a, film_vector),
>   m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
>   average=scalarDivide(3, sumColumns(m))
> )
>
> > On Oct 15, 2023, at 11:53 PM, Joel Bernstein <joels...@gmail.com> wrote:
> >
> > This would in theory return the average of the vectors:
> >
> > let(a=select(search(...), film_vector),
> >     b=col(a, film_vector),
> >     m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
> >     av=scalarDivide(3, sumColumns(m))
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sat, Oct 14, 2023 at 2:50 PM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:
> >
> >> The main thing which converts search result fields to arrays is the “col” function
> >> https://solr.apache.org/guide/8_4/vectorization.html#creating-a-vector-with-the-col-function
> >>
> >> You may also need “let” to use variables etc. Rest is just employing
> >> available math functions.
> >>
> >> But they don’t play well with multivalued fields, it’s hard to work with
> >> them. They look like arrays but are not exactly arrays. It’s just a bunch
> >> of values sticking together. For example afaik there’s no way to refer to
> >> 1st, 2nd element of a multivalued field.
> >> When you enable docValues and use
> >> the export handler, those values would be returned in ascending order,
> >> losing position information.
> >>
> >> For example if the ratings were from different movie raters, such as imdb,
> >> rottentomatoes etc and every rating were in a different field, it would be
> >> much easier to work with, as Solr expects to build arrays and matrices from
> >> such formatted documents.
> >>
> >> I’d be happy to learn if someone more knowledgeable has a better answer.
> >>
> >> Sent from Mail for Windows
> >>
> >> From: Eric Pugh
> >> Sent: Saturday, October 14, 2023 8:05 PM
> >> To: users@solr.apache.org
> >> Subject: Re: Vector math with Streaming Expressions?
> >>
> >> By average them, I mean the first version. So at the end, I get a set of
> >> numbers that represents the average vector.
> >>
> >> Here is an example of the vector..
> >> https://github.com/apache/solr/blob/main/solr/example/films/films.json#L8365
> >>
> >> In the existing docs on searching vectors, we make a statement that we
> >> have the average vector of three movies:
> >> https://github.com/apache/solr/blob/main/solr/example/films/README.md?plain=1#L154
> >>
> >> I’d actually like to figure out how to calculate that vector from data we
> >> have in Solr already.
> >>
> >>> On Oct 14, 2023, at 12:50 PM, ufuk yılmaz <uyil...@vivaldi.net.INVALID> wrote:
> >>>
> >>> By “average them” do you mean to calculate the simple arithmetic average
> >>> element by element of the all returned film ratings? Eg. sum first element
> >>> of all arrays and divide by the number of arrays, do it again for the
> >>> second element etc..
> >>>
> >>> Or find the average of the array for each movie, producing a single
> >>> number for each movie
> >>>
> >>> ~ufuk
> >>>
> >>> --
> >>>
> >>>> On 14 Oct 2023, at 19:19, Eric Pugh <ep...@opensourceconnections.com <mailto:ep...@opensourceconnections.com>> wrote:
> >>>>
> >>>> I’m trying to average three arrays of floats and not quite making the
> >>>> conceptual jump from “I defined a array of numbers” in the way that the
> >>>> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/vector-math.adoc#element-by-element-vector-math
> >>>> example expects with “I made a query and get back a array of numbers”.
> >>>>
> >>>> I’m using the films example, so : bin/solr start -c -e films
> >>>>
> >>>> Then, I want to get the vectors for three films and average them.
> >>>>
> >>>> The streaming expression grabs the three vectors, but I can’t figure
> >>>> out how to wrap it in something to average them.
> >>>>
> >>>> select(
> >>>>   search(films,
> >>>>     qt="/select",
> >>>>     q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
> >>>>     fl="id,name,film_vector"),
> >>>>   film_vector
> >>>> )
> >>>>
> >>>> produces:
> >>>>
> >>>> {
> >>>>   "result-set": {
> >>>>     "docs": [
> >>>>       {
> >>>>         "film_vector": [
> >>>>           "-0.2758314",
> >>>>           "-0.14416906",
> >>>>           "-0.11316811",
> >>>>           "0.2745105",
> >>>>           "0.040616427",
> >>>>           "-4.2628963E-4",
> >>>>           "-0.120363355",
> >>>>           "0.07888852",
> >>>>           "0.036417373",
> >>>>           "-0.29541242"
> >>>>         ]
> >>>>       },
> >>>>       {
> >>>>         "film_vector": [
> >>>>           "-0.11665395",
> >>>>           "0.04247921",
> >>>>           "-0.13233364",
> >>>>           "0.52578413",
> >>>>           "-0.1739291",
> >>>>           "-0.01880563",
> >>>>           "-0.06670809",
> >>>>           "-0.11242808",
> >>>>           "0.09724514",
> >>>>           "-0.11909142"
> >>>>         ]
> >>>>       },
> >>>>       {
> >>>>         "film_vector": [
> >>>>           "-0.14272659",
> >>>>           "0.13051921",
> >>>>           "-0.19087574",
> >>>>           "0.44983688",
> >>>>           "-0.21098459",
> >>>>           "0.0033124345",
> >>>>           "-0.008155139",
> >>>>           "-0.09109363",
> >>>>           "0.12401622",
> >>>>           "-0.12211737"
> >>>>         ]
> >>>>       },
> >>>>       {
> >>>>         "EOF": true,
> >>>>         "RESPONSE_TIME": 24
> >>>>       }
> >>>>     ]
> >>>>   }
> >>>> }
> >>>>
> >>>> Great, now how do I average across them and get the final vector that I
> >>>> expect, which should be similar to:
> >>>>
> >>>> [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Eric
> >>>>
> >>>> _______________________
> >>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>
> >>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >>>> This e-mail and all contents, including attachments, is considered to
> >>>> be Company Confidential unless explicitly stated otherwise, regardless of
> >>>> whether attachments are marked as such.
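
For anyone who wants to cross-check the numbers outside Solr, here is a minimal Python sketch of the same arithmetic: sum each column across the three vectors, then divide by the number of vectors. It does not call Solr at all. The vector values are copied from the response quoted in the thread, and the names `vectors` and `average_vectors` are made up for this illustration.

```python
# Element-wise mean of the three film_vector arrays quoted in the thread.
# Plain-Python cross-check only; nothing here talks to Solr.
# `vectors` and `average_vectors` are illustrative names, not Solr functions.

vectors = [
    [-0.2758314, -0.14416906, -0.11316811, 0.2745105, 0.040616427,
     -4.2628963E-4, -0.120363355, 0.07888852, 0.036417373, -0.29541242],
    [-0.11665395, 0.04247921, -0.13233364, 0.52578413, -0.1739291,
     -0.01880563, -0.06670809, -0.11242808, 0.09724514, -0.11909142],
    [-0.14272659, 0.13051921, -0.19087574, 0.44983688, -0.21098459,
     0.0033124345, -0.008155139, -0.09109363, 0.12401622, -0.12211737],
]


def average_vectors(rows):
    """Sum each column, then divide by the number of rows."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


# Round to four places so the result can be compared with the vector
# Eric quotes as the expected average.
average = [round(value, 4) for value in average_vectors(vectors)]
print(average)
```

Rounded to four places, the result should line up with the expected vector Eric gives at the bottom of the thread, which is a handy way to confirm the streaming expression is averaging by column and not by row.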