Nice!

The next thing to do is have the 'matrix' function accept a list of
vectors. Then you could just do this:

let(
  a=select(
        search(films,
        qt="/select",
        q="name:\"Finding Nemo\" OR name:\"Bee Movie\" OR name:\"Harry Potter and the Chamber of Secrets\"",
        fl="id,name,film_vector"),
        film_vector),
  b=col(a, film_vector),
  m=matrix(b),
  average=scalarDivide(length(b), sumColumns(m))
  )
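For reference, here is a plain-Python sketch of the arithmetic this expression is meant to perform: stack the vectors into a matrix, sum each column, then divide by the number of vectors. The list-accepting `matrix(b)` form is the proposal above, not something confirmed to exist yet. The Python function names are hypothetical stand-ins that mirror matrix, sumColumns and scalarDivide; they are not Solr APIs, and the toy input is made up.

```python
# Hypothetical Python stand-ins mirroring the Solr functions in the
# expression above; they model the math only, not Solr's implementation.


def matrix(rows):
    """Stack equal-length vectors into a matrix (a list of rows)."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all vectors must have the same length")
    return [list(r) for r in rows]


def sum_columns(m):
    """Sum each column of the matrix, like sumColumns(m)."""
    return [sum(column) for column in zip(*m)]


def scalar_divide(scalar, vector):
    """Divide every element of the vector by the scalar, like scalarDivide(n, v)."""
    return [x / scalar for x in vector]


def average_vector(b):
    """Element-wise mean: scalarDivide(length(b), sumColumns(matrix(b)))."""
    if not b:
        raise ValueError("need at least one vector")
    return scalar_divide(len(b), sum_columns(matrix(b)))


# Toy input: three 2-element vectors (made-up values).
b = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(average_vector(b))  # [3.0, 6.0]
```

Dividing by the number of vectors instead of a literal 3 is what lets the same expression work for any number of matching films.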



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 7, 2023 at 10:42 AM Eric Pugh <ep...@opensourceconnections.com> wrote:

> I just got to try this, and it worked GREAT! Here is the working example
> (it will be in the upcoming “How to use Vectors” tutorial):
>
> let(
>   a=select(
>         search(films,
>         qt="/select",
>         q="name:\"Finding Nemo\" OR name:\"Bee Movie\" OR name:\"Harry Potter and the Chamber of Secrets\"",
>         fl="id,name,film_vector"),
>         film_vector),
>   b=col(a, film_vector),
>   m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
>   average=scalarDivide(3, sumColumns(m))
>   )
>
>
> > On Oct 15, 2023, at 11:53 PM, Joel Bernstein <joels...@gmail.com> wrote:
> >
> > This would in theory return the average of the vectors:
> >
> > let(a=select(search(...), film_vector),
> >     b=col(a, film_vector),
> >     m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
> >     av=scalarDivide(3, sumColumns(m))
> > )
> >
> >
> > On Sat, Oct 14, 2023 at 2:50 PM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:
> >
> >> The main thing that converts search result fields to arrays is the “col”
> >> function:
> >>
> >> https://solr.apache.org/guide/8_4/vectorization.html#creating-a-vector-with-the-col-function
> >>
> >> You may also need “let” to define variables, etc. The rest is just a matter
> >> of applying the available math functions.
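To make the shape concrete: for a multivalued field like film_vector, col() over several tuples should yield a list of vectors, one per tuple, which is what valueAt(b, 0) and matrix then work on. A minimal Python model, assuming each tuple is a dict; the col function here is a hypothetical stand-in, and the ids and values are made up.

```python
from typing import Any

# Hypothetical model of col(a, field): collect one field from every tuple,
# in stream order. It shows the data shape only; it is not Solr's code.


def col(tuples: list[dict[str, Any]], field: str) -> list[Any]:
    """Return the value of `field` from each tuple."""
    return [t[field] for t in tuples]


# Made-up tuples with a short multivalued "film_vector" field.
a = [
    {"id": "1", "film_vector": [0.1, 0.2]},
    {"id": "2", "film_vector": [0.3, 0.4]},
]
b = col(a, "film_vector")
print(b)  # [[0.1, 0.2], [0.3, 0.4]]
```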
> >>
> >> But these functions don’t play well with multivalued fields; it’s hard to
> >> work with them. They look like arrays but are not exactly arrays, just a
> >> bunch of values stuck together. For example, AFAIK there’s no way to refer
> >> to the 1st or 2nd element of a multivalued field. When you enable docValues
> >> and use the export handler, those values are returned in ascending order,
> >> losing position information.
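A toy illustration of that ordering problem, assuming the values come back sorted ascending as described: averaging position by position then mixes up components that never belonged together. Plain Python with made-up numbers, not Solr code.

```python
# Toy illustration (made-up numbers) of why sorted multivalued fields break
# element-by-element math: sorting scrambles which positions get averaged.


def elementwise_mean(vectors):
    """Average position i across all vectors, for every position i."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


original = [
    [0.9, -0.5, 0.1],
    [-0.3, 0.7, 0.2],
]
# The same values as they would look if returned in ascending order.
sorted_back = [sorted(v) for v in original]

print(elementwise_mean(original))     # roughly [0.3, 0.1, 0.15]
print(elementwise_mean(sorted_back))  # roughly [-0.4, 0.15, 0.8]
```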
> >>
> >> For example, if the ratings came from different movie raters, such as IMDb,
> >> Rotten Tomatoes, etc., and every rating were in a different field, it would
> >> be much easier to work with, as Solr expects to build arrays and matrices
> >> from documents formatted that way.
> >>
> >> I’d be happy to learn if someone more knowledgeable has a better answer.
> >>
> >>
> >> From: Eric Pugh
> >> Sent: Saturday, October 14, 2023 8:05 PM
> >> To: users@solr.apache.org
> >> Subject: Re: Vector math with Streaming Expressions?
> >>
> >> By “average them” I mean the first version. So at the end, I get a set of
> >> numbers that represents the average vector.
> >>
> >> Here is an example of one of the vectors:
> >>
> >> https://github.com/apache/solr/blob/main/solr/example/films/films.json#L8365
> >>
> >> In the existing docs on searching vectors, we state that we have the
> >> average vector of three movies:
> >>
> >> https://github.com/apache/solr/blob/main/solr/example/films/README.md?plain=1#L154
> >>
> >> I’d actually like to figure out how to calculate that vector from data we
> >> already have in Solr.
> >>
> >>
> >>
> >>> On Oct 14, 2023, at 12:50 PM, ufuk yılmaz <uyil...@vivaldi.net.INVALID> wrote:
> >>>
> >>> By “average them”, do you mean calculating the simple arithmetic average,
> >>> element by element, of all the returned film ratings? E.g. sum the first
> >>> element of all arrays and divide by the number of arrays, then do the same
> >>> for the second element, and so on?
> >>>
> >>> Or do you mean finding the average of the array for each movie, producing a
> >>> single number for each movie?
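The two readings produce different kinds of results: one vector versus one number per movie. A toy sketch in plain Python, using made-up 3-element vectors rather than the films data:

```python
# Contrasts the two readings of "average them", using made-up toy vectors.

vectors = {
    "movie A": [1.0, 2.0, 3.0],
    "movie B": [3.0, 4.0, 5.0],
}

# Reading 1: element-by-element average across movies -> one vector.
n = len(vectors)
elementwise = [sum(column) / n for column in zip(*vectors.values())]

# Reading 2: average within each movie's vector -> one number per movie.
per_movie = {name: sum(v) / len(v) for name, v in vectors.items()}

print(elementwise)  # [2.0, 3.0, 4.0]
print(per_movie)    # {'movie A': 2.0, 'movie B': 4.0}
```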
> >>>
> >>> ~ufuk
> >>>
> >>> —
> >>>
> >>>> On 14 Oct 2023, at 19:19, Eric Pugh <ep...@opensourceconnections.com> wrote:
> >>>>
> >>>> I’m trying to average three arrays of floats, and I’m not quite making the
> >>>> conceptual jump from “I defined an array of numbers”, which is what the
> >>>> example at
> >>>> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/vector-math.adoc#element-by-element-vector-math
> >>>> expects, to “I ran a query and got back an array of numbers”.
> >>>>
> >>>> I’m using the films example, so: bin/solr start -c -e films
> >>>>
> >>>> Then, I want to get the vectors for three films and average them.
> >>>>
> >>>> The streaming expression grabs the three vectors, but I can’t figure out
> >>>> how to wrap it in something to average them.
> >>>>
> >>>> select(
> >>>> search(films,
> >>>>      qt="/select",
> >>>>      q="name:\"Finding Nemo\" OR name:\"Bee Movie\" OR name:\"Harry Potter and the Chamber of Secrets\"",
> >>>>      fl="id,name,film_vector"),
> >>>> film_vector
> >>>> )
> >>>>
> >>>> produces:
> >>>>
> >>>> {
> >>>> "result-set": {
> >>>>  "docs": [
> >>>>    {
> >>>>      "film_vector": [
> >>>>        "-0.2758314",
> >>>>        "-0.14416906",
> >>>>        "-0.11316811",
> >>>>        "0.2745105",
> >>>>        "0.040616427",
> >>>>        "-4.2628963E-4",
> >>>>        "-0.120363355",
> >>>>        "0.07888852",
> >>>>        "0.036417373",
> >>>>        "-0.29541242"
> >>>>      ]
> >>>>    },
> >>>>    {
> >>>>      "film_vector": [
> >>>>        "-0.11665395",
> >>>>        "0.04247921",
> >>>>        "-0.13233364",
> >>>>        "0.52578413",
> >>>>        "-0.1739291",
> >>>>        "-0.01880563",
> >>>>        "-0.06670809",
> >>>>        "-0.11242808",
> >>>>        "0.09724514",
> >>>>        "-0.11909142"
> >>>>      ]
> >>>>    },
> >>>>    {
> >>>>      "film_vector": [
> >>>>        "-0.14272659",
> >>>>        "0.13051921",
> >>>>        "-0.19087574",
> >>>>        "0.44983688",
> >>>>        "-0.21098459",
> >>>>        "0.0033124345",
> >>>>        "-0.008155139",
> >>>>        "-0.09109363",
> >>>>        "0.12401622",
> >>>>        "-0.12211737"
> >>>>      ]
> >>>>    },
> >>>>    {
> >>>>      "EOF": true,
> >>>>      "RESPONSE_TIME": 24
> >>>>    }
> >>>>  ]
> >>>> }
> >>>> }
> >>>>
> >>>> Great, now how do I average across them and get the final vector that I
> >>>> expect, which should be similar to:
> >>>>
> >>>> [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
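For comparison, here is a client-side sketch that computes the same average from the JSON response shown above, outside Solr. It assumes the response body looks exactly like that result-set: the final tuple is the EOF marker, and the vector values arrive as strings, so they are parsed to floats first. The function name average_film_vector is hypothetical. Rounded to four places, it reproduces the expected vector.

```python
import json

# Client-side sketch, assuming the response body is the JSON result-set
# shown above (vector values encoded as strings, EOF tuple last).
RESPONSE = """
{"result-set": {"docs": [
  {"film_vector": ["-0.2758314", "-0.14416906", "-0.11316811", "0.2745105", "0.040616427",
                   "-4.2628963E-4", "-0.120363355", "0.07888852", "0.036417373", "-0.29541242"]},
  {"film_vector": ["-0.11665395", "0.04247921", "-0.13233364", "0.52578413", "-0.1739291",
                   "-0.01880563", "-0.06670809", "-0.11242808", "0.09724514", "-0.11909142"]},
  {"film_vector": ["-0.14272659", "0.13051921", "-0.19087574", "0.44983688", "-0.21098459",
                   "0.0033124345", "-0.008155139", "-0.09109363", "0.12401622", "-0.12211737"]},
  {"EOF": true, "RESPONSE_TIME": 24}
]}}
"""


def average_film_vector(body: str, field: str = "film_vector") -> list[float]:
    """Element-wise mean of `field` across all tuples that carry it."""
    docs = json.loads(body)["result-set"]["docs"]
    # Skip the EOF marker (and any tuple without the field); parse strings to floats.
    vectors = [[float(x) for x in d[field]] for d in docs if field in d]
    if not vectors:
        raise ValueError(f"no tuples contain {field!r}")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


print([round(x, 4) for x in average_film_vector(RESPONSE)])
# [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
```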
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Eric
> >>>>
> >>>> _______________________
> >>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> >>>> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> >>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >>>>
> >>>> This e-mail and all contents, including attachments, is considered to
> >>>> be Company Confidential unless explicitly stated otherwise, regardless
> >>>> of whether attachments are marked as such.
>
