The main function for converting search result fields into arrays is “col”:
https://solr.apache.org/guide/8_4/vectorization.html#creating-a-vector-with-the-col-function

You may also need “let” to assign variables. The rest is just applying the 
available math functions.

But these functions don’t play well with multivalued fields, which are hard to 
work with. Multivalued fields look like arrays but are not exactly arrays; 
they are just a bunch of values stuck together. For example, as far as I know 
there’s no way to refer to the 1st or 2nd element of a multivalued field. When 
you enable docValues and use the export handler, those values are returned in 
ascending order, losing position information.
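
To make the ordering problem concrete, here is a small Python sketch. It does not call Solr; it only assumes, as described above, that the values come back sorted in ascending order. The four numbers are the first four elements of one film_vector quoted later in this thread, and the variable names are made up for illustration.

```python
# First four elements of one film_vector from this thread, in stored order.
original = [-0.11665395, 0.04247921, -0.13233364, 0.52578413]

# Assumption: the values are handed back in ascending order.
returned = sorted(original)

# Positions whose value no longer matches the stored order.
moved = [i for i, (a, b) in enumerate(zip(original, returned)) if a != b]

print("stored  :", original)
print("returned:", returned)
print("positions that changed:", moved)
```

The same numbers are still there, but element 0 of the returned list is no longer element 0 of the vector, so any element-by-element math on it would combine the wrong dimensions.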

For example, if the ratings came from different movie raters, such as IMDb, 
Rotten Tomatoes, etc., and every rating were in a different field, the data 
would be much easier to work with, as Solr expects to build arrays and 
matrices from documents formatted that way.
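
A rough Python sketch of that layout, only to show the idea. The documents, the field names imdb_rating and rt_rating, and the values are all made up, and the column helper is a hypothetical stand-in for what “col” does conceptually, not Solr code.

```python
# Hypothetical documents: one single-valued field per rater (made-up values).
docs = [
    {"id": "film1", "imdb_rating": 8.0, "rt_rating": 9.0},
    {"id": "film2", "imdb_rating": 6.0, "rt_rating": 3.0},
    {"id": "film3", "imdb_rating": 7.0, "rt_rating": 6.0},
]


def column(tuples, field):
    """Collect one field from every document into a list (one array per field)."""
    return [doc[field] for doc in tuples]


imdb = column(docs, "imdb_rating")
rt = column(docs, "rt_rating")

# Each array lines up with the documents by position, so stacking the arrays
# gives a matrix with one row per rater.
matrix = [imdb, rt]
means = [sum(row) / len(row) for row in matrix]
print(means)
```

Because every field holds exactly one value per document, position in each array always means "which document", which is the property multivalued fields lose.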

I’d be happy to learn if someone more knowledgeable has a better answer.


From: Eric Pugh
Sent: Saturday, October 14, 2023 8:05 PM
To: users@solr.apache.org
Subject: Re: Vector math with Streaming Expressions?

By “average them”, I mean the first version. So at the end, I get a set of 
numbers that represents the average vector.

Here is an example of the vector:
https://github.com/apache/solr/blob/main/solr/example/films/films.json#L8365

In the existing docs on searching vectors, we make a statement that we have the 
average vector of three movies: 
https://github.com/apache/solr/blob/main/solr/example/films/README.md?plain=1#L154

I’d actually like to figure out how to calculate that vector from data we have 
in Solr already.
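
As a cross-check outside Solr, the element-by-element average can be sketched in plain Python. This is not a streaming expression and says nothing about how to do it inside Solr. The three lists are the film_vector values from the response quoted further down, kept as strings the way they appear there, and the function name average_vector is made up for illustration.

```python
# film_vector values as they appear (as strings) in the response quoted below.
raw_vectors = [
    ["-0.2758314", "-0.14416906", "-0.11316811", "0.2745105", "0.040616427",
     "-4.2628963E-4", "-0.120363355", "0.07888852", "0.036417373", "-0.29541242"],
    ["-0.11665395", "0.04247921", "-0.13233364", "0.52578413", "-0.1739291",
     "-0.01880563", "-0.06670809", "-0.11242808", "0.09724514", "-0.11909142"],
    ["-0.14272659", "0.13051921", "-0.19087574", "0.44983688", "-0.21098459",
     "0.0033124345", "-0.008155139", "-0.09109363", "0.12401622", "-0.12211737"],
]


def average_vector(vectors):
    """Element-by-element arithmetic mean of equal-length vectors."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all vectors must have the same length")
    # zip(*vectors) groups the 1st elements together, then the 2nd, and so on.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Convert the string values to floats before doing any math.
vectors = [[float(value) for value in vector] for vector in raw_vectors]
average = average_vector(vectors)
print([round(x, 4) for x in average])
```

Rounded to four decimal places, the result agrees with the expected vector given at the end of the original question below.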



> On Oct 14, 2023, at 12:50 PM, ufuk yılmaz <uyil...@vivaldi.net.INVALID> wrote:
> 
> By “average them” do you mean to calculate the simple arithmetic average 
> element by element of the all returned film ratings? Eg. sum first element of 
> all arrays and divide by the number of arrays, do it again for the second 
> element etc..
> 
> Or find the average of the array for each movie, producing a single number 
> for each movie
> 
> ~ufuk
> 
> —
> 
>> On 14 Oct 2023, at 19:19, Eric Pugh <ep...@opensourceconnections.com 
>> <mailto:ep...@opensourceconnections.com>> wrote:
>> 
>> I’m trying to average three arrays of floats and not quite making the 
>> conceptual jump from “I defined an array of numbers”, in the way that the 
>> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/vector-math.adoc#element-by-element-vector-math
>> example expects, to “I made a query and got back an array of numbers”. 
>> 
>> I’m using the films example, so :  bin/solr start -c -e films
>> 
>> Then, I want to get the vectors for three films and average them.   
>> 
>> The streaming expression grabs the three vectors, but I can’t figure out how 
>> to wrap it in something to average them.
>> 
>> select(
>>   search(films,
>>          qt="/select",
>>          q="name:\"Finding Nemo\" OR name:\"Bee Movie\" OR name:\"Harry Potter and the Chamber of Secrets\"",
>>          fl="id,name,film_vector"),
>>   film_vector
>> )
>> 
>> produces:
>> 
>> {
>> "result-set": {
>>   "docs": [
>>     {
>>       "film_vector": [
>>         "-0.2758314",
>>         "-0.14416906",
>>         "-0.11316811",
>>         "0.2745105",
>>         "0.040616427",
>>         "-4.2628963E-4",
>>         "-0.120363355",
>>         "0.07888852",
>>         "0.036417373",
>>         "-0.29541242"
>>       ]
>>     },
>>     {
>>       "film_vector": [
>>         "-0.11665395",
>>         "0.04247921",
>>         "-0.13233364",
>>         "0.52578413",
>>         "-0.1739291",
>>         "-0.01880563",
>>         "-0.06670809",
>>         "-0.11242808",
>>         "0.09724514",
>>         "-0.11909142"
>>       ]
>>     },
>>     {
>>       "film_vector": [
>>         "-0.14272659",
>>         "0.13051921",
>>         "-0.19087574",
>>         "0.44983688",
>>         "-0.21098459",
>>         "0.0033124345",
>>         "-0.008155139",
>>         "-0.09109363",
>>         "0.12401622",
>>         "-0.12211737"
>>       ]
>>     },
>>     {
>>       "EOF": true,
>>       "RESPONSE_TIME": 24
>>     }
>>   ]
>> }
>> }
>> 
>> Great, now how do I average across them and get the final vector that I 
>> expect, which should be similar to:
>> 
>> [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 
>> 0.0859, -0.1789]
>> 
>> Thanks!
>> 
>> Eric
>> 
>> _______________________
>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
>> http://www.opensourceconnections.com 
>> <http://www.opensourceconnections.com/><http://www.opensourceconnections.com/>
>>  | My Free/Busy <http://tinyurl.com/eric-cal>  
>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>     
>> This e-mail and all contents, including attachments, is considered to be 
>> Company Confidential unless explicitly stated otherwise, regardless of 
>> whether attachments are marked as such.
