epugh commented on code in PR #1988:
URL: https://github.com/apache/solr/pull/1988#discussion_r1412111764

##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc:
##########
@@ -0,0 +1,172 @@
+= Exercise 5: Using Vectors
+:experimental:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[exercise-5]]
+== Exercise 5: Using Vectors in Solr
+
+This exercise will use the Films example that we looked at previously in Exercise 4.
+
+=== Getting Ready
+
+Make sure you have a running Solr, following the steps in xref:tutorial-films.adoc#restart-solr[].
+
+=== Preparing for the Vector data
+
+[,console]
+----
+$ bin/solr create -c films
+----
+
+Because we didn't specify a ConfigSet when we created the collection, we will use the `_default` ConfigSet.
+
+First we need to update our schema to add the vector field type, the field to hold the vector values and some supporting fields.
+
+[,console]
+----
+$ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
+  "add-field-type" : {
+    "name":"knn_vector_10",
+    "class":"solr.DenseVectorField",
+    "vectorDimension":10,
+    "similarityFunction":"cosine",
+    "knnAlgorithm":"hnsw"
+  },
+  "add-field" : [
+    {
+      "name":"film_vector",
+      "type":"knn_vector_10",
+      "indexed":true,
+      "stored":true
+    },
+    {
+      "name":"name",
+      "type":"text_general",
+      "multiValued":false,
+      "stored":true
+    },
+    {
+      "name":"initial_release_date",
+      "type":"pdate",
+      "stored":true
+    }
+  ]
+}'
+----
+
+=== Now index the Films data with Vectors
+
+We have the vectors embedded in our `films.json` file, so lets index that data, taking advantage of our new schema field we just defined.
+
+[.dynamic-tabs]
+--
+[example.tab-pane#unixindexjson]
+====
+[.tab-label]*Linux/Mac*
+
+[,console]
+----
+$ bin/solr post -c films example/films/films.json
+
+----
+====
+
+[example.tab-pane#winindexjson]
+====
+[.tab-label]*Windows*
+
+[,console]
+----
+$ bin/solr post -c films example\films\films.json
+----
+====
+--
+
+=== Let's do some Vector searches
+Before making the queries, we define an example target vector, simulating a person that
+watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the Chamber of Secrets_.
+We get the vector of each movie, then calculate the resulting average vector, which will
+be used as input vector for all the following example queries.
+
+```
+[-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
+```
+
+[NOTE]
+====
+Interested in calculating the vector using Solr's streaming capability? Try out:

Review Comment:
   Nice idea!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org