Hi all,

This is my first time writing to this mailing list, and I would like to thank you in advance for your attention.

I am writing because I am having problems using the "MoreLikeThis" feature. I am working with a Solr cluster (version 8.11.1) consisting of multiple nodes, each of which contains multiple shards.
It is quite a big cluster. Data is sharded using implicit routing, and documents are distributed by date into monthly shards.

Here are the fields that I'm using:

* UniqueReference: the unique reference of a document
* DocumentDate: the date of a document (in the standard Solr format)
* DataType: the data type of the document (let's say it can be A or B)
* Content: the content of a document (a string)

Here is what my managed schema looks like:

...
<field name="UniqueReference" type="string" indexed="true" stored="true" required="true" />
<field name="DocumentDate" type="pdate" indexed="true" stored="false" required="true" />
<field name="DataType" type="string" indexed="true" stored="false" required="true" />
<field name="Content_en" type="text_en" indexed="true" stored="true" required="false" />
...

The task that I want to perform is the following: given the unique reference of a document of type A, I want to find the documents of data type B, within a fixed time interval, that have the most similar content.

Here are the first questions:

1. What is the best Solr request to perform this task?
2. Is there a parameter that allows me to restrict the corpus of documents that are analyzed when returning similar content? Note that this corpus may not contain the initial document from which I am starting.

Initially I thought about using the "mlt" endpoint. However, I found no parameter in the documentation that would allow me to select the shard to which the query is directed (I absolutely need it, otherwise I risk putting a strain on my cluster). So I opted to use the "select" endpoint, with the "mlt" parameter set to true, together with the "shards" parameter.

These are the parameters that I am using:

* q: "UniqueReference:doc_id"
* fq: "(DocumentDate:[2022-01-22T00:00:00Z TO 2022-01-26T00:00:00Z] AND DataType:B) OR (UniqueReference:doc_id)"
* mlt: true
* mlt.fl: "Content"
* shards: "shard_202201"

I realize that the "fq" parameter is used in a bizarre way.
In theory it should apply to the documents matched by the main query (in my case, the source document). It is an attempt to solve problem (2), which didn't actually work.

Anyway, my doubts are not limited to this. What really surprises me is the structure of the response that Solr returns. The content of the response looks like this:

{
  "response" : {
    "docs" : [],
    ...
  },
  "moreLikeThis" : ...
}

The weird part is the "moreLikeThis" section. Sometimes Solr returns a list, other times a dictionary. When I repeat the same call several times, both forms occur, apparently without a logical pattern, and I have not been able to understand why. To be precise, in both cases the documents contained in the response are not necessarily of data type B, as I requested with the "fq" parameter.

In the "dictionary" case, there is only one key, which is the UniqueReference of the source document, and the corresponding value contains the similar documents. In the "list" case, the second element contains the required documents.

So, here is the last question:

1. I am perfectly aware that I am lost, so what am I missing?

I thank everyone for the attention you have dedicated to me. Greetings from Italy.

I'm available for clarifications,
Marco
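
P.S. To make the request and the two response shapes concrete, here is a minimal Python sketch. It only builds the parameters listed above and reads the "moreLikeThis" section in either form. The base URL, the collection name "my_collection", and the helper names are hypothetical and not part of the real setup. It also assumes that each value in "moreLikeThis" is an object with a "docs" array, and that the "list" form alternates keys and values.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust to the real cluster.
SOLR_SELECT_URL = "http://localhost:8983/solr/my_collection/select"


def build_mlt_params(doc_id, start, end, shard):
    """Return the request parameters described in this message as a dict."""
    date_filter = f"(DocumentDate:[{start} TO {end}] AND DataType:B)"
    return {
        "q": f"UniqueReference:{doc_id}",
        "fq": f"{date_filter} OR (UniqueReference:{doc_id})",
        "mlt": "true",
        "mlt.fl": "Content",
        "shards": shard,
    }


def build_url(params):
    """Return the full "select" URL with URL-encoded parameters."""
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


def similar_docs(response, doc_id):
    """Return the similar documents for doc_id from either response shape."""
    mlt = response.get("moreLikeThis", {})
    if isinstance(mlt, list):
        # Assumed "list" form: [key1, value1, key2, value2, ...]
        mlt = dict(zip(mlt[0::2], mlt[1::2]))
    entry = mlt.get(doc_id, {})
    if isinstance(entry, dict):
        # Assumed value shape: an object holding a "docs" array.
        return entry.get("docs", [])
    # Fallback: the value is already a plain list of documents.
    return list(entry)


if __name__ == "__main__":
    params = build_mlt_params(
        "doc_id", "2022-01-22T00:00:00Z", "2022-01-26T00:00:00Z", "shard_202201"
    )
    print(build_url(params))
```

With a helper like similar_docs, the client code at least no longer depends on which of the two shapes comes back, although it does not explain why the shape changes or why the "fq" filter is not applied to the similar documents.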