Multivalued doc value field
Hi,

I am getting an error while copying from one field to another. The destination field, PersonIDSDV, is a docValues field that I use to run StreamingExpressions. As far as I can see on this page, I should be able to use multivalued fields with docValues: https://solr.apache.org/guide/8_1/docvalues.html

Any ideas or ways to solve this?

*Error:* Multiple values encountered for non multiValued copy field PersonIDSDV

*Schema*

Thanks a lot
Sergio
Re: Multivalued doc value field
On 3/10/22 08:03, Sergio García Maroto wrote:
> *Error:* Multiple values encountered for non multiValued copy field PersonIDSDV
> *Schema*

Best guess: you changed the schema file so the field is multiValued, but did not reload the core. Or maybe it's SolrCloud, and you changed the schema file on disk but didn't upload the changes to ZooKeeper, or didn't reload the collection after uploading them. The error is saying that the in-memory schema definition for that field on that index is not multiValued.

If there are already documents in the index with values indexed in that field without multiValued, and you then switched it to multiValued, you're going to have to completely reindex after completely deleting the existing index directory. This is a Lucene docValues requirement; it doesn't come from Solr.

Thanks,
Shawn
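For reference, a minimal sketch of what the destination field and the copyField could look like once the multiValued change is in place; the field names and the field type here are illustrative, not taken from Sergio's actual schema:

```
<field name="PersonIDSDV" type="string" indexed="true" stored="false"
       docValues="true" multiValued="true"/>
<copyField source="PersonID" dest="PersonIDSDV"/>
```

After editing the schema (and, for SolrCloud, re-uploading the configset to ZooKeeper), the core or collection has to be reloaded for the change to take effect, e.g. something like:

```
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
```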
Question regarding the MoreLikeThis features
Hi all,

This is my first time writing to this mailing list, and I would like to thank you in advance for your attention. I am writing because I am having problems using the "MoreLikeThis" features.

I am working with a Solr cluster (version 8.11.1) consisting of multiple nodes, each of which contains multiple shards. It is quite a big cluster; data is sharded using implicit routing, and documents are distributed by date onto monthly shards.

Here are the fields that I'm using:

* UniqueReference: the unique reference of a document
* DocumentDate: the date of a document (in the standard Solr format)
* DataType: the data type of the document (let's say it can be A or B)
* Content: the content of a document (a string)

Here is what my managed schema looks like:
...
...

The task that I want to perform is the following: given the unique reference of a document of type A, I want to find the documents of data type B, within a fixed time interval, that have the most similar content. Here are the first questions:

1. Which is the best Solr request to perform this task?
2. Is there a parameter that allows me to restrict the corpus of documents that are analyzed when returning similar content? It should be noted that this corpus of documents may not contain the initial document from which I am starting.

Initially I thought about using the "mlt" endpoint, but since there was no parameter in the documentation that would allow me to select the shard on which to direct the query (I absolutely need it, otherwise I risk putting a strain on my cluster), I opted to use the "select" endpoint with the "mlt" parameter set to true and the "shards" parameter. These are the parameters that I am using:

* q: "UniqueReference:doc_id"
* fq: "(DocumentDate:[2022-01-22T00:00:00Z TO 2022-01-26T00:00:00Z] AND DataType:B) OR (UniqueReference:doc_id)"
* mlt: true
* mlt.fl: "Content"
* shards: "shard_202201"

I realize that the "fq" parameter is used in a bizarre way. In theory it should apply to the documents of the main query (in my case, the source document); it is an attempt to solve problem (2), which didn't work, actually.

Anyway, my doubts are not limited to this. What really surprises me is the structure of the response that Solr returns to me. The content of the response looks like this:

{
  "response" : {
    "docs" : [],
    ...
  },
  "moreLikeThis" : ...
}

The weird stuff appears in the "moreLikeThis" part. Sometimes Solr returns a list, other times a dictionary. Repeating the same call several times, both possibilities keep occurring, apparently without a logical pattern, and I have not been able to understand why. And to be precise, in both cases the documents contained in the answer are not necessarily of data type B, as requested with the "fq" parameter. In the "dictionary" case, there is only one key, which is the UniqueReference of the source document, and the corresponding value contains the similar documents. In the "list" case, the second element contains the required documents.

So, here is the last question:

1. I am perfectly aware that I am lost, therefore: what am I missing?

I thank everyone for the attention you have dedicated to me. Greetings from Italy. I'm available for clarifications,

Marco
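For reference, the parameters above correspond to a select request along these lines (host and collection name are illustrative and the URL is shown unencoded; doc_id and the shard name are taken from the message):

```
http://host:8983/solr/collection/select
    ?q=UniqueReference:doc_id
    &fq=(DocumentDate:[2022-01-22T00:00:00Z TO 2022-01-26T00:00:00Z] AND DataType:B) OR (UniqueReference:doc_id)
    &mlt=true
    &mlt.fl=Content
    &shards=shard_202201
```

One option not discussed in this thread is the MoreLikeThis query parser, which returns the similar documents as the main result set, so fq and shards restrict them directly. A sketch, assuming UniqueReference is the collection's uniqueKey and with the mintf/mindf values chosen arbitrarily:

```
q={!mlt qf=Content mintf=1 mindf=1}doc_id
fq=DocumentDate:[2022-01-22T00:00:00Z TO 2022-01-26T00:00:00Z] AND DataType:B
shards=shard_202201
```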
Re: Prometheus solr 7.2.1
Hello Dan,

I would recommend using a later version of the Prometheus Exporter, as early versions had some pretty nasty bugs. I would try using 8.x, and it should be pretty apparent from the start whether it's working or not. At the very least, use 7.7.3.

As for deployment, you definitely want to have it running in a separate container, connecting to the same zk instances. I would only run one solr-exporter docker container, provided you can guarantee that it will be brought back up if it dies (e.g. via a Deployment in Kubernetes). Running multiple would likely be fine, but it adds additional strain on your resources for very little gain. If you do run multiple, make sure Prometheus knows to only scrape from one instance at a time, so maybe put a load balancer in front of the containers.

- Houston

On Wed, Feb 23, 2022 at 6:04 AM Dan Rosher wrote:
> Hi,
>
> At our organisation we're still on solr 7.2.1. We'd like to use
> prometheus, just wondering if anyone knows whether the 7.3 prometheus
> contrib will work with solr 7.2.1?
>
> Also, we're thinking of having it run in a separate docker container from
> the solr docker containers, connecting to the same zk instances as the
> solr containers. Do others have experience of setting it up this way?
>
> To avoid a single point of failure, I think we'd need to run multiple
> solr-exporter docker instances, but can prometheus scrape from multiple
> solr-exporter instances?
>
> Many thanks in advance for any advice,
>
> Kind regards,
> Dan
>
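For reference, a rough sketch of what this could look like, not taken from this thread: the exporter launched in its own container against the shared ZooKeeper ensemble, and Prometheus scraping a single exporter endpoint. Hostnames, ports, and file paths are illustrative; check the flags against the exporter version actually deployed.

```
# inside the solr-exporter container (contrib/prometheus-exporter)
./bin/solr-exporter -p 9854 \
    -z zk1:2181,zk2:2181,zk3:2181/solr \
    -f ./conf/solr-exporter-config.xml \
    -n 8
```

```
# prometheus.yml -- scrape one exporter instance (or a load balancer in front of several)
scrape_configs:
  - job_name: 'solr'
    static_configs:
      - targets: ['solr-exporter:9854']
```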
Re: Question regarding the MoreLikeThis features
Marco,

Finding 'similar' documents will end up being weighted by document length. I would recommend, at the point of indexing, also indexing an ordered token set of the first 256, 1024, up to around 5k tokens (depending on document lengths). What this does is allow a vector-to-vector normalized comparison. You could then query for similar possible documents directly and build a normalized vector with respect to the query document.

Normalizing schemes in something like an inverted index will tend to weight the lower token count documents over higher token count documents, so the above is an attempt to get at a normalized and comparable view between documents, independent of size.

Next you end up normalizing by the inverse of a commonality. That is, a more common token is weighted lower than a less common token. (I would also discount tokens which have a raw frequency below 5.) At the point you have a normalized vector, you can use it to find similarities weighted by the more meaningful tokens.

tim

On Thu, Mar 10, 2022 at 9:18 AM Marco D'Ambra wrote:
> Hi all,
> This is my first time writing to this mailing list, and I would like to
> thank you in advance for your attention. I am writing because I am having
> problems using the "MoreLikeThis" features.
>
> I am working with a Solr cluster (version 8.11.1) consisting of multiple
> nodes, each of which contains multiple shards. It is quite a big cluster;
> data is sharded using implicit routing, and documents are distributed by
> date onto monthly shards.
>
> Here are the fields that I'm using:
>
> * UniqueReference: the unique reference of a document
> * DocumentDate: the date of a document (in the standard Solr format)
> * DataType: the data type of the document (let's say it can be A or B)
> * Content: the content of a document (a string)
>
> Here is what my managed schema looks like:
> ...
> ...
>
> The task that I want to perform is the following: given the unique
> reference of a document of type A, I want to find the documents of data
> type B, within a fixed time interval, that have the most similar content.
> Here are the first questions:
>
> 1. Which is the best Solr request to perform this task?
> 2. Is there a parameter that allows me to restrict the corpus of
> documents that are analyzed when returning similar content? It should be
> noted that this corpus of documents may not contain the initial document
> from which I am starting.
>
> Initially I thought about using the "mlt" endpoint, but since there was
> no parameter in the documentation that would allow me to select the shard
> on which to direct the query (I absolutely need it, otherwise I risk
> putting a strain on my cluster), I opted to use the "select" endpoint
> with the "mlt" parameter set to true and the "shards" parameter. These
> are the parameters that I am using:
>
> * q: "UniqueReference:doc_id"
> * fq: "(DocumentDate:[2022-01-22T00:00:00Z TO 2022-01-26T00:00:00Z]
> AND DataType:B) OR (UniqueReference:doc_id)"
> * mlt: true
> * mlt.fl: "Content"
> * shards: "shard_202201"
>
> I realize that the "fq" parameter is used in a bizarre way. In theory it
> should apply to the documents of the main query (in my case, the source
> document); it is an attempt to solve problem (2), which didn't work,
> actually.
>
> Anyway, my doubts are not limited to this. What really surprises me is
> the structure of the response that Solr returns to me. The content of the
> response looks like this:
>
> {
>   "response" : {
>     "docs" : [],
>     ...
>   },
>   "moreLikeThis" : ...
> }
>
> The weird stuff appears in the "moreLikeThis" part. Sometimes Solr
> returns a list, other times a dictionary. Repeating the same call several
> times, both possibilities keep occurring, apparently without a logical
> pattern, and I have not been able to understand why. And to be precise,
> in both cases the documents contained in the answer are not necessarily
> of data type B, as requested with the "fq" parameter. In the "dictionary"
> case, there is only one key, which is the UniqueReference of the source
> document, and the corresponding value contains the similar documents. In
> the "list" case, the second element contains the required documents.
>
> So, here is the last question:
>
> 1. I am perfectly aware that I am lost, therefore: what am I missing?
>
> I thank everyone for the attention you have dedicated to me. Greetings
> from Italy. I'm available for clarifications,
>
> Marco
>
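Not code from this thread, just a rough sketch of the comparison Tim describes, with all names and thresholds invented for illustration: build a token-count vector over the leading tokens of each document, drop tokens that are very rare in the corpus, weight the rest by inverse commonality, L2-normalize so document length stops dominating, and compare vectors by cosine similarity.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: IDF-weighted, length-normalized token vectors compared by cosine. */
public class TokenVectorSimilarity {

    /**
     * Turn raw token counts for one document into an L2-normalized vector,
     * weighting each token by an inverse-commonality (IDF-style) factor and
     * skipping tokens that are extremely rare in the corpus.
     */
    static Map<String, Double> normalize(Map<String, Integer> tokenCounts,
                                         Map<String, Integer> corpusFreq,
                                         int totalDocs, int minCorpusFreq) {
        Map<String, Double> vector = new HashMap<>();
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Integer> e : tokenCounts.entrySet()) {
            int cf = corpusFreq.getOrDefault(e.getKey(), 0);
            if (cf < minCorpusFreq) {
                continue; // discount tokens with a very low raw corpus frequency
            }
            double idf = Math.log((totalDocs + 1.0) / (cf + 1.0));
            double w = e.getValue() * idf;
            vector.put(e.getKey(), w);
            sumOfSquares += w * w;
        }
        final double length = Math.sqrt(sumOfSquares);
        if (length > 0) {
            vector.replaceAll((token, w) -> w / length); // remove document-length bias
        }
        return vector;
    }

    /** Cosine similarity of two vectors produced by normalize(): just the dot product. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return dot;
    }
}
```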
copyField dest is not an explicit field and doesn't match a dynamic field
Hi all,

trying to POST to .../update/json/docs, payload includes "DISPLAY_MAPPING" : "foo", and the result is a 500 with

```
null:org.apache.solr.common.SolrException: copyField dest :'doc.DISPLAY_MAPPING_str' is not an explicit field and doesn't match a dynamicField.
```

Which is fine as far as that goes: the only place this can come from is AddSchemaFields w/ the default mapping as:

```
<lst name="typeMapping">
  <str name="valueClass">java.lang.String</str>
  <str name="fieldType">text_general</str>
  <lst name="copyField">
    <str name="dest">*_str</str>
    <int name="maxChars">256</int>
  </lst>
</lst>
```

Except that DISPLAY_MAPPING exists in the schema as org.apache.solr.schema.StrField, and I can't figure out why it is triggering AddSchemaFields on it in the first place.

Any suggestions?
TIA
Dima
copyField dest is not an explicit field and doesn't match a dynamic field
PS Solr 8.7.0 w/ manually updated JARs for the recent vuln. Dima
Re: copyField dest is not an explicit field and doesn't match a dynamic field
On 3/10/2022 4:24 PM, dmitri maziuk wrote:
> Which is fine as far as that goes: the only place this can come from is
> AddSchemaFields w/ the default mapping as:
>
> ```
> <lst name="typeMapping">
>   <str name="valueClass">java.lang.String</str>
>   <str name="fieldType">text_general</str>
>   <lst name="copyField">
>     <str name="dest">*_str</str>
>     <int name="maxChars">256</int>
>   </lst>
> </lst>
> ```

I am not very familiar with the update processor that adds fields. But if I understand it correctly, it means that for any field matching that specification, it's going to do a copyField, which means that if you have a string field named "DISPLAY_MAPPING" it's going to try to copy it to another field named "DISPLAY_MAPPING_str" ... and the error is saying that there's nothing in the schema that can handle the "DISPLAY_MAPPING_str" field.

You can either remove the copyField specification, or add something to the schema that will handle the destination field name. The _default schema that comes with Solr contains the following, which would handle that:

```
<dynamicField name="*_str" type="strings" docValues="true" indexed="false" useDocValuesAsStored="false"/>
```

Thanks,
Shawn
Re: copyField dest is not an explicit field and doesn't match a dynamic field
On 2022-03-10 6:07 PM, Shawn Heisey wrote:
> I am not very familiar with the update processor that adds fields. But
> if I understand it correctly, it means that for any field matching that
> specification, it's going to do a copyField ...

Thanks, I get that. The comment (inherited from 6.x days) says it is for adding "unknown fields to the schema", and it looks like the comment is actually wrong: it seems to be adding _str for *every* field (or at least for more than just that one: I managed to trigger the same error on other fields). Which doesn't sound like a smart thing to do, and actually doesn't make sense for the field that initially triggered this, and some others.

Looks like that default definition for '*_str' somehow got lost in the 6.6->8.7 upgrade here. Or maybe it got dropped deliberately.

Thanks again,
Dima
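For anyone hitting the same error: if the *_str dynamicField is missing from the schema, one way to restore it without hand-editing the file is the Schema API. A sketch, assuming a collection named "mycollection" and the stock "strings" field type (both are assumptions; adjust to the actual setup):

```
curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/mycollection/schema' \
  --data-binary '{
    "add-dynamic-field": {
      "name": "*_str",
      "type": "strings",
      "docValues": true,
      "indexed": false,
      "useDocValuesAsStored": false
    }
  }'
```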
Solr Collections Join
Hello All,

I have a requirement to join 2 collections and get fields from both of them. I have the join query below; when I run it, I am getting the fields of Collection1 only.

Is there any way I can get the fields from collection2 as well?

I am running the query below on Collection1:

{!join method="crossCollection" fromIndex="collection2" from="id" to="id" v="*:*"}

Any help here is much appreciated!!

Thanks,
Venkat.
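A join query only returns documents (and therefore fields) from the collection it is run against, so the crossCollection join above can use collection2 to filter Collection1 but cannot pull collection2's fields into the response. One commonly used alternative, not mentioned in this thread, is a streaming expression innerJoin, which merges tuples from both collections. A rough sketch, where field_a and field_b are placeholder field names, both collections are assumed to share the id join key, and the requested fields are assumed to have docValues (required by the /export handler):

```
innerJoin(
  search(collection1, q="*:*", fl="id,field_a", sort="id asc", qt="/export"),
  search(collection2, q="*:*", fl="id,field_b", sort="id asc", qt="/export"),
  on="id"
)
```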
Re: Solr Collections Join
Is this a SolrCloud setup?

On Thu, Mar 10, 2022, 22:25 Venkateswarlu Bommineni wrote:
> Hello All,
>
> I have a requirement to join 2 collections and get fields from both of
> them. I have the join query below; when I run it, I am getting the fields
> of Collection1 only.
>
> Is there any way I can get the fields from collection2 as well?
>
> I am running the query below on Collection1:
> {!join method="crossCollection" fromIndex="collection2" from="id" to="id"
> v="*:*"}
>
> Any help here is much appreciated!!
>
> Thanks,
> Venkat.
>