Multivalued doc value field

2022-03-10 Thread Sergio García Maroto
Hi,

I am getting an error while copying from one field to another; the
destination field uses docValues.
I am using PersonIDSDV to run Streaming Expressions.

As I can see on this page, I should be able to use multivalued fields with
docValues.
https://solr.apache.org/guide/8_1/docvalues.html

Any ideas or ways to solve this?

*Error:*
Multiple values encountered for non multiValued copy field PersonIDSDV

*Schema*
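(The schema markup did not survive the archive. As a sketch only — the source field name "PersonIDS" and the types here are assumptions, not values from the original message; only PersonIDSDV appears in the error — a multiValued docValues copy target would look something like:)

```xml
<!-- Sketch: "PersonIDS" as the copyField source is an assumption. -->
<field name="PersonIDS"   type="string" indexed="true"  stored="true"
       multiValued="true"/>
<field name="PersonIDSDV" type="string" indexed="false" stored="false"
       multiValued="true" docValues="true"/>
<copyField source="PersonIDS" dest="PersonIDSDV"/>
```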






Thanks a lot
Sergio


Re: Multivalued doc value field

2022-03-10 Thread Shawn Heisey

On 3/10/22 08:03, Sergio García Maroto wrote:

*Error:*
Multiple values encountered for non multiValued copy field PersonIDSDV

*Schema*







Best guess:  You changed the schema file so it's multiValued, but did 
not reload the core.


Or maybe it's SolrCloud, and you changed the schema file on disk, but 
didn't upload the changes to zookeeper, or didn't reload the collection 
after uploading the changes.


The error is saying that the in-memory schema definition for that field 
on that index is not multiValued.


If there are already documents in the index with values in that field 
indexed without multiValued and you switched it to multiValued, then 
you're going to have to completely reindex after completely deleting the 
existing index directory.  This is a Lucene docValues requirement, it 
doesn't come from Solr.
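Either way, the fix only takes effect once the in-memory schema is refreshed. A minimal sketch of the standard core/collection admin calls (host, core, and collection names are placeholders, not values from this thread):

```python
from urllib.parse import urlencode

# Sketch: building the standard Solr admin API reload calls.
# Host, core and collection names below are placeholders.

def reload_core_url(base_url: str, core: str) -> str:
    """Standalone Solr: reload a core so schema edits take effect."""
    return f"{base_url}/solr/admin/cores?" + urlencode(
        {"action": "RELOAD", "core": core})

def reload_collection_url(base_url: str, collection: str) -> str:
    """SolrCloud: reload a collection after uploading config to ZooKeeper."""
    return f"{base_url}/solr/admin/collections?" + urlencode(
        {"action": "RELOAD", "name": collection})

print(reload_core_url("http://localhost:8983", "mycore"))
print(reload_collection_url("http://localhost:8983", "mycollection"))
```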


Thanks,
Shawn



Question regarding the MoreLikeThis features

2022-03-10 Thread Marco D'Ambra
Hi all,
This is my first time writing to this mailing list and I would like to thank 
you in advance for your attention.
I am writing because I am having problems using the "MoreLikeThis" features.
I am working in a Solr cluster (version 8.11.1) consisting of multiple nodes, 
each of which contains multiple shards.

It is quite a big cluster; data is sharded using implicit routing and 
documents are distributed by date on monthly shards.

Here are the fields that I'm using:

  *   UniqueReference: the unique reference of a document
  *   DocumentDate: the date of a document (in the standard Solr format)
  *   DataType: the data type of the document (let's say that can be A or B)
  *   Content: the content of a document (a string)
Here is what my managed schema looks like
...







...


The task that I want to perform is the following:
Given the unique reference of a document of type A, I want to find the 
documents of data type B, within a fixed time interval, that have the most 
similar content.
Here the first questions:

  1.  Which is the best Solr request to perform this task?
  2.  Is there a parameter that allows me to restrict the corpus of documents 
that are analyzed when returning similar content? It should be noted that 
this corpus of documents may not contain the initial document from which I am 
starting.
Initially I thought about using the "mlt" endpoint, but since there was no 
parameter in the documentation that would allow me to select the shard on which 
to direct the query (I absolutely need it, otherwise I risk putting a strain on 
my cluster), I opted to use the "select" endpoint, with the "mlt" parameter set 
to true, and the "shards" parameter.
Those are the parameters that I am using:

  *   q: "UniqueReference:doc_id"
  *   fq: "(DocumentDate:[2022-01-22T00:00:00Z TO 2022-01-26T00:00:00Z] AND 
DataType:B) OR (UniqueReference:doc_id)"
  *   mlt: true
  *   mlt.fl: "Content"
  *   shards: "shard_202201"
I realize that the "fq" parameter is used in a bizarre way. In theory it should 
be aimed at the documents of the main query (in my case the source document). 
It is an attempt to solve problem (2) (which didn't work, actually).
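For concreteness, here is the parameter list above assembled into an actual request (only the host and collection name are placeholders):

```python
from urllib.parse import urlencode

# The exact parameters from the message above; only the host and
# collection name are placeholders.
params = {
    "q": "UniqueReference:doc_id",
    "fq": "(DocumentDate:[2022-01-22T00:00:00Z TO 2022-01-26T00:00:00Z]"
          " AND DataType:B) OR (UniqueReference:doc_id)",
    "mlt": "true",
    "mlt.fl": "Content",
    "shards": "shard_202201",
}
url = "http://localhost:8983/solr/mycollection/select?" + urlencode(params)
print(url)
```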
Anyway, my doubts are not limited to this. What really surprises me is the 
structure of the response that Solr returns to me.
The content of response looks like this:
{
"response" : {
"docs" : [],
...
}
"moreLikeThis" : ...
}
The weird stuff appears in the "moreLikeThis" part. Sometimes Solr returns 
a list, other times a dictionary. Repeating the same call several times, the 
two possibilities alternate, apparently without a logical pattern, and I 
have not been able to understand why.
To be precise, in both cases the documents contained in the response are not 
necessarily of data type B, as I requested with the "fq" parameter.
In the "dictionary" case, there is only one key, which is the UniqueReference 
of the source document, and the corresponding value is the list of similar 
documents.
In the "list" case, the second element contains the required documents.
So, here is the last question:

  1.  I am perfectly aware that I am lost; therefore, what am I missing?
I thank everyone for the attention you have dedicated to me. Greetings from 
Italy.
I'm available for clarifications,

Marco



Re: Prometheus solr 7.2.1

2022-03-10 Thread Houston Putman
Hello Dan,

I would recommend using a later version of the Prometheus Exporter, as
early versions had some pretty nasty bugs. I would try using 8.x, and it
should be pretty apparent if it's working or not from the start. At the
very least, use 7.7.3.

As for deployment, you definitely want to have it running in a separate
container, connecting to the same zk instances. I would run only one
solr-exporter docker container, provided you can guarantee that it will be
brought back up if it dies (e.g. via a Deployment in Kubernetes).
Running multiple would likely be fine, but it adds additional strain on your
resources for very little gain. If you do run multiple, make sure
prometheus knows to only scrape from one instance at a time, so maybe you
put a load balancer in front of the containers.
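A minimal Prometheus scrape-config sketch for that setup (the job name and target host are assumptions; 9854 is the exporter's usual default port, but check the -p option your exporter was started with):

```yaml
# Sketch only: job name, host and port are assumptions.
scrape_configs:
  - job_name: "solr"
    scrape_interval: 30s
    static_configs:
      - targets: ["solr-exporter:9854"]
```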

- Houston

On Wed, Feb 23, 2022 at 6:04 AM Dan Rosher  wrote:

> Hi,
>
> At our organisation we're still on solr 7.2.1. We'd like to use prometheus,
> just wondering if anyone had knowledge that the 7.3 prometheus contrib will
> work with solr 7.2.1?
>
> Also we're thinking of having it work on a separate docker container to the
> solr docker containers,  connecting to the same zk instances as the solr
> containers, do others have experience of setting it up this way?
>
> To avoid a single point of failure, I think we'd need to run multiple
> solr-exporter docker instances, but can prometheus scrape from multiple
> solr-exporter instances?
>
> Many thanks in advance for any advice,
>
> Kind regards,
> Dan
>


Re: Question regarding the MoreLikeThis features

2022-03-10 Thread Tim Casey
Marco,

Finding 'similar' documents will end up being weighted by document length.
I would recommend, at the point of indexing, also indexing an ordered token
set of the first 256, 1024, up to around 5k tokens (depending on document
lengths).  What this does is allow a vector-to-vector normalized
comparison.  You could then query for similar possible documents directly
and build a normalized vector with respect to the query document.

Normalizing schemes in something like an inverted index will tend to weight
the lower token count documents over higher token count documents.  So the
above is an attempt to get at a normalized and comparable view between
documents independent of size.  Next you end up normalizing by the inverse
of a commonality.  That is, a more common token is weighted lower than a
least common token.  (I would also discount tokens which have a raw
frequency below 5.). At the point you have a normalized vector, you can use
that to find similarities weighted by more meaningful tokens.
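A rough sketch of that idea in Python (the token limit, the frequency cutoff, and the tiny corpus-frequency table are illustrative assumptions, not details from this thread):

```python
import math
from collections import Counter

# Sketch: truncate each document to its first N tokens, weight tokens by
# inverse commonality, discount very rare tokens (raw corpus frequency
# below 5), and compare documents with cosine similarity.

def build_vector(tokens, corpus_freq, max_tokens=1024, min_freq=5):
    counts = Counter(tokens[:max_tokens])
    vec = {}
    for tok, tf in counts.items():
        cf = corpus_freq.get(tok, 0)
        if cf < min_freq:      # discount tokens too rare to be meaningful
            continue
        vec[tok] = tf / cf     # more common tokens get lower weight
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def cosine(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Illustrative corpus frequencies; "zeta" is dropped by the rarity cutoff.
corpus_freq = {"solr": 40, "index": 30, "query": 25, "the": 500, "zeta": 1}
d1 = build_vector("the solr index query".split(), corpus_freq)
d2 = build_vector("the solr query zeta".split(), corpus_freq)
print(cosine(d1, d2))
```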

tim

On Thu, Mar 10, 2022 at 9:18 AM Marco D'Ambra  wrote:

> Hi all,
> This is my first time writing to this mailing list and I would like to
> thank you in advance for your attention.
> [...]
>


copyField dest is not an explicit field and doesn't match a dynamic field

2022-03-10 Thread dmitri maziuk

Hi all,

trying to POST to .../update/json/docs, payload includes 
"DISPLAY_MAPPING" : "foo", and the result is a 500 with

```
null:org.apache.solr.common.SolrException: copyField dest 
:'doc.DISPLAY_MAPPING_str' is not an explicit field and doesn't match a 
dynamicField.

```

Which is fine as far as that goes: the only place this can come from is 
AddSchemaFields w/ the default mapping as:

```
<str name="valueClass">java.lang.String</str>
<str name="fieldType">text_general</str>
<lst name="copyField">
  <str name="dest">*_str</str>
  <int name="maxChars">256</int>
</lst>
```

Except that DISPLAY_MAPPING exists in the schema as 
org.apache.solr.schema.StrField and I can't figure out why it is 
triggering AddSchemaFields on it in the first place.


Any suggestions?

TIA
Dima


copyField dest is not an explicit field and doesn't match a dynamic field

2022-03-10 Thread dmitri maziuk

PS Solr 8.7.0 w/ manually updated JARs for the recent vuln.

Dima


Re: copyField dest is not an explicit field and doesn't match a dynamic field

2022-03-10 Thread Shawn Heisey

On 3/10/2022 4:24 PM, dmitri maziuk wrote:
Which is fine as far as that goes: the only place this can come from 
is AddSchemaFields w/ the default mapping as:

```
<str name="valueClass">java.lang.String</str>
<str name="fieldType">text_general</str>
<lst name="copyField">
  <str name="dest">*_str</str>
  <int name="maxChars">256</int>
</lst>
```


I am not very familiar with the update processor that adds fields.  But 
if I understand that correctly, it means that for any field matching 
that specification, it's going to do a copyField, which means that if 
you have a string field named "DISPLAY_MAPPING" it's going to try to 
copy it to another field named "DISPLAY_MAPPING_str" ... and the error 
is saying that there's nothing in the schema that can handle the 
"DISPLAY_MAPPING_str" field.


You can either remove the copyField specification, or add something to 
the schema that will handle the destination field name.


The _default schema that comes with Solr contains the following, which 
would handle that:


<dynamicField name="*_str" type="strings" docValues="true" indexed="false" useDocValuesAsStored="false"/>


Thanks,
Shawn



Re: copyField dest is not an explicit field and doesn't match a dynamic field

2022-03-10 Thread dmitri maziuk

On 2022-03-10 6:07 PM, Shawn Heisey wrote:

I am not very familiar with the update processor that adds fields.  But 
if I understand that correctly, it means that for any field matching 
that specification, it's going to do a copyField


...

Thanks, I get that.

The comment (inherited from 6.x days) says it is for adding "unknown 
fields to the schema" and it looks like the comment is actually wrong: 
it seems to be adding _str for *every* field (or at least more than just 
that one: I managed to trigger the same error on other fields). 
Which doesn't sound like a smart thing to do and actually doesn't make 
sense for the field that initially triggered this, and some others.


Looks like that default definition for '*_str' somehow got lost in the 
6.6->8.7 upgrade here. Or maybe it got dropped deliberately.


Thanks again,
Dima


Solr Collections Join

2022-03-10 Thread Venkateswarlu Bommineni
Hello All,

I have a requirement to join 2 collections and get fields from both the
collections.

I have the join query below; when I run it, I am getting the fields of
Collection1 only.

Is there any way I can get the fields from collection2 as well?

Running below query on Collection1.
{!join method="crossCollection" fromIndex="collection2" from="id" to="id"
v="*:*"}


Any help here is much appreciated !!

Thanks,
Venkat.


Re: Solr Collections Join

2022-03-10 Thread Srijan
Is this a SolrCloud setup?

On Thu, Mar 10, 2022, 22:25 Venkateswarlu Bommineni 
wrote:

> Hello All,
>
> I have a requirement to join 2 collections and get fields from both the
> collections.
> [...]
>