Re: Prevent Loss of Documents after Implicit Sharding
I may have got this wrong, but I think it might be better to shard randomly rather than on a value from one of your source documents; otherwise certain searches will only hit some of the shards and possibly overload them. This might also be the cause of the behaviour below.

Charlie

On 30/11/2023 04:36, Saksham Gupta wrote:
> Hi All,
>
> Pinging again for some assistance.
>
> On Wed, Nov 29, 2023 at 7:11 PM Saksham Gupta wrote:
>> Hi Solr Developers,
>>
>> Problem Statement
>>
>> We have been using SolrCloud with implicit sharding. The data of the collection was divided into 8 shards. In order to reduce the response time, we planned to shard the data further, into 56 shards. Under this sharding strategy, one of the values of a multivalued field is used to decide the shard of the document. But this has led to loss of documents.
>>
>> How is the loss happening?
>>
>> Explaining the problem with an example. Consider 3 Solr documents:
>>
>> Doc1 { FieldA: id21, id29, id60P; Field2: val2; }
>> Doc2 { FieldA: id19, id9, id8P; Field2: val1; }
>> Doc3 { FieldA: id101, id29, id108P; Field2: val4; }
>>
>> While querying on Solr, let's consider the query:
>>
>> fq=FieldA: id21+id8+id108;
>>
>> Under the previous sharding, Doc1, Doc2 and Doc3 are all returned, because the filter query matches at least one value present in each document, i.e. id21 in Doc1, id8 in Doc2 and id108 in Doc3.
>>
>> Under the new sharding, only Doc2 and Doc3 are returned. Doc1 is not included in the results because the query is routed only to the shards corresponding to the values present in the filter query, i.e. shard21, shard8 and shard108, and Doc1 is present on shard60.
>>
>> [Figures: "Indexing"; "Querying on this collection"]
>>
>> Our query won't even go to the shard that contains Doc1. Therefore, Doc1 will not be returned in the results.
>>
>> Probable Solutions
>>
>> To deal with this, we can index the same document on multiple shards, based on all the values of the field. But handling indexing and deletion when the values of this field change would be very complicated, so this index could be very complex to maintain.
>>
>> Is this the most optimal way, or is there a better way to achieve the goal and avoid losing any documents?

--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Managing Directors: John M. Woodell | David E. Pugh
Tax office: Berlin Finanzamt für Körperschaften II
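The loss described in the quoted message can be reproduced with a small model. This is only a sketch, not Solr code: it assumes that the value marked with "P" in the example is the one used for routing (which is consistent with Doc1 living on shard60), and the helper names `shard_for`, `build_index` and `search` are hypothetical.

```python
# Hypothetical model of the routing in the example above; not Solr code.
# Assumption: each document is stored on exactly one shard, chosen from
# the FieldA value marked "P" in the example (stored here as "route").
docs = {
    "Doc1": {"FieldA": ["id21", "id29", "id60"], "route": "id60"},
    "Doc2": {"FieldA": ["id19", "id9", "id8"], "route": "id8"},
    "Doc3": {"FieldA": ["id101", "id29", "id108"], "route": "id108"},
}


def shard_for(value):
    """Map a FieldA value to a shard name, e.g. 'id60' -> 'shard60'."""
    return "shard" + value[2:]


def build_index(docs):
    """Place every document on the single shard named by its routing value."""
    shards = {}
    for name, doc in docs.items():
        shards.setdefault(shard_for(doc["route"]), []).append(name)
    return shards


def search(shards, docs, wanted, targets=None):
    """Return names of documents whose FieldA contains any wanted value.

    targets=None queries every shard; otherwise only the listed shards.
    """
    if targets is None:
        targets = list(shards)
    hits = []
    for shard in targets:
        for name in shards.get(shard, []):
            if set(docs[name]["FieldA"]) & set(wanted):
                hits.append(name)
    return sorted(hits)


shards = build_index(docs)
wanted = ["id21", "id8", "id108"]

# Querying every shard finds all three documents.
all_shards = search(shards, docs, wanted)
# Routing the query by its own values skips shard60, where Doc1 lives.
routed = search(shards, docs, wanted, targets=[shard_for(v) for v in wanted])

print(all_shards)  # ['Doc1', 'Doc2', 'Doc3']
print(routed)      # ['Doc2', 'Doc3']
```

In this model Doc1 is not deleted; it is simply never reached, because the routed query asks shard21, shard8 and shard108 but not shard60.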
Re: Prevent Loss of Documents after Implicit Sharding
I thought a multivalued field was not supported as a routing field? You'll likely need to choose a single-valued, stable property for routing, not a field for which a single document can have several different values. So have a look at your schema for other candidate single-valued routing fields.

If you cannot find one, perhaps compositeId (i.e. hash-based) routing is better for you. Having 8 shards on compositeId, you could easily go to 16 -> 32 -> 64 by splitting your existing shards. But in that case, too, you'd need some stable single-valued ID to route on if you want more efficient queries that do not hit all shards every time.

https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html

Jan

> On 29 Nov 2023, at 14:41, Saksham Gupta wrote:
>
> one of the values of a multivalued field is being used to decide the shard of the document.
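The 8 -> 16 -> 32 -> 64 progression follows from how hash-range sharding behaves under splitting: each shard owns a contiguous slice of the hash space, and a split halves that slice. The sketch below illustrates the idea only. It assumes an unsigned 32-bit hash space and uses `zlib.crc32` as a stand-in hash; Solr's own hash function and range layout differ, and the function names and the document ID "doc-42" are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # assumed size of the hash space in this sketch


def initial_ranges(num_shards):
    """Divide [0, HASH_SPACE) into num_shards contiguous half-open ranges."""
    step = HASH_SPACE // num_shards
    return [(i * step, (i + 1) * step) for i in range(num_shards)]


def split_all(ranges):
    """Split every range into two halves, doubling the shard count."""
    result = []
    for lo, hi in ranges:
        mid = (lo + hi) // 2
        result.extend([(lo, mid), (mid, hi)])
    return result


def shard_of(doc_id, ranges):
    """Index of the range that contains the hash of doc_id."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, not Solr's
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h < hi:
            return i
    raise ValueError("hash outside all ranges")


# Three rounds of splitting take 8 shards to 64.
ranges = initial_ranges(8)
counts = [len(ranges)]
for _ in range(3):
    ranges = split_all(ranges)
    counts.append(len(ranges))
print(counts)  # [8, 16, 32, 64]

# A document only ever moves from a parent shard into one of its two children.
before = initial_ranges(8)
after = split_all(before)
parent = shard_of("doc-42", before)
child = shard_of("doc-42", after)
print(child // 2 == parent)  # True
```

Because the hash is computed from one stable ID per document, a split never scatters a document across unrelated shards, which is what makes this kind of growth cheap compared with re-routing on a multivalued field.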
Invalid JSON response with UUID field
Hi,

I have a schema with a UUID field type configured as a unique key.

I recently upgraded my Solr installation to 9.3 (from 7.6) and my application stopped working. It turns out that Solr has stopped encoding UUIDs as strings in the JSON response writer.

Whereas before I would get:

"id":"76af09e3-db43-4e7e-a46f-9bf03e343db9",

Now I get:

"id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae,

Of course, UUIDs are not a valid JSON data type, so this looks like a bug to me?

-Andrew
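The difference between the two responses can be checked with any strict JSON parser. A minimal sketch, assuming Python's standard `json` module stands in for the client application; the two fragments are the ones quoted above, wrapped in braces (and without the trailing comma) so that each forms a complete document, and `is_valid_json` is a hypothetical helper.

```python
import json

# The two "id" fragments from the message, wrapped into complete objects.
quoted = '{"id":"76af09e3-db43-4e7e-a46f-9bf03e343db9"}'
unquoted = '{"id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(quoted))    # True
print(is_valid_json(unquoted))  # False
```

The unquoted form fails because the parser reads the leading digit as the start of a number and then meets a letter where it expects a comma or closing brace.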
Re: Invalid JSON response with UUID field
On 11/26/23 03:40, Andrew Hankinson wrote:
> I recently upgraded my Solr installation to 9.3 (from 7.6) and my application stopped working. It turns out that Solr has stopped encoding UUIDs as strings in the JSON response writer.
>
> Whereas before I would get:
>
> "id":"76af09e3-db43-4e7e-a46f-9bf03e343db9",
>
> Now I get:
>
> "id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae,
>
> Of course, UUIDs are not a valid JSON data type, so this looks like a bug to me?

Still quoted in 8.11.2, FWIW.

Dima
Re: Invalid JSON response with UUID field
It might have the same root cause as https://issues.apache.org/jira/browse/SOLR-10653?filter=-3

Could you share more details about your environment setup: is it SolrCloud? Is it /get or /select? etc.

On Fri, Dec 1, 2023 at 12:05 AM Andrew Hankinson wrote:
> Hi,
>
> I have a schema with a UUID field type configured as a unique key.
>
> multiValued="false" />
>
> I recently upgraded my Solr installation to 9.3 (from 7.6) and my application stopped working. It turns out that Solr has stopped encoding UUIDs as strings in the JSON response writer.
>
> Whereas before I would get:
>
> "id":"76af09e3-db43-4e7e-a46f-9bf03e343db9",
>
> Now I get:
>
> "id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae,
>
> Of course, UUIDs are not a valid JSON data type, so this looks like a bug to me?
>
> -Andrew

--
Sincerely yours
Mikhail Khludnev
Re: Invalid JSON response with UUID field
No SolrCloud, complete wipe and reindex of the data, select handler.

> On 1 Dec 2023, at 07:54, Mikhail Khludnev wrote:
>
> It might have the same root cause as https://issues.apache.org/jira/browse/SOLR-10653?filter=-3
>
> Could you share more details about your environment setup: is it SolrCloud? Is it /get or /select? etc.
>
>> On Fri, Dec 1, 2023 at 12:05 AM Andrew Hankinson wrote:
>>
>> Hi,
>>
>> I have a schema with a UUID field type configured as a unique key.
>>
>> multiValued="false" />
>>
>> I recently upgraded my Solr installation to 9.3 (from 7.6) and my application stopped working. It turns out that Solr has stopped encoding UUIDs as strings in the JSON response writer.
>>
>> Whereas before I would get:
>>
>> "id":"76af09e3-db43-4e7e-a46f-9bf03e343db9",
>>
>> Now I get:
>>
>> "id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae,
>>
>> Of course, UUIDs are not a valid JSON data type, so this looks like a bug to me?
>>
>> -Andrew
>
> --
> Sincerely yours
> Mikhail Khludnev