Hi All,

Pinging again for some assistance.

On Wed, Nov 29, 2023 at 7:11 PM Saksham Gupta <saksham.gu...@indiamart.com> wrote:
> Hi Solr Developers,
>
> Problem Statement
>
> We have been using SolrCloud with implicit sharding. The data of the
> collection was divided into 8 shards. To reduce response time, we planned
> to shard the data further, into 56 shards. Under this sharding strategy,
> one of the values of a multivalued field is used to decide the shard of
> each document.
>
> But this has led to documents going missing from query results.
>
> How is the loss happening? Explaining the problem with an example:
>
> Consider 3 Solr documents:
>
> Doc1
> {
>   FieldA: id21, id29, id60P;
>   Field2: val2;
> }
>
> Doc2
> {
>   FieldA: id19, id9, id8P;
>   Field2: val1;
> }
>
> Doc3
> {
>   FieldA: id101, id29, id108P;
>   Field2: val4;
> }
>
> While querying on Solr:
>
> Let's consider the query: fq=FieldA: id21+id8+id108;
>
> With the previous sharding, Doc1, Doc2 and Doc3 are all returned, because
> the filter query matches at least one value present in each document,
> i.e. id21 in Doc1, id8 in Doc2 and id108 in Doc3.
>
> With the new sharding, only Doc2 and Doc3 are returned. Doc1 is not
> included in the results because the query is routed only to the shards
> corresponding to the values present in the filter query, i.e. shard21,
> shard8 and shard108, whereas Doc1 is stored on shard60.
>
> INDEXING
>
> QUERYING ON THIS COLLECTION
>
> Our query won't even go to the shard that contains Doc1. Therefore, Doc1
> will not be returned in the results.
>
> Probable Solutions
>
> To deal with this, we can index the same document on multiple shards,
> one for each value of the field. But handling indexing and deletion when
> the values of this field change would be very complicated, so this index
> could be very complex to maintain.
>
> Is this the most optimal way, or is there a better way to achieve the
> goal and avoid losing any documents?
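The routing behaviour described in the quoted message can be reproduced with a small simulation. This is a minimal Python sketch, not Solr code. It assumes that the value carrying the trailing "P" is the one that decides the shard (which matches Doc1 living on shard60 in the example), and the helper names (shard_of, index_by_routing_value, index_by_all_values, query) are hypothetical, made up for illustration only.

```python
# Toy model of the routing described above; this does not call Solr.
# Assumption: the FieldA value with the trailing "P" decides the shard.

docs = {
    "Doc1": ["id21", "id29", "id60P"],
    "Doc2": ["id19", "id9", "id8P"],
    "Doc3": ["id101", "id29", "id108P"],
}


def strip_marker(value):
    """Drop the trailing 'P' marker, if present."""
    return value[:-1] if value.endswith("P") else value


def shard_of(value):
    """Map a value such as 'id60P' or 'id60' to the shard name 'shard60'."""
    return "shard" + strip_marker(value)[2:]


def index_by_routing_value(documents):
    """New sharding: each document is stored once, on the shard of its 'P' value."""
    shards = {}
    for name, values in documents.items():
        routing = next(v for v in values if v.endswith("P"))
        shards.setdefault(shard_of(routing), set()).add(name)
    return shards


def index_by_all_values(documents):
    """Proposed workaround: store a copy on the shard of every FieldA value."""
    shards = {}
    for name, values in documents.items():
        for v in values:
            shards.setdefault(shard_of(v), set()).add(name)
    return shards


def query(shards, documents, wanted):
    """Visit only the shards named by the filter values and collect matches."""
    hits = set()
    for value in wanted:
        for name in shards.get(shard_of(value), set()):
            plain = {strip_marker(v) for v in documents[name]}
            if plain & set(wanted):
                hits.add(name)
    return sorted(hits)


wanted = ["id21", "id8", "id108"]
single = query(index_by_routing_value(docs), docs, wanted)
multi = query(index_by_all_values(docs), docs, wanted)
print(single)  # Doc1 is missing: it lives on shard60, which is never queried
print(multi)   # all three documents are found
```

In this toy model the multi-shard index stores Doc1 three times (shard21, shard29, shard60), which is where the maintenance cost on updates and deletes mentioned in the quoted message comes from.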