Hi All,
Pinging again for some assistance.

On Wed, Nov 29, 2023 at 7:11 PM Saksham Gupta <saksham.gu...@indiamart.com>
wrote:

> Hi Solr Developers,
>
> Problem Statement
>
> We have been using SolrCloud with implicit sharding. The data of the
> collection was divided into 8 shards. To reduce response time, we
> planned to split the data further, into 56 shards.
>
> Under the new sharding strategy, one of the values of a multivalued
> field decides the shard of the document.
>
> But this has led to loss of documents.
>
> How is the loss happening? Here is an example.
>
> Consider 3 Solr documents:
>
> Doc1
>
> {
>
> FieldA: id21, id29, id60P;
>
> Field2: val2;
>
> }
>
> Doc2
>
> {
>
> FieldA: id19, id9, id8P;
>
> Field2: val1;
>
> }
>
> Doc3
>
> {
>
> FieldA: id101, id29, id108P;
>
> Field2: val4;
>
> }
>
> While querying Solr:
>
> Consider the query: fq=FieldA: id21+id8+id108;
>
> With the previous sharding, Doc1, Doc2, and Doc3 are all returned in
> the results, because the filter query matches at least one value present
> in each document, i.e. id21 in Doc1, id8 in Doc2 and id108 in Doc3.
>
>
> With the new sharding, only Doc2 and Doc3 are returned. Doc1 is not
> included in the results because the query is routed only to the shards
> corresponding to the values present in the filter query, i.e. shard21,
> shard8 and shard108, while Doc1 is present on shard60.
>
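
To make the example above concrete, here is a minimal sketch in plain Python. It is a toy model, not Solr code. It assumes that the value with the trailing "P" is the one that decides the shard, and that a query value such as id8 matches the stored value id8P. All function names are made up for illustration.

```python
# Toy model of the routing described above; this is not Solr code.
# Assumption: the value with the trailing "P" decides the document's shard.
DOCS = {
    "Doc1": ["id21", "id29", "id60P"],
    "Doc2": ["id19", "id9", "id8P"],
    "Doc3": ["id101", "id29", "id108P"],
}

def strip_marker(value):
    """Drop the trailing 'P' marker, e.g. 'id60P' -> 'id60'."""
    return value[:-1] if value.endswith("P") else value

def home_shard(values):
    """Shard chosen at index time: the one named after the 'P' value."""
    routing = next(v for v in values if v.endswith("P"))
    return "shard" + strip_marker(routing)[2:]

def matches(values, query_ids):
    """True if any field value (marker removed) is one of the queried ids."""
    return any(strip_marker(v) in query_ids for v in values)

def search(query_ids, shards=None):
    """Return matching docs; if shards is given, search only those shards."""
    hits = []
    for name, values in DOCS.items():
        if shards is not None and home_shard(values) not in shards:
            continue  # the query never reaches this document's shard
        if matches(values, query_ids):
            hits.append(name)
    return hits

query = {"id21", "id8", "id108"}
routed = {"shard" + q[2:] for q in query}  # shard21, shard8, shard108

print(search(query))          # every shard searched: Doc1, Doc2, Doc3
print(search(query, routed))  # routed shards only: Doc2, Doc3
```

Doc1 matches on id21 but lives on shard60, which the routed query never visits.
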
> INDEXING
>
>
> QUERYING ON THIS COLLECTION
>
> Our query won’t even go to the shard that contains Doc1.
> Therefore, Doc1 will not be returned in the results.
>
> Probable Solutions
>
> To deal with this, we can index the same document on multiple shards, one
> for each value of the field. But handling indexing and deletion when the
> values of this field change would be very complicated, so this index
> could be very complex to maintain.
>
> Is this the best approach, or is there a better way to achieve the goal
> and avoid losing any documents?
>
>
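
To illustrate the bookkeeping the multi-shard option would need, here is a rough sketch in plain Python. The helpers target_shards and plan_update are hypothetical names, not part of any Solr API, and the shard naming follows the example above. The sketch only computes which shards hold a copy of a document and which copies become stale when the field values change.

```python
# Hypothetical helpers illustrating the bookkeeping; not a Solr API.

def target_shards(values):
    """One shard per field value, e.g. 'id21' -> 'shard21', 'id60P' -> 'shard60'."""
    return {"shard" + v.rstrip("P")[2:] for v in values}

def plan_update(old_values, new_values):
    """Work out where a changed document must be re-indexed and removed."""
    old = target_shards(old_values)
    new = target_shards(new_values)
    return {
        "delete_from": sorted(old - new),  # stale copies to remove
        "index_on": sorted(new),           # (re)index the new version here
    }

# Doc1 from the example, with id29 replaced by id30.
plan = plan_update(["id21", "id29", "id60P"], ["id21", "id30", "id60P"])
print(plan["delete_from"])  # ['shard29']
print(plan["index_on"])     # ['shard21', 'shard30', 'shard60']
```

Every update then needs the previous values of the field to find the stale copies, which is the part we expect to be hard to maintain. A query routed to several shards may also see the same document more than once, so results would probably need de-duplicating.
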
