Yes, I know it doesn’t work. It creates an index that violates some basic 
invariants, like having one ID map to one document. It does weird things, like 
returning one document for a query while the facet counts show two, with 
different values for the same single-valued field.
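
For anyone who wants to see the damage directly, here is a rough sketch of one 
way to surface the duplicates: a terms facet on the uniqueKey field with 
mincount=2, so only IDs that exist on more than one shard come back. The host, 
collection name, and the "id" field name below are assumptions, not our actual 
config, and faceting on a high-cardinality uniqueKey field is expensive, so 
treat it as a one-off diagnostic.

import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"   # assumed host/port
COLLECTION = "products"                    # hypothetical collection name

# rows=0: we only want the facet buckets, not the documents themselves
params = urllib.parse.urlencode({
    "q": "*:*",
    "rows": 0,
    "json.facet": json.dumps({
        # keep only uniqueKey values that appear more than once
        "dup_ids": {"type": "terms", "field": "id", "mincount": 2, "limit": 1000}
    }),
})

with urllib.request.urlopen(f"{SOLR_URL}/{COLLECTION}/select?{params}") as resp:
    body = json.load(resp)

for bucket in body.get("facets", {}).get("dup_ids", {}).get("buckets", []):
    print(bucket["val"], bucket["count"])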

I’m trying to patch it back into a consistent state while we wait for the next 
full reindex.
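
A rough sketch of one way to do that patch, assuming each affected document can 
be re-fetched from the source system. This is untested sketch code, not what we 
actually run: it uses delete-by-query rather than delete-by-id because a 
delete-by-query is forwarded to every shard, so it also removes the copy that 
landed on the wrong shard, and the re-add then goes through the compositeId 
router to the correct shard. fetch_from_source is a hypothetical stand-in for 
however the canonical document gets pulled.

import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"   # assumed host/port
COLLECTION = "products"                    # hypothetical collection name

def solr_update(payload: dict) -> None:
    # POST a JSON update command to the collection
    req = urllib.request.Request(
        f"{SOLR_URL}/{COLLECTION}/update",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()

def repair(doc_id: str, fetch_from_source) -> None:
    # Delete-by-query fans out to all shards, so every copy goes away.
    solr_update({"delete": {"query": f'id:"{doc_id}"'}})
    # Re-adding routes the document to (only) the correct shard.
    solr_update({"add": {"doc": fetch_from_source(doc_id)}})

# After looping over the duplicate IDs, commit once:
# solr_update({"commit": {}})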

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 1:44 PM, Shawn Heisey <elyog...@elyograg.org> wrote:
> 
> On 5/24/23 10:48, Walter Underwood wrote:
>> I think I know how we got into this mess. The cluster is configured and 
>> deployed in Kubernetes. I think it was rebuilt with more shards, then the 
>> existing storage volumes were mounted for the matching shards. New shards 
>> got empty volumes. Then the content was reloaded without a delete-all.
> 
> You're probably aware... that approach to re-sharding just plain will not 
> work.  Increasing or decreasing the shard count of a compositeId-routed 
> collection requires re-indexing from scratch.  The only way to add shards to 
> an existing collection is to use SPLITSHARD, unless it's using the implicit 
> router.
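
(For reference, SPLITSHARD is a Collections API call. A minimal sketch with 
placeholder collection and shard names; the async parameter gives a request ID 
you can poll with REQUESTSTATUS:)

import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"   # assumed host/port

params = urllib.parse.urlencode({
    "action": "SPLITSHARD",
    "collection": "products",   # hypothetical collection name
    "shard": "shard1",          # existing shard to split into two sub-shards
    "async": "split-shard1",    # request ID to poll via REQUESTSTATUS
})

with urllib.request.urlopen(f"{SOLR_URL}/admin/collections?{params}") as resp:
    print(resp.read().decode("utf-8"))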
> 
> I've seen discussion of a rebalance API, but no implementation.  It would not 
> be easy to implement.  I have thought of one approach that might make it 
> doable ... but it might not be possible to send any updates to the collection 
> until the entire rebalance is complete. Assuming it's even possible, the 
> approach I thought of would require a LOT of extra disk space, a lot of extra 
> bandwidth usage, and would take much longer to run than an optimize.  It 
> might even take longer than doing a full re-index from the source system.
> 
> Thanks,
> Shawn
