yashmayya opened a new pull request, #15930: URL: https://github.com/apache/pinot/pull/15930
- Currently, the minimize data movement algorithm for instance assignment isn't very effective for realtime tables where the number of partitions per replica group configured in the instance assignment config is 0 / 1 (or practically anything that goes out of sync with the actual number of stream partitions). - See the scenario described in https://github.com/apache/pinot/issues/14151 for instance. If `numPartitions` in `replicaGroupPartitionConfig` in the instance assignment config is set to 0 / 1, what happens is that each replica group is created with a single partition and the actual stream partitions are assigned uniformly across the instances using a simple modulo based mechanism (see [here](https://github.com/apache/pinot/blob/cc3eb2e9c296761aa3db7e7e6a9fa7d66bd213bc/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/assignment/segment/RealtimeSegmentAssignment.java#L147-L159)). Due to this, in cases where there are scale ups or scale downs, a number of partitions could be moved across instances even when `minimizeDataMovement` is enabled for the rebalance. For many user scenarios it could be inconvenient to have to always specify an explicit number of partitions in the instance assignment config and keep it updated based on any repartitioning upstream. - This patch introduces a new `ImplicitRealtimeTablePartitionSelector` strategy for instance assignment, where the number of partitions within each replica group is determined using the number of partitions from the source stream for a realtime table. It also enforces the use of 1 instance per partition (per replica group) for upsert tables. - Note that repartitioning upstream won't be automatically detected; instead, a rebalance (with reassign instances enabled) should be triggered which will update the `INSTANCE_PARTITIONS` as well as the segment assignment. - A test has been added to demonstrate how this strategy works in a scenario where the source stream is repartitioned and then new instances are added to take over the new partitions (with older partitions not being moved across instances). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
