Shekharrajak commented on issue #18439:
URL: https://github.com/apache/druid/issues/18439#issuecomment-4236351269

   
   ### Proposed Plan
   
   *   **`KafkaShareIndexTask`:** A new task type that calls `subscribe()` to 
join a Share Group and uses the `ShareConsumer` API to poll for records across 
all topic partitions.
   *   **Simplified Supervisor:** The `Supervisor` will manage the number of 
Druid tasks, removing the need for partition assignment logic.
   *   **Consistency Model:** This is an **opt-in mode** for users who 
prioritize scale and throughput; strict per-partition ordering is not 
guaranteed.
   
   ### Proposed Configuration
   
   Users will enable Share Group ingestion in the `ioConfig`. `numTasks` can 
exceed the partition count:
   
   ```json
   "ioConfig": {
     "type": "kafka",
     "useShareGroup": true,
     "numTasks": 20
   }
   ```
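
Why `numTasks` can usefully exceed the partition count: with partition-based assignment each partition has at most one reader, so tasks beyond the partition count receive nothing, whereas a share group lets several consumers fetch from the same partition. A rough toy model of that difference follows. The function `active_readers` and the figure of 8 partitions are hypothetical, chosen only for illustration, and the model ignores any broker-side limits on share group size.

```python
def active_readers(num_tasks: int, num_partitions: int, share_group: bool) -> int:
    """Upper bound on tasks that can receive records at once (toy model)."""
    if num_tasks < 0 or num_partitions < 0:
        raise ValueError("counts must be non-negative")
    if num_partitions == 0:
        return 0  # nothing to read from
    if share_group:
        # Share group: several consumers may fetch from the same partition,
        # so every task can be handed records.
        return num_tasks
    # Partition-based assignment: one reader per partition, extras stay idle.
    return min(num_tasks, num_partitions)


if __name__ == "__main__":
    # Assumed example: numTasks = 20 against a topic with 8 partitions.
    print(active_readers(20, 8, share_group=False))  # 8
    print(active_readers(20, 8, share_group=True))   # 20
```

Under these assumptions, 12 of the 20 tasks would sit idle without the share group mode.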
   -----
   
   
   This feature should be the default choice for Druid users when:
   
   - Strict ordering is not required: Because multiple tasks pull from one 
partition, records may be processed out of sequence (perfectly fine for most 
time-series OLAP use cases).
   
   - Traffic spikes are unpredictable: Ingestion traffic is bursty and requires 
rapid scaling.
   
   - Per-record computational cost is high: Druid performs complex `flattenSpec` 
or `transformSpec` operations during ingestion that slow down individual tasks.
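
The ordering caveat in the first point can be made concrete with a small simulation. This is only a sketch under simplifying assumptions: batches of one partition are handed to tasks round-robin, and each task has a fixed processing time per record. Real share group delivery is driven by which consumer fetches next, and `completion_order` and `seconds_per_record` are hypothetical names, not part of Druid or Kafka.

```python
from typing import List, Tuple


def completion_order(offsets: List[int],
                     seconds_per_record: List[float],
                     batch: int = 2) -> List[int]:
    """Return offsets of a single partition in the order they finish processing.

    Consecutive batches are handed to tasks round-robin (a simplification).
    """
    if not seconds_per_record:
        raise ValueError("need at least one task")
    busy_until = [0.0] * len(seconds_per_record)  # when each task is next free
    finished: List[Tuple[float, int]] = []
    for start in range(0, len(offsets), batch):
        task = (start // batch) % len(seconds_per_record)
        for offset in offsets[start:start + batch]:
            busy_until[task] += seconds_per_record[task]
            finished.append((busy_until[task], offset))
    finished.sort()  # order by completion time
    return [offset for _, offset in finished]


if __name__ == "__main__":
    # One slow task (3 s per record) and one fast task (1 s per record).
    print(completion_order(list(range(6)), [3.0, 1.0]))  # [2, 3, 0, 1, 4, 5]
    # A single task preserves offset order.
    print(completion_order(list(range(6)), [1.0]))       # [0, 1, 2, 3, 4, 5]
```

In this model, offsets 2 and 3 finish before offsets 0 and 1 because the faster task completes its batch first.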


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

