Shekharrajak commented on issue #18439:
URL: https://github.com/apache/druid/issues/18439#issuecomment-4236351269
### Proposed Plan
* **`KafkaShareIndexTask`:** A new task type that calls `subscribe()` to join a
Share Group and uses the `ShareConsumer` API to poll for records across all topic
partitions.
* **Simplified Supervisor:** The `Supervisor` manages only the number of
Druid tasks, so partition assignment logic is no longer needed.
* **Consistency Model:** This is an **opt-in mode** for users who
prioritize scale and throughput; strict per-partition ordering is not
guaranteed.
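
To illustrate why per-partition ordering is lost, here is a minimal, broker-free sketch of share-group style delivery. It is a simulation, not Kafka or Druid code: the function name `share_group_delivery`, the fixed `batch_size`, and the round-robin polling order are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle


def share_group_delivery(records, num_tasks, batch_size=2):
    """Hand out record batches to whichever task polls next.

    Hypothetical model: tasks poll in round-robin order and each poll
    receives up to `batch_size` records, regardless of partition.
    """
    delivered = defaultdict(list)
    tasks = cycle(range(num_tasks))
    for start in range(0, len(records), batch_size):
        delivered[next(tasks)].extend(records[start:start + batch_size])
    return dict(delivered)


# Two partitions with four offsets each, written as (partition, offset).
records = [(0, o) for o in range(4)] + [(1, o) for o in range(4)]
delivery = share_group_delivery(records, num_tasks=4)

# Partition 0 is split across task 0 and task 1, so no single task
# sees that partition's offsets in full sequence.
tasks_for_partition_0 = sorted(
    task for task, recs in delivery.items() if any(p == 0 for p, _ in recs)
)
print(tasks_for_partition_0)
```

In this model all four tasks receive work even though the topic has only two partitions, which is the scaling property the plan relies on.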
### Proposed Configuration
Users will enable Share Group ingestion in the `ioConfig`. Note that `numTasks` can exceed the partition count:
```json
"ioConfig": {
  "type": "kafka",
  "useShareGroup": true,
  "numTasks": 20
}
```
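
As a rough sketch of what this setting would change, the snippet below compares how many tasks can actively consume with and without `useShareGroup`. The helper `effective_parallelism` is hypothetical and not part of Druid; it assumes that without a share group each partition is read by at most one task, so tasks beyond the partition count sit idle.

```python
import json


def effective_parallelism(io_config, partition_count):
    """Return how many tasks can actively consume (illustrative only)."""
    num_tasks = io_config["numTasks"]
    if io_config.get("useShareGroup", False):
        # Share group: every task can receive records.
        return num_tasks
    # Partition-assigned consumption: at most one task per partition.
    return min(num_tasks, partition_count)


io_config = json.loads(
    '{"type": "kafka", "useShareGroup": true, "numTasks": 20}'
)
print(effective_parallelism(io_config, partition_count=8))
```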
-----
Share Group ingestion is the recommended choice for Druid users when:
- Strict ordering is not required: Because multiple tasks can pull from the same
partition, records may be processed out of order (acceptable for most
time-series OLAP use cases).
- Traffic spikes are unpredictable: Ingestion is bursty and requires rapid
scaling.
- Per-record processing is expensive: Complex `flattenSpec` or
`transformSpec` operations during ingestion slow down individual tasks.
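
The first point can be checked with a small sketch: a sum-style rollup into time buckets gives the same result no matter what order the records arrive in. The `rollup` function and the one-minute granularity are illustrative assumptions, not Druid's implementation, and order-sensitive logic would not share this property.

```python
import random
from collections import Counter


def rollup(events, granularity_ms=60_000):
    """Sum values per time bucket; events are (timestamp_ms, value) pairs."""
    totals = Counter()
    for timestamp_ms, value in events:
        bucket = timestamp_ms - timestamp_ms % granularity_ms
        totals[bucket] += value
    return dict(totals)


events = [(0, 1), (30_000, 2), (60_000, 5), (90_000, 1)]

# Simulate out-of-order processing with a seeded shuffle.
shuffled = events[:]
random.Random(7).shuffle(shuffled)

in_order = rollup(events)
out_of_order = rollup(shuffled)
print(in_order == out_of_order)
```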