Shekharrajak commented on issue #18439: URL: https://github.com/apache/druid/issues/18439#issuecomment-4237622070
> This would be useful in situations where a single partition, or small set of partitions, has more data to process than the other partitions.

Yes. Everyone is going to want share groups once they see how much they simplify the stack. They basically turn Kafka into a "do-it-all" bus, handling everything from real-time event streams to RabbitMQ-style work queues and Google Pub/Sub-style scaling. It is the end of needing a different messaging platform for every specific use case.

1. Elastic Scaling Beyond Partition Limits.
2. Eliminating "Head-of-Line" Blocking: In standard consumer groups, a single malformed or "heavy" message can stall processing of an entire partition. Share groups allow other Druid tasks to skip ahead and continue ingesting healthy data, keeping your real-time dashboards fresh even when some messages (in some partitions) are slow to process.
3. Simplified Operational Management: Druid users currently often have to over-provision Kafka partitions just to get enough ingestion parallelism. Share groups remove this need, letting you tune Kafka for storage efficiency and Druid for ingestion throughput independently.

To begin with, we should start with at-least-once semantics, because everything Druid needs for recovery after a failure is already available in Kafka share groups: unacknowledged records become available again after a timeout and are re-delivered. A record is acknowledged once it has been persisted to the Druid node.

I am not pushing for exactly-once semantics in the initial version because of the transaction session gap in share groups, which I am already working on:

- [KIP-1310: General Transaction Session](https://cwiki.apache.org/confluence/x/nJY8G): this has details on how external coordinators should talk to the Kafka broker for atomic writes.
- [KIP-1289 Support Transactional Acknowledgments for Share Groups](https://cwiki.apache.org/confluence/x/J448G)

Until the transactional acknowledgement feature is available, we will not be able to guarantee **no data loss**, because in a share group the state of in-flight records is held by the Kafka broker and there is no API to reset (or seek) the partition offset during recovery.

> is KafkaShareIndexTask a user facing task?

Let me write up the detailed design and an overview of the code changes here: https://github.com/Shekharrajak/druid/wiki/Queue-Semantics-support-in-Kafka-Ingestion. I will share it along with the diagram.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
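
The at-least-once flow described in the comment above (persist first, acknowledge afterwards, and rely on the broker to re-deliver anything left unacknowledged after a timeout) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SharePartitionSim`, `run_ingestion`, `lock_timeout`, and `crash_on` are hypothetical names made up for illustration, time is modelled as integer ticks, and none of this is the Kafka share consumer API or Druid code.

```python
from dataclasses import dataclass, field


@dataclass
class SharePartitionSim:
    """Toy stand-in for broker-side share-group state (hypothetical, not Kafka's API)."""
    records: list
    lock_timeout: int = 3                          # ticks before an unacked record is released
    acquired: dict = field(default_factory=dict)   # offset -> tick at which it was acquired
    acked: set = field(default_factory=set)

    def poll(self, now, max_records=2):
        # Release records whose acquisition lock has expired so they can be re-delivered.
        for off, t in list(self.acquired.items()):
            if now - t >= self.lock_timeout:
                del self.acquired[off]
        batch = []
        for off, value in enumerate(self.records):
            if len(batch) == max_records:
                break
            if off not in self.acked and off not in self.acquired:
                self.acquired[off] = now
                batch.append((off, value))
        return batch

    def acknowledge(self, off):
        # Intended to be called only after the record has been persisted.
        self.acquired.pop(off, None)
        self.acked.add(off)


def run_ingestion(records, crash_on=None):
    """Ingest all records; optionally simulate one failure before persisting `crash_on`."""
    broker = SharePartitionSim(records)
    persisted = []      # stands in for data persisted on the Druid node
    crashed = False
    now = 0
    while len(broker.acked) < len(records):
        for off, value in broker.poll(now):
            if off == crash_on and not crashed:
                crashed = True          # task fails before persisting; no ack is sent
                break
            persisted.append(value)     # persist first ...
            broker.acknowledge(off)     # ... then acknowledge
        now += 1                        # a later poll models a restarted or different task
    return persisted


if __name__ == "__main__":
    print(run_ingestion(["a", "b", "c", "d"], crash_on=1))
```

In this toy model the record at offset 1 is not lost: later polls skip past it while its lock is held, and it is handed out again once the lock times out. A failure after persisting but before acknowledging would instead produce a duplicate, which is why this flow is at-least-once rather than exactly-once.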
