Shekharrajak commented on issue #18439:
URL: https://github.com/apache/druid/issues/18439#issuecomment-4237622070

   >  This would be useful in situations where a single partition, or small set 
of partitions, has more data to process than the other partitions. 
   
   Yes. Share groups should appeal to many users because they simplify the 
stack: Kafka can then serve as a general-purpose bus, handling everything from 
real-time event streams to RabbitMQ-style work queues and Google 
Pub/Sub-style scaling, without a separate messaging platform for each use 
case.
   
   
   1. Elastic Scaling Beyond Partition Limits.
   2. Eliminating "Head-of-Line" Blocking: In standard consumer groups, a 
single malformed or "heavy" message can stall an entire partition's processing. 
Share groups allow other Druid tasks to skip ahead and continue ingesting 
healthy data, keeping your real-time dashboards fresh even when messages in 
some partitions are slow to process.
   3. Simplified Operational Management: Current Druid users often have to 
over-provision Kafka partitions just to ensure enough ingestion parallelism. 
Share groups remove this need, letting you tune Kafka for storage efficiency 
and Druid for ingestion throughput independently.
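
To make point 2 concrete, here is a toy model, not Kafka or Druid code. The function names and the per-record cost numbers are made up for illustration. It compares when each record of a single partition finishes if one task owns the partition exclusively versus if the records are handed to whichever of two tasks is free first, which is roughly what a share group allows.

```python
import heapq


def completion_times_exclusive(costs):
    """One task owns the partition and processes its records strictly in order."""
    finished, clock = [], 0
    for cost in costs:
        clock += cost
        finished.append(clock)
    return finished


def completion_times_shared(costs, workers):
    """Each record goes to whichever worker becomes idle first (toy share group)."""
    free_at = [0] * workers  # min-heap of the times at which each worker is idle
    heapq.heapify(free_at)
    finished = []
    for cost in costs:
        start = heapq.heappop(free_at)
        end = start + cost
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


# Hypothetical processing costs; the second record is the "heavy" one.
costs = [1, 50, 1, 1, 1]
exclusive = completion_times_exclusive(costs)
shared = completion_times_shared(costs, workers=2)

print(exclusive)
print(shared)
```

In this model the records behind the heavy one wait for it under exclusive ownership, while with two workers they complete while the heavy record is still in progress. The flip side is that completion order no longer matches offset order, which the ingestion design has to tolerate.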
   
   To start, we should target at-least-once semantics, since Kafka share 
groups already provide everything Druid needs for recovery after a failure: 
records that are not acknowledged become available again after a timeout and 
are redelivered. A record is acknowledged only once it has been persisted on 
the Druid node.
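
A minimal sketch of that recovery behaviour, using an in-memory stand-in for the broker-side state. `ToySharePartition`, its method names, and the timeout value are hypothetical and are not the Kafka client API; time is passed in explicitly so the example is deterministic.

```python
class ToySharePartition:
    """In-memory stand-in for broker-side share-group record state (hypothetical)."""

    def __init__(self, records, lock_timeout):
        self.records = list(records)
        self.lock_timeout = lock_timeout
        self.state = ["AVAILABLE"] * len(self.records)
        self.lock_expiry = [None] * len(self.records)

    def _release_expired(self, now):
        # Records whose acquisition lock has timed out become available again.
        for i, s in enumerate(self.state):
            if s == "ACQUIRED" and self.lock_expiry[i] <= now:
                self.state[i] = "AVAILABLE"

    def poll(self, now, max_records=1):
        """Hand out up to max_records available records and lock them."""
        self._release_expired(now)
        batch = []
        for i, s in enumerate(self.state):
            if s == "AVAILABLE" and len(batch) < max_records:
                self.state[i] = "ACQUIRED"
                self.lock_expiry[i] = now + self.lock_timeout
                batch.append((i, self.records[i]))
        return batch

    def acknowledge(self, offset, now):
        """Mark a record done; fails if the lock was lost in the meantime."""
        self._release_expired(now)
        if self.state[offset] != "ACQUIRED":
            return False
        self.state[offset] = "ACKNOWLEDGED"
        return True


partition = ToySharePartition(["a", "b"], lock_timeout=30)
persisted = []  # stand-in for data durably persisted on the Druid side

# Task 1 fetches "a" and then crashes: nothing is persisted, nothing is acknowledged.
lost_batch = partition.poll(now=0)

# Task 2 keeps polling; it persists first and acknowledges only afterwards.
for now in (10, 30):
    for offset, value in partition.poll(now=now):
        persisted.append(value)
        partition.acknowledge(offset, now=now)

print(persisted)
```

Record "a" is not lost: once the lock held by the crashed task expires, it is handed to the surviving task. If a task had persisted a record and then failed before acknowledging it, the same mechanism would deliver that record again, which is why this gives at-least-once rather than exactly-once.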
   
   I am not pushing for exactly-once semantics in the initial version because 
of the transaction session gap in share groups, which I am already working 
on: 
   
   - [KIP-1310: General Transaction 
Session](https://cwiki.apache.org/confluence/x/nJY8G): this details how 
external coordinators should talk to the Kafka broker for atomic writes.
   - [KIP-1289 Support Transactional Acknowledgments for Share 
Groups](https://cwiki.apache.org/confluence/x/J448G): until transactional 
acknowledgement is supported, we cannot guarantee **no data loss**, because 
in a share group the state of in-flight records lives on the Kafka broker and 
there is no API to reset (or seek) the partition offset during recovery.
   
   > is KafkaShareIndexTask a user facing task?
   
   I will write up the detailed design and an overview of the code changes 
here: 
https://github.com/Shekharrajak/druid/wiki/Queue-Semantics-support-in-Kafka-Ingestion
 and will share it along with a diagram.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

