Hi Arvind,

Thank you for proposing this KIP.

I am not sure how much experience you have in modifying Kafka's core module, so I don't know if you are aware of how deeply the storage and replication layers are integrated within Kafka. There is no clean API to rip out; this KIP will essentially require a rewrite of most of Kafka. That obviously carries huge risk (and huge amounts of work).

For something that has so much risk and so much effort involved, I feel that the justification in the KIP is lacking. For instance, you say:

"Distributed data stores can be vastly improved by integrating with Kafka. Some of these improvements are:
* They can participate easily in the whole Kafka ecosystem
* Data ingesting speeds can be improved"

Things that are not clear to me are:

1) Why should the Kafka community rewrite Kafka in order to improve distributed data stores? Shouldn't the community for each data store make the effort to improve their application? Where is the benefit to Kafka users?

2) Can you detail in which ways distributed data stores are unable to participate in the Kafka ecosystem now? In which ways do they want to participate?

3) Claiming that speeds can be improved is pretty easy :) Are you talking about ingest to Kafka, or from Kafka to another store? What is the current ingest rate? What is the current bottleneck? Where do you expect the speed improvement to come from? Are you talking about latency or throughput?

Once we all agree that there is indeed a problem, we can discuss your proposed solution :)

Personally, I feel that Kafka is a distributed data store (with log/queue semantics) and therefore cannot and should not delegate core data store responsibilities to an external system. Kafka users have come to expect very strong reliability, consistency and durability guarantees from Kafka, along with very clear replication semantics, and we must be very, very careful not to compromise those or put them at risk, especially without very clear benefits to Kafka users.

Thanks,
Gwen Shapira

On Sat, Jun 18, 2016 at 4:46 PM, Arvind Kandhare <sweet.ka...@gmail.com> wrote:
> Hi,
> Let's use this thread to discuss the above mentioned KIP.
>
> Here is the motivation for it:
> "Distributed data stores can be vastly improved by integrating with Kafka.
> Some of these improvements are:
>
> 1. They can participate easily in the whole Kafka ecosystem
> 2. Data ingesting speeds can be improved
>
> Distributed data stores come with their own replication. Kafka replication
> is a duplication of functionality for them.Kafka should defer replication
> to underlying file system if the configuration mandates it.
>
> In the newly added configuration a flush to the filesystem should consider
> a signal that the message is replicated."
>
> Do let me know your views on this.
>
>
> Thanks and regards,
>
> Arvind