Hi Arvind,

Thank you for proposing this KIP.

I am not sure how much experience you have in modifying Kafka's core
module, so I don't know if you are aware of how deeply the storage and
replication layers are integrated within Kafka. There is no clean API
to rip out; this KIP will essentially require a rewrite of most of
Kafka, which obviously carries huge risk (and a huge amount of
work).

For something that has so much risk and so much effort involved, I
feel that the justification in the KIP is lacking.

For instance, you say:
"Distributed data stores can be vastly improved by integrating with
Kafka. Some of these improvements are:
* They can participate easily in the whole Kafka ecosystem
* Data ingesting speeds can be improved"

Things that are not clear to me are:

1) Why should the Kafka community rewrite Kafka in order to improve
distributed data stores? Shouldn't the community for each data store
make the effort to improve their application? Where is the benefit to
Kafka users?

2) Can you detail the ways in which distributed data stores are unable
to participate in the Kafka ecosystem now? In which ways do they want
to participate?

3) Claiming that speeds can be improved is pretty easy :) Are you
talking about ingest to Kafka, or from Kafka to another store? What is
the current ingest rate? What is the current bottleneck? Where do you
expect the speed improvement to come from? Are you talking about
latency or throughput?

Once we all agree that there is indeed a problem, we can discuss your
proposed solution :)

Personally, I feel that Kafka is a distributed data store (with
log/queue semantics) and therefore cannot and should not delegate core
data store responsibilities to an external system. Kafka users have
come to expect very strong reliability, consistency and durability
guarantees from Kafka, along with very clear replication semantics,
and we must be very careful not to compromise them or put them at
risk, especially without very clear benefits to Kafka users.

Thanks,

Gwen Shapira




On Sat, Jun 18, 2016 at 4:46 PM, Arvind Kandhare <sweet.ka...@gmail.com> wrote:
> Hi,
> Let's use this thread to discuss the above mentioned KIP.
>
> Here is the motivation for it:
> "Distributed data stores can be vastly improved by integrating with Kafka.
> Some of these improvements are:
>
>    1. They can participate easily in the whole Kafka ecosystem
>    2. Data ingesting speeds can be improved
>
> Distributed data stores come with their own replication. Kafka replication
> is a duplication of functionality for them. Kafka should defer replication
> to underlying file system if the configuration mandates it.
>
> In the newly added configuration a flush to the filesystem should be
> considered a signal that the message is replicated."
>
> Do let me know your views on this.
>
>
> Thanks and regards,
>
> Arvind
