Harsha, Sriharsha, Suresh, a couple of thoughts:

- How could this design leverage fast key-value stores, e.g. Couchbase,
which can serve individual records but maybe not entire segments? Or is the
idea to support only writing and fetching entire segments? Would it make
sense to support both? (A rough sketch of what dual-granularity support
could look like follows this list.)

- Instead of defining a new interface and/or mechanism to ETL segment files
from brokers to cold storage, can we just leverage Kafka itself? In
particular, we can already ETL records to HDFS via Kafka Connect, Gobblin,
etc. -- we really just need a way for brokers to read those records back.
I'm wondering whether the new API could be limited to the fetch side, so
that existing ETL pipelines could be leveraged more easily. For example, if
you already have an ETL pipeline from Kafka to HDFS, you could leave it in
place and just tell Kafka how to read those records/segments from cold
storage when necessary. (Second sketch below.)

- I'm wondering if we could just add support for loading segments from
remote URIs instead of only from local files, e.g. via plugins for s3://,
hdfs://, etc. I suspect less broker logic would change in that case -- the
broker wouldn't necessarily care whether it reads from file:// or s3:// to
load a given segment. (Third sketch below.)
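
To make the first comment concrete: a store interface could expose both
granularities. The sketch below is purely mine -- none of these names come
from the KIP -- with record-level access optional so that segment-only
stores still fit:

import java.io.IOException;
import java.io.InputStream;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.record.Records;

// Hypothetical sketch, not from the KIP: a remote store that always
// supports whole-segment fetches and optionally record-level reads.
public interface RemoteSegmentStore {
    // Every implementation can hand back an entire segment file.
    InputStream fetchSegment(TopicPartition tp, long baseOffset)
        throws IOException;

    // Record-level access for key-value stores like Couchbase;
    // segment-only stores just keep this default.
    default Records fetchRecords(TopicPartition tp, long startOffset,
                                 int maxBytes) throws IOException {
        throw new UnsupportedOperationException(
            "record-level fetch not supported by this store");
    }
}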
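
For the second comment, the broker-facing contract could shrink to a
single read method, with the write path left entirely to whatever
Connect/Gobblin pipeline you already run. Again, an illustrative sketch,
not a proposal from the KIP:

import java.io.IOException;
import java.util.Optional;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.record.Records;

// Illustrative fetch-only API: brokers never upload to cold storage,
// they only read back what the external ETL pipeline has landed.
public interface ColdStorageReader {
    // Records at or after startOffset, or empty if the pipeline
    // hasn't written that range yet.
    Optional<Records> read(TopicPartition tp, long startOffset)
        throws IOException;
}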
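
And for the third comment, a per-scheme plugin could be as small as this
(names hypothetical):

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical per-scheme plugin: the broker picks a loader by the
// URI scheme and doesn't otherwise care where the bytes come from.
public interface SegmentLoader {
    String scheme();  // e.g. "file", "s3", "hdfs"
    InputStream open(URI segmentUri) throws IOException;
}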

Combining the previous two comments, I can imagine a URI resolution chain
for segments. For example, first try file:///logs/{topic}/{segment}.log,
then s3://mybucket/{topic}/{date}/{segment}.log, etc., leveraging your
existing ETL pipeline(s).
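
Building on the hypothetical SegmentLoader above, the chain itself could
just walk a list of URI templates and hand back the first tier that has
the segment. The templates, class, and method names here are examples,
not anything specified in the KIP:

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.List;
import java.util.Map;

public final class SegmentUriResolver {
    // Try each URI template in order; the first registered loader
    // that can open the resolved URI wins.
    static InputStream openSegment(String topic, String date, String segment,
                                   List<String> templates,
                                   Map<String, SegmentLoader> loadersByScheme)
            throws IOException {
        for (String template : templates) {
            URI uri = URI.create(template
                .replace("{topic}", topic)
                .replace("{date}", date)
                .replace("{segment}", segment));
            SegmentLoader loader = loadersByScheme.get(uri.getScheme());
            if (loader == null) {
                continue;  // no plugin registered for this scheme
            }
            try {
                return loader.open(uri);
            } catch (IOException notInThisTier) {
                // fall through to the next template in the chain
            }
        }
        throw new FileNotFoundException(
            "segment not found in any tier: " + segment);
    }
}

Here the template list would come from broker config -- e.g. the file://
and s3:// URIs above, in that order -- so local segments win when present.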

Ryanne


On Mon, Feb 4, 2019 at 12:01 PM Harsha <ka...@harsha.io> wrote:

>  Hi All,
>          We are interested in adding tiered storage to Kafka. More details
> about motivation and design are in the KIP.  We are working towards an
> initial POC. Any feedback or questions on this KIP are welcome.
>
> Thanks,
> Harsha
>
