Harsha, Sriharsha, Suresh, a couple thoughts:

- How could this be used to leverage fast key-value stores, e.g. Couchbase, which can serve individual records but maybe not entire segments? Or is the idea to only support writing and fetching entire segments? Would it make sense to support both?
- Instead of defining a new interface and/or mechanism to ETL segment files from brokers to cold storage, can we just leverage Kafka itself? In particular, we can already ETL records to HDFS via Kafka Connect, Gobblin etc -- we really just need a way for brokers to read these records back. I'm wondering whether the new API could be limited to the fetch side, so that existing ETL pipelines could be leveraged more easily. For example, if you already have an ETL pipeline from Kafka to HDFS, you could leave it in place and just tell Kafka how to read these records/segments back from cold storage when necessary.

- I'm wondering if we could just add support for loading segments from remote URIs, instead of only from local files, i.e. via plugins for s3://, hdfs:// etc. I suspect less broker logic would change in that case -- the broker wouldn't necessarily care whether it reads from file:// or s3:// to load a given segment.

Combining the previous two comments, I can imagine a URI resolution chain for segments. For example, first try file:///logs/{topic}/{segment}.log, then s3://mybucket/{topic}/{date}/{segment}.log, etc, leveraging your existing ETL pipeline(s). (Rough sketch at the end of this message.)

Ryanne

On Mon, Feb 4, 2019 at 12:01 PM Harsha <ka...@harsha.io> wrote:
> Hi All,
> We are interested in adding tiered storage to Kafka. More details
> about motivation and design are in the KIP. We are working towards an
> initial POC. Any feedback or questions on this KIP are welcome.
>
> Thanks,
> Harsha
>
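P.S. To make the resolution-chain idea concrete, here's a very rough sketch of what fetch-only, scheme-keyed broker plugins could look like, glossing over details like the {date} placeholder. Every name here (SegmentFetcher, SegmentResolutionChain, etc.) is hypothetical and not from the KIP:

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical plugin, one per URI scheme: "file", "s3", "hdfs", ... */
interface SegmentFetcher {
    /** Returns the segment's bytes, or empty if nothing lives at this URI. */
    Optional<InputStream> fetch(URI uri) throws IOException;
}

/** Tries a list of URI templates in order until one yields the segment. */
class SegmentResolutionChain {
    private final List<String> templates;               // e.g. "file:///logs/{topic}/{segment}.log"
    private final Map<String, SegmentFetcher> fetchers;  // keyed by URI scheme

    SegmentResolutionChain(List<String> templates, Map<String, SegmentFetcher> fetchers) {
        this.templates = templates;
        this.fetchers = fetchers;
    }

    InputStream resolve(String topic, long baseOffset) throws IOException {
        for (String template : templates) {
            URI uri = URI.create(template
                .replace("{topic}", topic)
                .replace("{segment}", Long.toString(baseOffset)));
            SegmentFetcher fetcher = fetchers.get(uri.getScheme());
            if (fetcher == null) continue;            // no plugin registered for this scheme
            Optional<InputStream> in = fetcher.fetch(uri);
            if (in.isPresent()) return in.get();      // first hit wins: local beats remote
        }
        throw new IOException("segment not found in any tier: " + topic + "/" + baseOffset);
    }
}

The point being that the write path stays with whatever ETL pipeline you already run, and the broker only needs the read side.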