Hi Eno,
Thanks for the comments. Answers are inline.
"Performance & durability ---------------------------------- - would be good to have more discussion on performance implications of tiering. Copying the data from the local storage to the remote storage is going to be expensive in terms of network bandwidth and will affect foreground traffic to Kafka potentially reducing its throughput and latency." Good point. We've run our local tests with 10GigE cards, even though our clients bandwidth requirements are high with 1000s of clients producing / consuming data we never hit hit our limits on network bandwidth. More often we hit limits of CPU, Mem limits than the network bandwidth. But this is something to be taken care of by the operator if they want to enable tiered storage. Also as mentioned in the KIP/previous threads ,clients requesting older data is very rare and often used as insurance policy . What proposed here does increase bandwidth interms of shipping logsegments to remote but access patterns determines how much we end up reading from remote tier. "- throttling the copying of the data above might be a solution, however, if you have a few TB of data to move to the slower remote tier the risk is that the movement will never complete on time under high Kafka load. Do we need a scheduler to use idle time to do the copying?" In our design, we are going to have scheduler in RLM which will periodically copy in-active(rolled-over) log segments. Not sure idle time is easy to calculate and schedule a copy. More over we want to copy the segments as soon as they are available. Throttling something we can take into account and provide options to tune it. "- Have you considered having two options: 1) a slow tier only (e.g., all the data on HDFS) and 2) a fast tier only like Kafka today. This would avoid copying data between the tiers. Customers that can tolerate a slower tier with a better price/GB can just choose option (1). Would be good to put in Alternatives considered." What we want to have is Kafka that is known to the users today with local fast disk access and fast data serving layer. Tiered Storage option might not be for everyone and most users who are happy with Kafka today shouldn't see changes to their operation because of this KIP. Fundamentally, we believe remote tiered storage data accessed very infrequently. We expect anyone going to read from remote tiered storage expects a slower read response (mostly backfills). Making an explicit change like slow/fast tier will only cause more confusion and operation complexity that will bring into play. With tiered storage , only users who want to use cheaper long-term storage can enable it and others can operate the Kafka as its today. It will give a good balance of serving latest reads from local disk almost all the time and shipping older data and reading from remote tier when clients needs the older data. If necessary, we can re-visit slow/fast-tier options at a later point. "Topic configs ------------------ - related to performance but also availability, we need to discuss the replication mode for the remote tier. For example, if the Kafka topics used to have 3-way replication, will they continue to have 3-way replication on the remote tier? Will the user configure that replication? In S3 for example, one can choose from different S3 tiers like STD or SIA, but there is no direct control over the replication factor like in Kafka." No. Remote tier is expected to be reliable storage with its own replication mechanisms. " how will security and ACLs be configured for the remote tier. 
"- how will security and ACLs be configured for the remote tier. E.g., if user A does not have access to a Kafka topic, when that topic is moved to S3 or HDFS there needs to be a way to prevent access to the S3 bucket for that user. This might be outside the scope of this KIP but would be good to discuss first."

As mentioned in the "Alternatives" section of the KIP, we will keep Kafka as the owner of those files in S3 or HDFS and take advantage of the HDFS security model (file system permissions). So any user who goes directly to HDFS and tries to read the files will not be able to, and any client request will go through Kafka, where its ACLs apply just as they do for any other request.

Hi Ron,
Thanks for the comments.

"I'm excited about this potential feature. Did you consider storing the information about the remote segments in a Kafka topic as opposed to in the remote storage itself? The topic would need infinite retention (or it would need to be compacted) so as not to itself be sent to cold storage, but assuming that topic would fit on local disk for all time (an open question as to whether this is acceptable or not) it feels like the most natural way to communicate information among brokers -- more natural than having them poll the remote storage systems, at least."

With RemoteIndex we are extending the current index mechanism so that, for a given offset, we can find the file in remote storage that contains its messages. This is a more efficient way to determine which remote segment serves a given offset than storing all of this data in an internal topic.

"To add to Eric's question/confusion about where logic lives (RLM vs. RSM), I think it would be helpful to explicitly identify in the KIP that the RLM delegates to the RSM since the RSM is part of the public API and is the pluggable piece. For example, instead of saying "RLM will ship the log segment files that are older than a configurable time to remote storage" I think it would be better to say "RLM identifies log segment files that are older than a configurable time and delegates to the configured RSM to ship them to remote storage" (or something like that -- just make it clear that the RLM is delegating to the configured RSM)."

Thanks. I agree with you. I'll update the KIP.

Hi Ambud,
Thanks for the comments.

"1. Wouldn't implicit checking for old offsets in remote location if not found locally on the leader i.e. do we really need remote index files? Since the storage path for a given topic would presumably be constant across all the brokers, the remote topic-partition path could simply be checked to see if there are any segment file names that would meet the offset requirements for a Consumer Fetch Request. RSM implementations could optionally cache this information."

By storing the remote index files locally, we can quickly determine, for a requested offset, which remote file might contain the data and return the response, instead of making a call to the remote tier for the index lookup. Given that index files are small, this won't be much of a hit to local storage space.
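To illustrate the lookup described above, here is a rough sketch of how a broker could resolve a fetch offset to a remote segment using locally stored remote index information. The class and method names are hypothetical and not part of the KIP; the point is only that an offset-keyed local view avoids a remote call for the "which segment" question.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical sketch: a broker-local view of remote segments keyed by base offset.
 * A fetch for an old offset is resolved to a remote segment location without a
 * round trip to the remote tier for the index lookup itself.
 */
public class RemoteSegmentIndex {
    // base offset of a remote segment -> location of that segment in remote storage
    private final NavigableMap<Long, String> segmentsByBaseOffset = new ConcurrentSkipListMap<>();

    public void addSegment(long baseOffset, String remoteLocation) {
        segmentsByBaseOffset.put(baseOffset, remoteLocation);
    }

    /** Returns the remote segment whose offset range should cover the given offset, if any. */
    public Optional<String> lookup(long fetchOffset) {
        Map.Entry<Long, String> entry = segmentsByBaseOffset.floorEntry(fetchOffset);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }
}
```

The actual bytes for the fetch would still be read through the pluggable remote storage implementation; only the segment-resolution step happens locally.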
"2. Would it make sense to create an internal compacted Kafka topic to publish & record remote segment information? This would enable the followers to get updates about new segments rather than running list() operations on remote storage to detect new segments which may be expensive."

I think Ron is also alluding to this. We thought that shipping the remote index files to remote storage and letting the followers' RLM pick them up makes it easy to keep the current replication protocol without any changes, so we don't determine whether a follower is in the ISR based on another topic's replication. We will run small tests to determine whether using a topic is better for this. Thanks for the suggestion.

"3. For RLM to scan local segment rotations are you thinking of leveraging java.nio.file.WatchService or simply running listFiles() on a periodic basis? Since WatchService implementation is heavily OS dependent it might create some complications around missing FS Events."

Ideally we want to introduce file events like you suggested. For the POC work we are just using listFiles(). Also, copying these files to remote storage can be slow, but we will not delete the files from local disk until the segment is copied, and any requests for the data in those files will be served from local disk in the meantime. So I don't think we need to be aggressive about optimizing the copy-segment-to-remote path.

Hi Viktor,
Thanks for the comments.

"I have a rather technical question to this. How do you plan to package this extension? Does this mean that Kafka will depend on HDFS? I think it'd be nice to somehow separate this off to a different package in the project so that it could be built and released separately from the main Kafka packages."

We would like all of this code to be part of Apache Kafka. In the early days of Kafka there was an external module which contained the Kafka-to-HDFS copy tools and their dependencies. We would like RLM (a class implementation) and RSM (an interface) to be in core and, as you suggested, implementations of RSM could live in another package so that the dependencies of RSM won't come into Kafka's classpath unless someone explicitly configures them.

Thanks,
Harsha

On Mon, Apr 1, 2019, at 1:02 AM, Viktor Somogyi-Vass wrote: > Hey Harsha, > > I have a rather technical question to this. How do you plan to package this > extension? Does this mean that Kafka will depend on HDFS? > I think it'd be nice to somehow separate this off to a different package in > the project so that it could be built and released separately from the main > Kafka packages. > This decoupling would be useful when direct dependency on HDFS (or other > implementations) is not needed and would also encourage decoupling for > other storage implementations. > > Best, > Viktor > > On Mon, Apr 1, 2019 at 3:44 AM Ambud Sharma <asharma52...@gmail.com> wrote: > > > Hi Harsha, > > > > Thank you for proposing this KIP. We are looking forward to this feature as > > well. > > > > A few questions around the design & implementation: > > > > 1. Wouldn't implicit checking for old offsets in remote location if not > > found locally on the leader i.e. do we really need remote index files? > > Since the storage path for a given topic would presumably be constant > > across all the brokers, the remote topic-partition path could simply be > > checked to see if there are any segment file names that would meet the > > offset requirements for a Consumer Fetch Request. RSM implementations could > > optionally cache this information. > > > > 2. Would it make sense to create an internal compacted Kafka topic to > > publish & record remote segment information? This would enable the > > followers to get updates about new segments rather than running list() > > operations on remote storage to detect new segments which may be expensive. > > > > 3. 
For RLM to scan local segment rotations are you thinking of leveraging > > java.nio.file.WatchService or simply running listFiles() on a periodic > > basis? Since WatchService implementation is heavily OS dependent it might > > create some complications around missing FS Events. > > > > Thanks. > > Ambud > > > > On Thu, Mar 28, 2019 at 8:04 AM Ron Dagostino <rndg...@gmail.com> wrote: > > > > > Hi Harsha. I'm excited about this potential feature. Did you consider > > > storing the information about the remote segments in a Kafka topic as > > > opposed to in the remote storage itself? The topic would need infinite > > > retention (or it would need to be compacted) so as not to itself be sent > > to > > > cold storage, but assuming that topic would fit on local disk for all > > time > > > (an open question as to whether this is acceptable or not) it feels like > > > the most natural way to communicate information among brokers -- more > > > natural than having them poll the remote storage systems, at least. > > > > > > To add to Eric's question/confusion about where logic lives (RLM vs. > > RSM), > > > I think it would be helpful to explicitly identify in the KIP that the > > RLM > > > delegates to the RSM since the RSM is part of the public API and is the > > > pluggable piece. For example, instead of saying "RLM will ship the log > > > segment files that are older than a configurable time to remote storage" > > I > > > think it would be better to say "RLM identifies log segment files that > > are > > > older than a configurable time and delegates to the configured RSM to > > ship > > > them to remote storage" (or something like that -- just make it clear > > that > > > the RLM is delegating to the configured RSM). > > > > > > Ron > > > > > > > > > > > > > > > > > > On Thu, Mar 28, 2019 at 6:12 AM Eno Thereska <eno.there...@gmail.com> > > > wrote: > > > > > > > Thanks Harsha, > > > > > > > > A couple of comments: > > > > > > > > Performance & durability > > > > ---------------------------------- > > > > - would be good to have more discussion on performance implications of > > > > tiering. Copying the data from the local storage to the remote storage > > is > > > > going to be expensive in terms of network bandwidth and will affect > > > > foreground traffic to Kafka potentially reducing its throughput and > > > > latency. > > > > - throttling the copying of the data above might be a solution, however > > > if > > > > you have a few TB of data to move to the slower remote tier the risk is > > > > that the movement will never complete on time under high Kafka load. Do > > > we > > > > need a scheduler to use idle time to do the copying? > > > > - Have you considered having two options: 1) a slow tier only (e.g., > > all > > > > the data on HDFS) and 2) a fast tier only like Kafka today. This would > > > > avoid copying data between the tiers. Customers that can tolerate a > > > slower > > > > tier with a better price/GB can just choose option (1). Would be good > > to > > > > put in Alternatives considered. > > > > > > > > Topic configs > > > > ------------------ > > > > - related to performance but also availability, we need to discuss the > > > > replication mode for the remote tier. For example, if the Kafka topics > > > used > > > > to have 3-way replication, will they continue to have 3-way replication > > > on > > > > the remote tier? Will the user configure that replication? 
In S3 for > > > > example, one can choose from different S3 tiers like STD or SIA, but > > > there > > > > is no direct control over the replication factor like in Kafka. > > > > - how will security and ACLs be configured for the remote tier. E.g., > > if > > > > user A does not have access to a Kafka topic, when that topic is moved > > to > > > > S3 or HDFS there needs to be a way to prevent access to the S3 bucket > > for > > > > that user. This might be outside the scope of this KIP but would be > > good > > > to > > > > discuss first. > > > > > > > > That's it for now, thanks > > > > Eno > > > > > > > > > > > > On Wed, Mar 27, 2019 at 4:40 PM Harsha <ka...@harsha.io> wrote: > > > > > > > > > Hi All, > > > > > Thanks for your initial feedback. We updated the KIP. > > Please > > > > > take a look and let us know if you have any questions. > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage > > > > > > > > > > Thanks, > > > > > Harsha > > > > > > > > > > On Wed, Feb 6, 2019, at 10:30 AM, Harsha wrote: > > > > > > Thanks Eno, Adam & Satish for you review and questions. I'll > > address > > > > > > these in KIP and update the thread here. > > > > > > > > > > > > Thanks, > > > > > > Harsha > > > > > > > > > > > > On Wed, Feb 6, 2019, at 7:09 AM, Satish Duggana wrote: > > > > > > > Thanks, Harsha for the KIP. It is a good start for tiered storage > > > in > > > > > > > Kafka. I have a few comments/questions. > > > > > > > > > > > > > > It may be good to have a configuration to keep the number of > > local > > > > > > > segments instead of keeping only the active segment. This config > > > can > > > > > > > be exposed at cluster and topic levels with default value as 1. > > In > > > > > > > some use cases, few consumers may lag over one segment, it will > > be > > > > > > > better to serve from local storage instead of remote storage. > > > > > > > > > > > > > > It may be better to keep “remote.log.storage.enable” and > > respective > > > > > > > configuration at topic level along with cluster level. It will be > > > > > > > helpful in environments where few topics are configured with > > > > > > > local-storage and other topics are configured with remote > > storage. > > > > > > > > > > > > > > Each topic-partition leader pushes its log segments with > > respective > > > > > > > index files to remote whenever active log rolls over, it updates > > > the > > > > > > > remote log index file for the respective remote log segment. The > > > > > > > second option is to add offset index files also for each segment. > > > It > > > > > > > can serve consumer fetch requests for old segments from local log > > > > > > > segment instead of serving directly from the remote log which may > > > > > > > cause high latencies. There can be different strategies in when > > the > > > > > > > remote segment is copied to a local segment. > > > > > > > > > > > > > > What is “remote.log.manager.scheduler.interval.ms” config about? > > > > > > > > > > > > > > How do followers sync RemoteLogSegmentIndex files? Do they > > request > > > > > > > from leader replica? This looks to be important as the failed > > over > > > > > > > leader should have RemoteLogSegmentIndex updated and ready to > > avoid > > > > > > > high latencies in serving old data stored in remote logs. > > > > > > > > > > > > > > Thanks, > > > > > > > Satish. 
> > > > > > > > > > > > > > On Tue, Feb 5, 2019 at 10:42 PM Ryanne Dolan < > > > ryannedo...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > Thanks Harsha, makes sense. > > > > > > > > > > > > > > > > Ryanne > > > > > > > > > > > > > > > > On Mon, Feb 4, 2019 at 5:53 PM Harsha Chintalapani < > > > > ka...@harsha.io> > > > > > wrote: > > > > > > > > > > > > > > > > > "I think you are saying that this enables additional > > > (potentially > > > > > cheaper) > > > > > > > > > storage options without *requiring* an existing ETL > > pipeline. “ > > > > > > > > > Yes. > > > > > > > > > > > > > > > > > > " But it's not really a replacement for the sort of pipelines > > > > > people build > > > > > > > > > with Connect, Gobblin etc.” > > > > > > > > > > > > > > > > > > It is not. But also making an assumption that everyone runs > > > these > > > > > > > > > pipelines for storing raw Kafka data into HDFS or S3 is also > > > > wrong > > > > > > > > > assumption. > > > > > > > > > The aim of this KIP is to provide tiered storage as whole > > > package > > > > > not > > > > > > > > > asking users to ship the data on their own using existing > > ETL, > > > > > which means > > > > > > > > > running a consumer and maintaining those pipelines. > > > > > > > > > > > > > > > > > > " My point was that, if you are already offloading records in > > > an > > > > > ETL > > > > > > > > > pipeline, why do you need a new pipeline built into the > > broker > > > to > > > > > ship the > > > > > > > > > same data to the same place?” > > > > > > > > > > > > > > > > > > As you said its ETL pipeline, which means users of these > > > > pipelines > > > > > are > > > > > > > > > reading the data from broker and transforming its state and > > > > > storing it > > > > > > > > > somewhere. > > > > > > > > > The point of this KIP is store log segments as it is without > > > > > changing > > > > > > > > > their structure so that we can use the existing offset > > > mechanisms > > > > > to look > > > > > > > > > it up when the consumer needs to read old data. When you do > > > load > > > > > it via > > > > > > > > > your existing pipelines you are reading the topic as a whole > > , > > > > > which > > > > > > > > > doesn’t guarantee that you’ll produce this data back into > > HDFS > > > in > > > > > S3 in the > > > > > > > > > same order and who is going to generate the Index files > > again. > > > > > > > > > > > > > > > > > > > > > > > > > > > "So you'd end up with one of 1)cold segments are only useful > > to > > > > > Kafka; 2) > > > > > > > > > you have the same data written to HDFS/etc twice, once for > > > Kafka > > > > > and once > > > > > > > > > for everything else, in two separate formats” > > > > > > > > > > > > > > > > > > You are talking two different use cases. If someone is > > storing > > > > raw > > > > > data > > > > > > > > > out of Kafka for long term access. > > > > > > > > > By storing the data as it is in HDFS though Kafka will solve > > > this > > > > > issue. > > > > > > > > > They do not need to run another pipe-line to ship these logs. > > > > > > > > > > > > > > > > > > If they are running pipelines to store in HDFS in a different > > > > > format, > > > > > > > > > thats a different use case. May be they are transforming > > Kafka > > > > > logs to ORC > > > > > > > > > so that they can query through Hive. Once you transform the > > > log > > > > > segment it > > > > > > > > > does loose its ability to use the existing offset index. 
> > > > > > > > > Main objective here not to change the existing protocol and > > > still > > > > > be able > > > > > > > > > to write and read logs from remote storage. > > > > > > > > > > > > > > > > > > > > > > > > > > > -Harsha > > > > > > > > > > > > > > > > > > On Feb 4, 2019, 2:53 PM -0800, Ryanne Dolan < > > > > ryannedo...@gmail.com > > > > > >, > > > > > > > > > wrote: > > > > > > > > > > Thanks Harsha, makes sense for the most part. > > > > > > > > > > > > > > > > > > > > > tiered storage is to get away from this and make this > > > > > transparent to > > > > > > > > > the > > > > > > > > > > user > > > > > > > > > > > > > > > > > > > > I think you are saying that this enables additional > > > > (potentially > > > > > cheaper) > > > > > > > > > > storage options without *requiring* an existing ETL > > pipeline. > > > > > But it's > > > > > > > > > not > > > > > > > > > > really a replacement for the sort of pipelines people build > > > > with > > > > > Connect, > > > > > > > > > > Gobblin etc. My point was that, if you are already > > offloading > > > > > records in > > > > > > > > > an > > > > > > > > > > ETL pipeline, why do you need a new pipeline built into the > > > > > broker to > > > > > > > > > ship > > > > > > > > > > the same data to the same place? I think in most cases this > > > > will > > > > > be an > > > > > > > > > > additional pipeline, not a replacement, because the > > segments > > > > > written to > > > > > > > > > > cold storage won't be useful outside Kafka. So you'd end up > > > > with > > > > > one of > > > > > > > > > 1) > > > > > > > > > > cold segments are only useful to Kafka; 2) you have the > > same > > > > > data written > > > > > > > > > > to HDFS/etc twice, once for Kafka and once for everything > > > else, > > > > > in two > > > > > > > > > > separate formats; 3) you use your existing ETL pipeline and > > > > read > > > > > cold > > > > > > > > > data > > > > > > > > > > directly. > > > > > > > > > > > > > > > > > > > > To me, an ideal solution would let me spool segments from > > > Kafka > > > > > to any > > > > > > > > > sink > > > > > > > > > > I would like, and then let Kafka clients seamlessly access > > > that > > > > > cold > > > > > > > > > data. > > > > > > > > > > Today I can do that in the client, but ideally the broker > > > would > > > > > do it for > > > > > > > > > > me via some HDFS/Hive/S3 plugin. The KIP seems to > > accomplish > > > > > that -- just > > > > > > > > > > without leveraging anything I've currently got in place. > > > > > > > > > > > > > > > > > > > > Ryanne > > > > > > > > > > > > > > > > > > > > On Mon, Feb 4, 2019 at 3:34 PM Harsha <ka...@harsha.io> > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi Eric, > > > > > > > > > > > Thanks for your questions. Answers are in-line > > > > > > > > > > > > > > > > > > > > > > "The high-level design seems to indicate that all of the > > > > logic > > > > > for > > > > > > > > > when and > > > > > > > > > > > how to copy log segments to remote storage lives in the > > RLM > > > > > class. The > > > > > > > > > > > default implementation is then HDFS specific with > > > additional > > > > > > > > > > > implementations being left to the community. This seems > > > like > > > > > it would > > > > > > > > > > > require anyone implementing a new RLM to also > > re-implement > > > > the > > > > > logic > > > > > > > > > for > > > > > > > > > > > when to ship data to remote storage." 
> > > > > > > > > > > > > > > > > > > > > > RLM will be responsible for shipping log segments and it > > > will > > > > > decide > > > > > > > > > when > > > > > > > > > > > a log segment is ready to be shipped over. > > > > > > > > > > > Once a Log Segement(s) are identified as rolled over, RLM > > > > will > > > > > delegate > > > > > > > > > > > this responsibility to a pluggable remote storage > > > > > implementation. > > > > > > > > > Users who > > > > > > > > > > > are looking add their own implementation to enable other > > > > > storages all > > > > > > > > > they > > > > > > > > > > > need to do is to implement the copy and read mechanisms > > and > > > > > not to > > > > > > > > > > > re-implement RLM itself. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > "Would it not be better for the Remote Log Manager > > > > > implementation to be > > > > > > > > > > > non-configurable, and instead have an interface for the > > > > remote > > > > > storage > > > > > > > > > > > layer? That way the "when" of the logic is consistent > > > across > > > > > all > > > > > > > > > > > implementations and it's only a matter of "how," similar > > to > > > > > how the > > > > > > > > > Streams > > > > > > > > > > > StateStores are managed." > > > > > > > > > > > > > > > > > > > > > > It's possible that we can RLM non-configurable. But for > > the > > > > > initial > > > > > > > > > > > release and to keep the backward compatibility > > > > > > > > > > > we want to make this configurable and for any users who > > > might > > > > > not be > > > > > > > > > > > interested in having the LogSegments shipped to remote, > > > they > > > > > don't > > > > > > > > > need to > > > > > > > > > > > worry about this. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Ryanne, > > > > > > > > > > > Thanks for your questions. > > > > > > > > > > > > > > > > > > > > > > "How could this be used to leverage fast key-value > > stores, > > > > e.g. > > > > > > > > > Couchbase, > > > > > > > > > > > which can serve individual records but maybe not entire > > > > > segments? Or > > > > > > > > > is the > > > > > > > > > > > idea to only support writing and fetching entire > > segments? > > > > > Would it > > > > > > > > > make > > > > > > > > > > > sense to support both?" > > > > > > > > > > > > > > > > > > > > > > LogSegment once its rolled over are immutable objects and > > > we > > > > > want to > > > > > > > > > keep > > > > > > > > > > > the current structure of LogSegments and corresponding > > > Index > > > > > files. It > > > > > > > > > will > > > > > > > > > > > be easy to copy the whole segment as it is, instead of > > > > > re-reading each > > > > > > > > > file > > > > > > > > > > > and use a key/value store. > > > > > > > > > > > > > > > > > > > > > > " > > > > > > > > > > > - Instead of defining a new interface and/or mechanism to > > > ETL > > > > > segment > > > > > > > > > files > > > > > > > > > > > from brokers to cold storage, can we just leverage Kafka > > > > > itself? In > > > > > > > > > > > particular, we can already ETL records to HDFS via Kafka > > > > > Connect, > > > > > > > > > Gobblin > > > > > > > > > > > etc -- we really just need a way for brokers to read > > these > > > > > records > > > > > > > > > back. > > > > > > > > > > > I'm wondering whether the new API could be limited to the > > > > > fetch, and > > > > > > > > > then > > > > > > > > > > > existing ETL pipelines could be more easily leveraged. 
> > For > > > > > example, if > > > > > > > > > you > > > > > > > > > > > already have an ETL pipeline from Kafka to HDFS, you > > could > > > > > leave that > > > > > > > > > in > > > > > > > > > > > place and just tell Kafka how to read these > > > records/segments > > > > > from cold > > > > > > > > > > > storage when necessary." > > > > > > > > > > > > > > > > > > > > > > This is pretty much what everyone does and it has the > > > > > additional > > > > > > > > > overhead > > > > > > > > > > > of keeping these pipelines operating and monitoring. > > > > > > > > > > > What's proposed in the KIP is not ETL. It's just looking > > a > > > > the > > > > > logs > > > > > > > > > that > > > > > > > > > > > are written and rolled over to copy the file as it is. > > > > > > > > > > > Each new topic needs to be added (sure we can do so via > > > > > wildcard or > > > > > > > > > > > another mechanism) but new topics need to be onboard to > > > ship > > > > > the data > > > > > > > > > into > > > > > > > > > > > remote storage through a traditional ETL pipeline. > > > > > > > > > > > Once the data lands somewhere like HDFS/HIVE etc.. Users > > > need > > > > > to write > > > > > > > > > > > another processing line to re-process this data similar > > to > > > > how > > > > > they are > > > > > > > > > > > doing it in their Stream processing pipelines. Tiered > > > storage > > > > > is to get > > > > > > > > > > > away from this and make this transparent to the user. > > They > > > > > don't need > > > > > > > > > to > > > > > > > > > > > run another ETL process to ship the logs. > > > > > > > > > > > > > > > > > > > > > > "I'm wondering if we could just add support for loading > > > > > segments from > > > > > > > > > > > remote URIs instead of from file, i.e. via plugins for > > > s3://, > > > > > hdfs:// > > > > > > > > > etc. > > > > > > > > > > > I suspect less broker logic would change in that case -- > > > the > > > > > broker > > > > > > > > > > > wouldn't necessarily care if it reads from file:// or > > s3:// > > > > to > > > > > load a > > > > > > > > > given > > > > > > > > > > > segment." > > > > > > > > > > > > > > > > > > > > > > Yes, this is what we are discussing in KIP. We are > > leaving > > > > the > > > > > details > > > > > > > > > of > > > > > > > > > > > loading segments to RLM read part instead of directly > > > > exposing > > > > > this in > > > > > > > > > the > > > > > > > > > > > Broker. This way we can keep the current Kafka code as it > > > is > > > > > without > > > > > > > > > > > changing the assumptions around the local disk. Let the > > RLM > > > > > handle the > > > > > > > > > > > remote storage part. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Harsha > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 4, 2019, at 12:54 PM, Ryanne Dolan wrote: > > > > > > > > > > > > Harsha, Sriharsha, Suresh, a couple thoughts: > > > > > > > > > > > > > > > > > > > > > > > > - How could this be used to leverage fast key-value > > > stores, > > > > > e.g. > > > > > > > > > > > Couchbase, > > > > > > > > > > > > which can serve individual records but maybe not entire > > > > > segments? Or > > > > > > > > > is > > > > > > > > > > > the > > > > > > > > > > > > idea to only support writing and fetching entire > > > segments? > > > > > Would it > > > > > > > > > make > > > > > > > > > > > > sense to support both? 
> > > > > > > > > > > > > > > > > > > > > > > > - Instead of defining a new interface and/or mechanism > > to > > > > > ETL segment > > > > > > > > > > > files > > > > > > > > > > > > from brokers to cold storage, can we just leverage > > Kafka > > > > > itself? In > > > > > > > > > > > > particular, we can already ETL records to HDFS via > > Kafka > > > > > Connect, > > > > > > > > > Gobblin > > > > > > > > > > > > etc -- we really just need a way for brokers to read > > > these > > > > > records > > > > > > > > > back. > > > > > > > > > > > > I'm wondering whether the new API could be limited to > > the > > > > > fetch, and > > > > > > > > > then > > > > > > > > > > > > existing ETL pipelines could be more easily leveraged. > > > For > > > > > example, > > > > > > > > > if > > > > > > > > > > > you > > > > > > > > > > > > already have an ETL pipeline from Kafka to HDFS, you > > > could > > > > > leave > > > > > > > > > that in > > > > > > > > > > > > place and just tell Kafka how to read these > > > > records/segments > > > > > from > > > > > > > > > cold > > > > > > > > > > > > storage when necessary. > > > > > > > > > > > > > > > > > > > > > > > > - I'm wondering if we could just add support for > > loading > > > > > segments > > > > > > > > > from > > > > > > > > > > > > remote URIs instead of from file, i.e. via plugins for > > > > > s3://, hdfs:// > > > > > > > > > > > etc. > > > > > > > > > > > > I suspect less broker logic would change in that case > > -- > > > > the > > > > > broker > > > > > > > > > > > > wouldn't necessarily care if it reads from file:// or > > > s3:// > > > > > to load a > > > > > > > > > > > given > > > > > > > > > > > > segment. > > > > > > > > > > > > > > > > > > > > > > > > Combining the previous two comments, I can imagine a > > URI > > > > > resolution > > > > > > > > > chain > > > > > > > > > > > > for segments. For example, first try > > > > > > > > > file:///logs/{topic}/{segment}.log, > > > > > > > > > > > > then s3://mybucket/{topic}/{date}/{segment}.log, etc, > > > > > leveraging your > > > > > > > > > > > > existing ETL pipeline(s). > > > > > > > > > > > > > > > > > > > > > > > > Ryanne > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 4, 2019 at 12:01 PM Harsha < > > ka...@harsha.io> > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > We are interested in adding tiered storage to Kafka. > > > More > > > > > > > > > > > details > > > > > > > > > > > > > about motivation and design are in the KIP. We are > > > > working > > > > > towards > > > > > > > > > an > > > > > > > > > > > > > initial POC. Any feedback or questions on this KIP > > are > > > > > welcome. > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > Harsha > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >