[
https://issues.apache.org/jira/browse/KAFKA-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289608#comment-15289608
]
Ben Stopford commented on KAFKA-3726:
-------------------------------------
The standard approach to this sort of problem would be to use Kafka Connect
to move data to HDFS, S3, etc. Would that not suffice?
> Enable cold storage option
> --------------------------
>
> Key: KAFKA-3726
> URL: https://issues.apache.org/jira/browse/KAFKA-3726
> Project: Kafka
> Issue Type: Wish
> Reporter: Radoslaw Gruchalski
> Attachments: kafka-cold-storage.txt
>
>
> This JIRA builds on the cold storage article I have published on Medium.
> A copy of the article is attached here.
> The need for cold storage, or an "indefinite" log, is discussed quite often
> on the user mailing list.
> The cold storage idea would allow the operator to keep the raw Kafka
> segment files in third-party storage and retrieve the data later for
> re-consumption.
> The two possible options for enabling such functionality are, from the
> article:
> First approach: if Kafka provided a notification mechanism and could trigger
> a program when a segment file is about to be discarded, it would become
> feasible to provide a standard method of moving data to cold storage in
> reaction to those events. Once the program finished backing the segments up,
> it could tell Kafka “it is now safe to delete these segments”.
> The second option is to provide an additional value for the
> {{log.cleanup.policy}} setting, call it cold-storage. With this value,
> Kafka would move the segment files — which would otherwise be deleted — to
> another destination on the server, from where they can be picked up and
> moved to cold storage.
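> As a rough illustration of what such a configuration might look like (the
> cold-storage value and the staging-directory property are hypothetical;
> neither exists in Kafka today):

```properties
# hypothetical broker configuration -- not a real Kafka setting
log.cleanup.policy=cold-storage
# segments that would otherwise be deleted are moved here instead,
# to be picked up and shipped to cold storage by external tooling
log.cold.storage.dir=/var/kafka/cold-storage-staging
```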
> Both have their limitations. The former is simply a mechanism that lets the
> operator build the tooling necessary to enable this. Events could be
> published in a manner similar to the Marathon Event Bus
> (https://mesosphere.github.io/marathon/docs/event-bus.html), or Kafka itself
> could provide a control topic on which such info would be published. Either
> way, the operator can subscribe to the event bus and get notified about at
> least two events:
> - log segment is complete and can be backed up
> - partition leader changed
> These two, together with an option to protect a log segment from cleanup
> for a certain amount of time, would be sufficient to implement cold storage
> reliably.
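> The operator-side tooling reacting to those events could be sketched as
> follows. This is a minimal sketch under stated assumptions: the JSON event
> format, the event type names, and the acknowledgement callback are all
> hypothetical (Kafka exposes no such mechanism today), and a local copy
> stands in for the real upload to HDFS or S3:

```python
import json
import shutil
from pathlib import Path

def handle_event(event_json, cold_storage_dir, ack):
    """React to a (hypothetical) broker event.

    On "segment_complete", copy the segment to cold storage and then
    acknowledge, i.e. tell Kafka "it is now safe to delete these segments".
    On "leader_changed", do nothing here; the backup step must simply be
    idempotent, since a new leader may re-emit segment events.
    """
    event = json.loads(event_json)
    if event["type"] == "segment_complete":
        src = Path(event["path"])
        dest = Path(cold_storage_dir) / src.name
        shutil.copy2(src, dest)   # stand-in for an HDFS/S3 upload
        ack(event["path"])        # signal that the segment may be deleted
    elif event["type"] == "leader_changed":
        pass
```

> Note that Kafka itself never touches the cold storage here; it only emits
> events and waits for the acknowledgement, which is the division of labour
> argued for below.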
> The latter option, the {{log.cleanup.policy}} setting, would be a more
> complete feature, but it is also much more difficult to implement. Every
> broker would have to keep a backup of its data in the cold storage,
> significantly increasing the storage requirements; de-duplication of the
> replicated data would also be left entirely to the operator.
> In any case, the thing to stay away from is having Kafka deal with the
> physical aspect of moving the data to and from cold storage. That is not
> Kafka's task. The intent is to provide a method for reliable cold storage.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)