Anthony,

Is there a reason you wouldn't want to just push the data into something
built for cheap, long-term storage (like Glacier, S3, or HDFS) and
"replay" from that instead of from the Kafka brokers?  I can't speak for
Jay, Jun, or Neha, but I believe the expected usage of Kafka is essentially
as a buffering mechanism to take the edge off the natural ebb and flow of
unpredictable internet traffic.  Highly available, long-term storage of
data is probably not at the top of their list of use cases when making
design decisions.
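
To make that concrete, here's a rough sketch of the archive side using
today's Java consumer API (which postdates this thread, so treat it as
illustrative).  The broker address, topic name, and local-file output are
placeholders; a real archiver would batch messages and upload them to
S3/Glacier/HDFS rather than appending to local files:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.FileOutputStream;
import java.io.IOException;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ArchiveConsumer {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "archiver");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                    consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Append each message to a per-partition file; a real
                    // archiver would roll these files and ship them to
                    // S3/Glacier/HDFS.
                    try (FileOutputStream out = new FileOutputStream(
                            "archive-" + record.partition() + ".log", true)) {
                        out.write(record.value());
                        out.write('\n');
                    }
                }
            }
        }
    }
}

Replay then just means reading the archived files back and re-producing
them into a fresh topic, so it's worth choosing an archive layout with
that in mind.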

--Eric


On Thu, Feb 21, 2013 at 6:00 PM, Anthony Grimes <i...@raynes.me> wrote:

> Our use case is that we'd like to log data away that we don't immediately
> need and potentially replay it at some point. We don't want to delete old
> logs. I googled around a bit and only discovered this particular post:
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201210.mbox/%3CCAFbh0Q2=eJcDT6NvTAPtxhXSk64x0Yms-G-AOqOoy=FtVVM6SQ@mail.gmail.com%3E
>
> In summary, it appears the primary issue is that Kafka keeps a file handle
> open for each log segment. Is there a way to configure this, or is one
> planned? It also appears that an option to deduplicate instead of delete
> was added recently; since files aren't deleted in that mode either, doesn't
> the same file handle issue exist there as well?
>
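
For reference, the knobs involved look roughly like this; a sketch of
server.properties with option names from later broker releases than the
ones current in this thread, so check your version's docs:

# Disable time- and size-based deletion so segments are never removed.
log.retention.ms=-1
log.retention.bytes=-1

# Larger segments mean fewer files per partition, and therefore fewer
# open file handles on the broker.
log.segment.bytes=1073741824

# The "deduplicate instead of delete" option is log compaction.
# Compacted segments are rewritten rather than removed, so they still
# hold open file handles.
log.cleanup.policy=compact

Either way the broker holds a handle per live segment, so raising the
process's open-file limit (ulimit -n) tends to be part of the answer
regardless.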
