Anthony,

Is there a reason you wouldn't want to just push the data into something built for cheap, long-term storage (like Glacier, S3, or HDFS) and "replay" from that instead of from the Kafka brokers? I can't speak for Jay, Jun, or Neha, but I believe the expected usage of Kafka is essentially as a buffering mechanism to take the edge off the natural ebb and flow of unpredictable internet traffic. Highly available, long-term storage of data is probably not at the top of their list of use cases when they make design decisions.
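Roughly the kind of thing I have in mind, strictly as a sketch: a consumer that drains a topic into local chunk files and ships each full chunk to S3. This assumes the 0.8 high-level consumer API and the AWS SDK; the topic, group id, bucket, credentials, and chunk size below are all made up.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class S3Archiver {
        public static void main(String[] args) throws IOException {
            // Standard high-level consumer setup; group id and topic are made up.
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "s3-archiver");
            props.put("auto.offset.reset", "smallest"); // start from the oldest retained data

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("events", 1));
            KafkaStream<byte[], byte[]> stream = streams.get("events").get(0);

            // Placeholder credentials; a real archiver would load these securely.
            AmazonS3Client s3 =
                new AmazonS3Client(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            // Buffer messages into a local chunk file, then ship each full chunk
            // to S3. A real archiver would roll chunks on size/time and record
            // the starting offset of each chunk so it can be replayed in order.
            File chunk = File.createTempFile("events-", ".chunk");
            FileOutputStream out = new FileOutputStream(chunk);
            int count = 0;

            ConsumerIterator<byte[], byte[]> it = stream.iterator();
            while (it.hasNext()) {
                MessageAndMetadata<byte[], byte[]> msg = it.next();
                out.write(msg.message());
                out.write('\n');
                if (++count % 100000 == 0) { // chunk boundary; the size is arbitrary
                    out.close();
                    s3.putObject("my-archive-bucket", "events/" + chunk.getName(), chunk);
                    chunk = File.createTempFile("events-", ".chunk");
                    out = new FileOutputStream(chunk);
                }
            }
        }
    }

If you record offsets (or timestamps) alongside each chunk, "replaying" just means reading the archived files back in order and re-producing them, and the brokers never have to keep a year's worth of segments open.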
--Eric

On Thu, Feb 21, 2013 at 6:00 PM, Anthony Grimes <i...@raynes.me> wrote:
> Our use case is that we'd like to log away data we don't immediately need
> and potentially replay it at some point. We don't want to delete old logs.
> I googled around a bit and only discovered this particular post:
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201210.mbox/%3CCAFbh0Q2=eJcDT6NvTAPtxhXSk64x0Yms-G-AOqOoy=FtVVM6SQ@mail.gmail.com%3E
>
> In summary, it appears the primary issue is that Kafka keeps a file handle
> open for each log segment. Is there a way to configure this, or is one
> planned? It appears that an option to deduplicate instead of delete was
> added recently; doesn't the file handle issue exist with that as well,
> since files still aren't being deleted?
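Back-of-the-envelope on the file-handle question above, assuming roughly one open handle per segment file that is never deleted (every number here is invented; the point is the shape of the math):

    public class HandleEstimate {
        public static void main(String[] args) {
            // Hypothetical workload: 5 GiB/day per partition, 512 MiB segments.
            long bytesPerPartitionPerDay = 5L * 1024 * 1024 * 1024;
            long segmentBytes = 512L * 1024 * 1024;
            int partitions = 50;
            int days = 365;

            // With no deletion, every segment ever written stays open.
            long handles = (bytesPerPartitionPerDay * days / segmentBytes) * partitions;
            System.out.println("Open segment handles after a year: ~" + handles);
            // => ~182,500, far past a typical default ulimit of 1024 or 4096.
        }
    }

Larger segments (log.file.size in 0.7, log.segment.bytes in 0.8) and a raised ulimit buy headroom, but with no deletion the handle count still grows without bound, which is why archiving off-broker tends to be the more comfortable answer.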