Re: Keeping logs forever

2013-02-22 Thread Jay Kreps
Hi Graham, This sounds like it should work fine. LinkedIn keeps the majority of things for 7 days. Performance is linear in data size and we have validated performance up to many TB of data per machine. The registry you describe sounds like it could potentially be useful. You would probably have

Re: Keeping logs forever

2013-02-22 Thread Eric Tschetter
Apologies for asking another question as a newbie without having really tried stuff out, but actually one of our main reasons for wanting to use Kafka (not the LinkedIn use case) is exactly the fact that the "buffer" is not just for buffering. We want to keep data for days to weeks, and be able to

Re: Re: Keeping logs forever

2013-02-21 Thread Anthony Grimes
Sounds good. Thanks for the input, kind sir! Jay Kreps wrote: You can do this and it should work fine. You would have to keep adding machines to get disk capacity, of course, since your data set would only grow. We will keep an open file descriptor per file, but I think that is okay. Just set t

Re: Keeping logs forever

2013-02-21 Thread graham sanderson
Apologies for asking another question as a newbie without having really tried stuff out, but actually one of our main reasons for wanting to use Kafka (not the LinkedIn use case) is exactly the fact that the "buffer" is not just for buffering. We want to keep data for days to weeks, and be able

Re: Keeping logs forever

2013-02-21 Thread Jay Kreps
You can do this and it should work fine. You would have to keep adding machines to get disk capacity, of course, since your data set would only grow. We will keep an open file descriptor per file, but I think that is okay. Just set the segment size to 1GB, then with 10TB of storage that is only 10
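
A rough back-of-the-envelope check of the file descriptor point above. This is only a sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and one open descriptor per segment file, and it ignores index files and replication. The function name `segment_file_count` is hypothetical and is not part of Kafka.

```python
def segment_file_count(total_bytes: int, segment_bytes: int) -> int:
    """Return how many segment files are needed to hold total_bytes.

    Uses ceiling division so a partially filled last segment still counts.
    """
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    return -(-total_bytes // segment_bytes)


# Assumption: decimal units, as disk vendors usually quote them.
GB = 10**9
TB = 10**12

# 10 TB of retained log data split into 1 GB segments.
count = segment_file_count(10 * TB, 1 * GB)
print(count)  # 10000
```

With these assumptions the broker would hold on the order of ten thousand open segment files for 10 TB of data, so the per-process open-file limit on the machine is the thing to check.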

Re: Keeping logs forever

2013-02-21 Thread Milind Parikh
Forever is a long time. The definition of replay and navigating through different versions of Kafka would be key. Example: If you are storing market data into Kafka and have a CEP engine running on top and would like replay "transactions" to be fed back to ensure replayability, then you would prob

Re: Keeping logs forever

2013-02-21 Thread Eric Tschetter
Anthony, Is there a reason you wouldn't want to just push the data into something built for cheap, long-term storage (like Glacier, S3, or HDFS) and perhaps "replay" from that instead of from the Kafka brokers? I can't speak for Jay, Jun or Neha, but I believe the expected usage of Kafka is essen

Keeping logs forever

2013-02-21 Thread Anthony Grimes
Our use case is that we'd like to log away data we don't need and potentially replay it at some point. We don't want to delete old logs. I googled around a bit and I only discovered this particular post: http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201210.mbox/%3CCAFbh0Q2=eJcDT