"And if you can't consume it all within 6 minutes, partition the topic until you can run enough consumers such that you can keep up.", this is what I intend to do for each 6min -topic.
Chen

On Mon, Aug 11, 2014 at 6:07 PM, Philip O'Toole <
philip.oto...@yahoo.com.invalid> wrote:

> It's still not clear to me why you need to create so many topics.
>
> Write the data to a single topic and consume it when it arrives. It
> doesn't matter if it arrives in bursts, as long as you can process it all
> within 6 minutes, right?
>
> And if you can't consume it all within 6 minutes, partition the topic
> until you can run enough consumers such that you can keep up. The fact
> that you are thinking about so many topics is a sign your design is
> wrong, or Kafka is the wrong solution.
>
> Philip
>
>
> On Aug 11, 2014, at 5:18 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
>
> > Philip,
> > That is right. There is a huge amount of data flushed into the topic
> > within each 6 minutes. Then at the end of each 6 min, I only want to
> > read from that specific topic, and data within that topic has to be
> > processed as fast as possible. I was originally using a redis queue for
> > this purpose, but it takes much longer to process a redis queue than a
> > kafka queue (testing data is 2M messages). Since we already have kafka
> > infrastructure set up, instead of seeking other tools (activeMQ,
> > rabbitMQ, etc.), I would rather make use of kafka, although it does not
> > seem like a common kafka use case.
> >
> > Chen
> >
> >
> >> On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole
> >> <philip.oto...@yahoo.com.invalid> wrote:
> >> I'd love to know more about what you're trying to do here. It sounds
> >> like you're trying to create topics on a schedule, trying to make it
> >> easy to locate data for a given time range? I'm not sure it makes
> >> sense to use Kafka in this manner.
> >>
> >> Can you provide more detail?
> >>
> >>
> >> Philip
> >>
> >>
> >> -----------------------------------------
> >> http://www.philipotoole.com
> >>
> >>
> >> On Monday, August 11, 2014 4:45 PM, Chen Wang <
> >> chen.apache.s...@gmail.com> wrote:
> >>
> >>
> >> Todd,
> >> I actually only intend to keep each topic valid for 3 days at most.
> >> Each of our topics has 3 partitions, so that's around 3 * 240 * 3 =
> >> 2160 partitions. Since there is no API for deleting topics, I guess I
> >> could set up a cron job deleting the outdated topics (folders) from
> >> zookeeper. Do you know when the delete-topic API will be available in
> >> kafka?
> >> Chen
> >>
> >>
> >>
> >> On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
> >> <tpal...@linkedin.com.invalid> wrote:
> >>
> >> > You need to consider your total partition count as you do this.
> >> > After 30 days, assuming 1 partition per topic, you have 7200
> >> > partitions. Depending on how many brokers you have, this can start
> >> > to be a problem. On one of our clusters, which has over 70k
> >> > partitions, we just found that there's now a problem with doing
> >> > actions like a preferred replica election for all topics, because
> >> > the JSON object that gets written to the zookeeper node to trigger
> >> > it is too large for Zookeeper's default 1 MB data size.
> >> >
> >> > You also need to think about the number of open file handles. Even
> >> > with no data, there will be open files for each topic.
> >> >
> >> > -Todd
> >> >
> >> >
> >> > On 8/11/14, 2:19 PM, "Chen Wang" <chen.apache.s...@gmail.com> wrote:
> >> >
> >> > > Folks,
> >> > > Is there any potential issue with creating 240 topics every day?
> >> > > Although the retention of each topic is set to be 2 days, I am a
> >> > > little concerned that since right now there is no delete-topic
> >> > > API, the zookeepers might be overloaded.
> >> > > Thanks,
> >> > > Chen
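For completeness, the read side of the partition-per-window idea discussed
above might look like the sketch below. It is only a sketch: it assumes the
newer org.apache.kafka.clients Java consumer, the same made-up "events"
topic and broker address, and a hypothetical process() handler; the
window-to-partition math mirrors the producer sketch earlier in the thread:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class WindowReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", "window-reader");           // assumed group
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // The partition holding the 6-minute window that just closed.
        int justClosed = (int) (((System.currentTimeMillis()
                / (6 * 60 * 1000L)) - 1) % 240);

        try (KafkaConsumer<String, String> consumer =
                new KafkaConsumer<>(props)) {
            // Manual assignment: read exactly this one partition, with no
            // consumer-group rebalancing involved.
            TopicPartition tp = new TopicPartition("events", justClosed);
            consumer.assign(Collections.singletonList(tp));

            // Snapshot the end offset at window close, then drain up to
            // it, so the read is bounded even if producers keep writing.
            long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
            while (consumer.position(tp) < end) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value());
                }
            }
            consumer.commitSync(); // remember progress across daily reuse
        }
    }

    // Hypothetical per-message handler.
    private static void process(String value) {
        System.out.println(value);
    }
}

Because the partitions wrap around every 24 hours in this layout, the
committed offsets (or offsets stored elsewhere) are what keep a reader
from re-reading yesterday's data for the same window.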