"And if you can't consume it all within 6 minutes, partition the topic
until you can run enough consumers such that you can keep up.", this is
what I intend to do for each 6min -topic.
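
A minimal sketch of that idea, using the newer Java client API purely
for illustration (the broker address, group id, and topic name are
placeholders): running N copies of this process with the same group.id
makes Kafka spread the topic's partitions across them, so throughput
scales up to the partition count.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class WindowConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder
            props.put("group.id", "six-minute-processors"); // placeholder
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record.value()); // application-specific work
                    }
                }
            }
        }

        private static void process(String value) { /* ... */ }
    }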

What I really need is a partitioned queue: each 6 minutes of data goes
into a separate partition, so that I can read that specific partition at
the end of each 6-minute window. Redis naturally fits this case, but the
issue is performance (and also some trickery needed to ensure reliable
message delivery). As I said, we have Kafka infrastructure in place, so
if I can make the design work with Kafka without too much effort, I
would rather go that path than set up another queue system.
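
Concretely, the producer would pin each record to the partition for its
6-minute bucket, and a reader would drain exactly that partition once
the window closes. Again a rough sketch with the newer Java client and
placeholder names, not exact code; the caveat is that with
NUM_PARTITIONS buckets, each partition is reused every
NUM_PARTITIONS * 6 minutes, so a bucket must be fully drained before
its slot comes around again.

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;

    public class TimeBucketedQueue {
        static final String TOPIC = "six-minute-buckets"; // placeholder
        static final int NUM_PARTITIONS = 10;  // covers one hour of 6-minute windows
        static final long WINDOW_MS = 6 * 60 * 1000L;

        // Map a timestamp to the partition that holds its 6-minute window.
        static int bucketFor(long timestampMs) {
            return (int) ((timestampMs / WINDOW_MS) % NUM_PARTITIONS);
        }

        // Producer side: pin each record to its window's partition,
        // bypassing the default partitioner.
        static void send(KafkaProducer<String, String> producer, String value) {
            int partition = bucketFor(System.currentTimeMillis());
            producer.send(new ProducerRecord<>(TOPIC, partition, null, value));
        }

        // Reader side: once a window closes, attach to that one partition
        // and drain it; no consumer group is needed for an assigned read.
        // A real reader would compare its position against endOffsets()
        // rather than stopping at the first empty poll.
        static void drainBucket(KafkaConsumer<String, String> consumer, int partition) {
            consumer.assign(Collections.singletonList(new TopicPartition(TOPIC, partition)));
            ConsumerRecords<String, String> records;
            while (!(records = consumer.poll(Duration.ofMillis(500))).isEmpty()) {
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value()); // application-specific work
                }
            }
        }

        private static void process(String value) { /* ... */ }
    }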

Chen


On Mon, Aug 11, 2014 at 6:07 PM, Philip O'Toole <philip.oto...@yahoo.com.invalid> wrote:

> It's still not clear to me why you need to create so many topics.
>
> Write the data to a single topic and consume it when it arrives. It
> doesn't matter if it arrives in bursts, as long as you can process it all
> within 6 minutes, right?
>
> And if you can't consume it all within 6 minutes, partition the topic
> until you can run enough consumers such that you can keep up. The fact that
> you are thinking about so many topics is a sign your design is wrong, or
> Kafka is the wrong solution.
>
> Philip
>
> > On Aug 11, 2014, at 5:18 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
> >
> > Philip,
> > That is right. There is a huge amount of data flushed into the topic
> > within each 6 minutes. Then at the end of each 6-minute window, I only
> > want to read from that specific topic, and the data within it has to be
> > processed as fast as possible. I was originally using a Redis queue for
> > this purpose, but it takes much longer to process a Redis queue than a
> > Kafka queue (test data: 2M messages). Since we already have the Kafka
> > infrastructure set up, instead of seeking out other tools (ActiveMQ,
> > RabbitMQ, etc.), I would rather make use of Kafka, although this does
> > not seem like a common Kafka use case.
> >
> > Chen
> >
> >
> >> On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole <philip.oto...@yahoo.com.invalid> wrote:
> >> I'd love to know more about what you're trying to do here. It sounds
> >> like you're trying to create topics on a schedule, trying to make it
> >> easy to locate data for a given time range? I'm not sure it makes
> >> sense to use Kafka in this manner.
> >>
> >> Can you provide more detail?
> >>
> >>
> >> Philip
> >>
> >>
> >> -----------------------------------------
> >> http://www.philipotoole.com
> >>
> >>
> >> On Monday, August 11, 2014 4:45 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
> >>
> >>
> >>
> >> Todd,
> >> I actually only intend to keep each topic valid for 3 days at most.
> >> Each of our topics has 3 partitions, so that is around 3 * 240 * 3 =
> >> 2160 partitions. Since there is no API for deleting topics, I guess I
> >> could set up a cron job that deletes the outdated topics (folders)
> >> from ZooKeeper...
> >> Do you know when the delete-topic API will be available in Kafka?
> >> Chen
> >>
> >>
> >>
> >> On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino <tpal...@linkedin.com.invalid> wrote:
> >>
> >> > You need to consider your total partition count as you do this.
> >> > After 30 days, assuming 1 partition per topic, you have 7200
> >> > partitions. Depending on how many brokers you have, this can start
> >> > to be a problem. We just found an issue on one of our clusters that
> >> > has over 70k partitions: there's now a problem with doing actions
> >> > like a preferred replica election for all topics, because the JSON
> >> > object that gets written to the ZooKeeper node to trigger it is too
> >> > large for ZooKeeper's default 1 MB data size.
> >> >
> >> > You also need to think about the number of open file handles. Even
> >> > with no data, there will be open files for each topic.
> >> >
> >> > -Todd
> >> >
> >> >
> >> > On 8/11/14, 2:19 PM, "Chen Wang" <chen.apache.s...@gmail.com> wrote:
> >> >
> >> > >Folks,
> >> > >Is there any potential issue with creating 240 topics every day?
> >> > >Although the retention of each topic is set to 2 days, I am a little
> >> > >concerned that, since right now there is no delete-topic API,
> >> > >ZooKeeper might be overloaded.
> >> > >Thanks,
> >> > >Chen
> >> >
> >> >
> >
>
