Todd -- can you share details of the ZK cluster you are running, to support
this scale? Is it one single Kafka cluster? Are you using a single ZK cluster?
Thanks,
Philip
-
http://www.philipotoole.com
On Monday, August 11, 2014 9:32 PM, Todd Palino wrote:
Got it. Thanks for the input, Todd!
Chen
On Mon, Aug 11, 2014 at 9:31 PM, Todd Palino
wrote:
> As I noted, we have a cluster right now with 70k partitions. It’s running
> on over 30 brokers, partly to cover the number of partitions and
> partly to cover the amount of data that we push through it.
As I noted, we have a cluster right now with 70k partitions. It’s running
on over 30 brokers, partly to cover the number of partitions and
partly to cover the amount of data that we push through it. If you can
have at least 4 or 5 brokers, I wouldn’t anticipate any problems with the
number of partitions.
Todd,
Yes, I actually thought about that. My concern is that even a week's worth of
topic partitions (240*7*3 = 5040) is too many. Does LinkedIn have good
experience using this many topics in your system? :-)
Thanks,
Chen
On Mon, Aug 11, 2014 at 9:02 PM, Todd Palino
wrote:
> In order to delete topics, you need to shut down the entire cluster ...
In order to delete topics, you need to shut down the entire cluster (all
brokers), delete the topics from Zookeeper, and delete the log files and
partition directories from the disk on the brokers. Then you can restart the
cluster. Assuming that you can take a periodic outage on your cluster, you
can
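For reference, a rough sketch of that manual cleanup, assuming a Python script
with the kazoo ZooKeeper client, the default 0.8-era ZooKeeper layout (no
chroot), and made-up host names, topic name, and log directory. As described
above, it must only run while every broker is shut down.

# Hypothetical cleanup for the manual topic-deletion procedure described above.
# Run ONLY while all brokers in the cluster are shut down.
import shutil
from pathlib import Path

from kazoo.client import KazooClient  # pip install kazoo

ZK_HOSTS = "zk1:2181,zk2:2181,zk3:2181"   # assumed ZooKeeper connect string
LOG_DIR = Path("/var/kafka/data")         # assumed broker log.dirs location
TOPIC = "emails-20140811-2130"            # assumed time-bucketed topic name

zk = KazooClient(hosts=ZK_HOSTS)
zk.start()

# 1) Remove the topic's metadata from ZooKeeper (default 0.8.x layout assumed).
for znode in ("/brokers/topics/" + TOPIC, "/config/topics/" + TOPIC):
    if zk.exists(znode):
        zk.delete(znode, recursive=True)

zk.stop()

# 2) On every broker host, remove the partition directories (TOPIC-0, TOPIC-1, ...).
for partition_dir in LOG_DIR.glob(TOPIC + "-*"):
    shutil.rmtree(partition_dir)

# 3) Restart the brokers once both steps are done everywhere.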
Unfortunately, this would not work in our system. It means that every few
minutes I would need to scan the entire queue, which is not possible in our
case. In fact, our old system is designed this way: store the data in HBase,
and run an hourly MapReduce job to scan the entire table to figure out
which
Vipul,
The problem is that the producer does not know when it should set the
window start and end boundaries. The data does not arrive in order. I also
think it's difficult to get the offset of the boundary and only pull
messages between those boundaries: I am already trying to avoid using the
Ok, now that is good detail. I understand your issue.
It's somewhat difficult to use Kafka in your situation, as Kafka is a FIFO
queue, but you are trying to use it with data that is not tightly ordered in
that manner.
I don't have any definite solutions, but perhaps this might work.
Assu
Your use case requires messages to be pushed out when their time comes,
instead of in the order in which they arrived, and Kafka may not be the best
fit for this, since within the queue you want some message batches to be sent
out early and some later. There could be another way to solve this with
offset management, as Kafka is
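One possible reading of that offset-management idea, purely as a sketch: keep
one partition assigned, and on every pass seek back to the oldest offset whose
campaign was not yet due. The topic name, field names, broker address, and the
modern kafka-python client are all assumptions, not something stated in this
thread.

import json
import time

from kafka import KafkaConsumer, TopicPartition  # pip install kafka-python


def send_email(campaign):
    # Placeholder for the real campaign-send logic.
    print("sending campaign", campaign.get("campaign_id"))


TP = TopicPartition("email-campaigns", 0)        # assumed topic and partition
consumer = KafkaConsumer(
    bootstrap_servers="broker1:9092",            # assumed broker address
    enable_auto_commit=False,                    # offsets are managed by hand below
)
consumer.assign([TP])

resume_offset = 0    # earliest offset that may still hold an unsent campaign

while True:
    consumer.seek(TP, resume_offset)
    next_resume = resume_offset
    first_not_due = None
    for batch in consumer.poll(timeout_ms=1000).values():
        for record in batch:
            campaign = json.loads(record.value)
            if campaign["scheduled_send_time"] <= time.time():
                send_email(campaign)
            elif first_not_due is None:
                first_not_due = record.offset    # earliest skipped (not yet due) message
            next_resume = record.offset + 1
    # Re-seek to the earliest skipped message next time. Anything sitting behind
    # it gets re-read on later passes, which is the same rescanning cost
    # discussed elsewhere in this thread.
    resume_offset = first_not_due if first_not_due is not None else next_resume
    time.sleep(6 * 60)                           # wake up once per 6-minute window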
The data has a timestamp: it's actually email campaigns with a scheduled
send time. But since they can be scheduled ahead (e.g., two days ahead), I
cannot process a message when it arrives. It has to wait until its actual
scheduled send time. As you can tell, the sequence within the 6 minutes does
not matter, but
Why do you need to read it every 6 minutes? Why not just read it as it arrives?
If it naturally arrives in 6 minute bursts, you'll read it in 6 minute bursts,
no?
Perhaps the data does not have timestamps embedded in it, so that is why you
are relying on time-based topic names? In that case I w
"And if you can't consume it all within 6 minutes, partition the topic
until you can run enough consumers such that you can keep up.", this is
what I intend to do for each 6min -topic.
What I really need is a partitioned queue: each 6 minute of data can put
into a separate partition, so that I can
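For illustration only, a sketch of that partitioned-queue idea with a single
topic whose partition is derived from the 6-minute bucket of the scheduled
send time; the topic name, partition count, field names, and kafka-python
client are assumptions.

import json

from kafka import KafkaProducer  # pip install kafka-python

BUCKET_SECONDS = 6 * 60
NUM_PARTITIONS = 240              # e.g. one day's worth of 6-minute buckets

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",                    # assumed broker address
    value_serializer=lambda m: json.dumps(m).encode(),
)


def enqueue(campaign):
    """Route a campaign to the partition for its 6-minute send-time bucket."""
    bucket = int(campaign["scheduled_send_time"]) // BUCKET_SECONDS
    producer.send(
        "email-campaigns",                               # assumed single topic
        value=campaign,
        partition=bucket % NUM_PARTITIONS,               # wraps around daily
    )


enqueue({"campaign_id": 42, "scheduled_send_time": 1407813120})
producer.flush()

The modulo wrap-around means a campaign scheduled two days ahead shares a
partition with today's same-time-of-day bucket, so a consumer still has to
check the timestamp before sending; the partitioning only narrows the scan to
one partition per window.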
It's still not clear to me why you need to create so many topics.
Write the data to a single topic and consume it when it arrives. It doesn't
matter if it arrives in bursts, as long as you can process it all within 6
minutes, right?
And if you can't consume it all within 6 minutes, partition the topic until
you can run enough consumers such that you can keep up.
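A minimal sketch of that suggestion on the consumer side, assuming a modern
kafka-python client and made-up topic, group, and broker names; scaling up
just means starting more copies of the same process in the same group.

import json

from kafka import KafkaConsumer  # pip install kafka-python


def process(campaign):
    # Placeholder for the real per-message work.
    print("processing", campaign)


consumer = KafkaConsumer(
    "email-campaigns",                       # assumed single topic
    bootstrap_servers="broker1:9092",        # assumed broker address
    group_id="campaign-senders",             # run N copies of this process and
                                             # Kafka spreads partitions across them
    value_deserializer=lambda m: json.loads(m),
)

for record in consumer:
    process(record.value)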
Philip,
That is right. There is a huge amount of data flushed into the topic within
each 6 minutes. Then at the end of each 6 minutes, I only want to read from
that specific topic, and the data within that topic has to be processed as
fast as possible. I was originally using a Redis queue for this purpose, but
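As a sketch of this per-window reading pattern (the topic naming scheme,
broker address, and kafka-python client are assumptions), a job that wakes up
every 6 minutes could drain the topic for the window that just closed:

import json
import time

from kafka import KafkaConsumer  # pip install kafka-python

BUCKET_SECONDS = 6 * 60


def send_email(campaign):
    # Placeholder for the real campaign-send logic.
    print("sending", campaign)


def drain_closed_window():
    """Read everything from the topic for the 6-minute window that just ended."""
    bucket = int(time.time()) // BUCKET_SECONDS - 1
    topic = "emails-%d" % bucket             # assumed time-based naming scheme
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers="broker1:9092",    # assumed broker address
        auto_offset_reset="earliest",        # read the window from the beginning
        consumer_timeout_ms=10000,           # stop iterating once the topic is drained
    )
    for record in consumer:
        send_email(json.loads(record.value))
    consumer.close()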
I'd love to know more about what you're trying to do here. It sounds like
you're trying to create topics on a schedule, trying to make it easy to locate
data for a given time range? I'm not sure it makes sense to use Kafka in this
manner.
Can you provide more detail?
Philip
---
Todd,
I actually only intend to keep each topic valid for 3 days at most. Each of
our topics has 3 partitions, so it's around 3*240*3 = 2160 partitions. Since
there is no API for deleting topics, I guess I could set up a cron job to
delete the outdated topics (folders) from Zookeeper.
Do you know when the
You need to consider your total partition count as you do this. After 30
days, assuming 1 partition per topic, you have 7200 partitions. Depending
on how many brokers you have, this can start to be a problem. We just
found an issue on one of our clusters that has over 70k partitions that
there's no