Hi,
We are using Kafka for our messaging system and we estimate a volume of 200
TB/week in the coming months. Will this impact Kafka's performance?
PS: We will have more than 2 lakh (200,000) partitions.
--
Regards
Vamsi Subhash
We definitely need a retention policy of a week; hence the volume.
On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:
>
> Hi,
>
> We are using Kafka for our messaging system and we have an estimate for
> 200 TB/week in the coming months. Will it impact any performance…
Hi,
A few things you have to plan for:
a. From a resilience point of view, ensure that you have sufficient
follower replicas (brokers) for your partitions.
b. In my testing of Kafka (50 TB/week) so far, I haven't seen much issue with
CPU utilization or memory. I had 24 CPUs and 32 GB RAM.
c. 200,000 partitions…
Yes, we need that many partitions at a maximum, as we have a central messaging
service and thousands of topics.
On Friday, December 19, 2014, nitin sharma
wrote:
> hi,
>
> Few things you have to plan for:
> a. Ensure that from resilience point of view, you are having sufficient
> follower brokers for you…
Technically/conceptually it is possible to have 200,000 topics, but do you
really need it like that? What do you intend to do with those messages, i.e.
how do you foresee them being processed downstream? And are those topics really
there to segregate different kinds of processing or different ids…
We require:
- many topics
- ordering of messages for every topic
- Consumers hit different HTTP endpoints, which may be slow (in a push
model). In a pull model, consumers may pull at the rate at which
they can process.
- We need parallelism, hitting those endpoints with as many consumers as
possible. Hence, we currently…
See some comments inline.
On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:
>
> We require:
> - many topics
> - ordering of messages for every topic
>
Ordering is only on a per-partition basis, so you might have to pick a
partition key that makes sense for…
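As an illustration of picking such a key, a minimal sketch against the 0.8.2
Java producer; the topic name and the orderId key are invented for the
example, not taken from the thread:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedOrdering {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            // Records with the same key hash to the same partition, so
            // per-key ordering holds even with thousands of partitions.
            String orderId = "order-42"; // hypothetical business key
            producer.send(new ProducerRecord<>("order-events", orderId, "created"));
            producer.send(new ProducerRecord<>("order-events", orderId, "shipped"));
            producer.close();
        }
    }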
Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
partitions? I think you can take what I said below and change my 250 to 25,
as I went with your result (1,000,000) and not your arguments (2,000 x 50 =
100,000). And you should think of the processing as a separate step from fetch
and com…
Hi Jay,
Many thanks for the info. All that makes sense, but from an API
standpoint, when something is labelled async and returns a Future, this will
be misconstrued, and developers will place async sends in critical
client-facing request/response pathways of code that should never block. If
the app…
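To make the hazard concrete, a sketch against the 0.8.2 producer API (topic
and payload are invented): the Future only defers the I/O, and any get() on
the request path blocks the calling thread; the callback overload avoids
that, though send() itself can still block fetching metadata, which is the
bug Jay mentions later in this digest.

    import java.util.Properties;
    import java.util.concurrent.Future;
    import org.apache.kafka.clients.producer.Callback;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class SendModes {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.ByteArraySerializer");
            KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
            byte[] payload = "hello".getBytes();

            // "Async" send that quietly blocks: get() waits for the broker ack.
            Future<RecordMetadata> f =
                producer.send(new ProducerRecord<byte[], byte[]>("events", payload));
            RecordMetadata md = f.get(); // never do this on a request/response path

            // Non-blocking alternative: handle the ack in a callback instead.
            producer.send(new ProducerRecord<byte[], byte[]>("events", payload),
                new Callback() {
                    public void onCompletion(RecordMetadata metadata, Exception e) {
                        if (e != null) {
                            e.printStackTrace(); // handle/log the failure
                        }
                    }
                });
            producer.close();
        }
    }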
Joe,
- Correction: it's 1,00,000 partitions.
- We can have at most one consumer per partition, not 50 per partition.
Yes, we have a hashing mechanism to support future partition increases as
well. We override the Default Partitioner.
- We use both the Simple and High Level consumers depending on the cons…
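For reference, a sketch of overriding the partitioner with the 0.8-era
(old Scala) producer API; the virtual-bucket detail is an assumption, since
the thread doesn't show Flipkart's actual scheme:

    import kafka.producer.Partitioner;
    import kafka.utils.VerifiableProperties;

    // Configured via partitioner.class=<fully qualified class name>
    // in the producer properties.
    public class HashedPartitioner implements Partitioner {
        // Kafka instantiates partitioners reflectively with this constructor.
        public HashedPartitioner(VerifiableProperties props) {}

        public int partition(Object key, int numPartitions) {
            // Hash into a fixed number of virtual buckets first so a future
            // partition increase remaps keys predictably (assumed scheme,
            // not necessarily what Flipkart does).
            int buckets = 1024;
            int bucket = (key.hashCode() & 0x7fffffff) % buckets;
            return bucket % numPartitions;
        }
    }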
Hi all,
I was wondering why every ProducerRecord sent requires a serialized
key. I am using Kafka to send opaque bytes, and I end up creating
garbage keys because I don't really have a good one.
Thanks,
Rajiv
Hi Rajiv,
You can send messages without keys. Just provide null for the key.
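For example, a sketch against the 0.8.2 producer API (topic name invented):

    import org.apache.kafka.clients.producer.ProducerRecord;

    // The two-argument constructor leaves the key null; the default
    // partitioner then spreads keyless records across partitions.
    byte[] payload = "opaque bytes".getBytes();
    ProducerRecord<byte[], byte[]> record =
        new ProducerRecord<byte[], byte[]>("my-topic", payload);
    // Equivalent to passing the key explicitly as null:
    // new ProducerRecord<byte[], byte[]>("my-topic", null, payload);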
Jiangjie (Becket) Qin
On 12/19/14, 10:14 AM, "Rajiv Kurian" wrote:
>Hi all,
>
>I was wondering why every ProducerRecord sent requires a serialized
>key. I am using Kafka to send opaque bytes and I am ending up crea…
Hey Paul,
I agree we should document this better.
We allow and encourage using partitions to semantically distribute data. So
unfortunately we can't just arbitrarily assign a partition (say 0) as that
would actually give incorrect answers for any consumer that made use of the
partitioning. It is…
Hi folks,
I am new to both Kafka and Storm, and I have a problem getting KafkaSpout to
read data from Kafka in our three-node environment with Kafka 0.8.1.1 and
Storm 0.9.3.
What is working:
- I have a Kafka producer (a Java application) that generates random strings to
a topic, and I was able to run the f…
@Joe, Achanta is using Indian English numerals which is why it's a little
confusing. http://en.wikipedia.org/wiki/Indian_English#Numbering_system
1,00,000 [1 lakh] (Indian English) == 100,000 [1 hundred thousand] (The
rest of the world :P)
On Fri Dec 19 2014 at 9:40:29 AM Achanta Vamsi Subhash <
a…
Hi,
I would like to get some feedback on design choices with Kafka consumers.
We have an application in which a consumer reads a message and the thread does
a number of things, including database accesses, before a message is
produced to another topic. The time between consuming and producing the
message…
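A sketch of that loop with the 0.8 high-level consumer, assuming manual
commits so the offset only advances after the downstream work; doWork() and
all names here are placeholders, not the poster's code:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class ConsumeTransformProduce {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk:2181"); // placeholder
            props.put("group.id", "my-group");         // placeholder
            props.put("auto.commit.enable", "false");  // commit only after the work

            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(
                    Collections.singletonMap("input-topic", 1));
            ConsumerIterator<byte[], byte[]> it =
                streams.get("input-topic").get(0).iterator();

            while (it.hasNext()) {
                MessageAndMetadata<byte[], byte[]> msg = it.next();
                byte[] result = doWork(msg.message()); // DB accesses etc.
                // produce `result` to the output topic here, then:
                consumer.commitOffsets(); // note: commits all partitions in 0.8
            }
        }

        static byte[] doWork(byte[] in) { return in; } // stand-in for real logic
    }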
Thanks, didn't know that.
On Fri, Dec 19, 2014 at 10:39 AM, Jiangjie Qin
wrote:
>
> Hi Rajiv,
>
> You can send messages without keys. Just provide null for key.
>
> Jiangjie (Becket) Qin
>
>
> On 12/19/14, 10:14 AM, "Rajiv Kurian" wrote:
>
> >Hi all,
> >
> >I was wondering why every Produce…
Hi Jay,
I have implemented a wrapper around the producer to behave like I want it
to. Where it diverges from the current 0.8.2 producer is that it accepts three
new inputs:
- A list of expected topics
- A timeout value for initializing metadata for those topics during producer
creation
- An option to blow up if…
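Roughly, something like this (a sketch of the idea only; Paul's actual code
isn't in the thread, and the class and parameter names are invented):

    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public class PreWarmedProducer {
        private final KafkaProducer<byte[], byte[]> producer;

        public PreWarmedProducer(Properties props, List<String> expectedTopics,
                                 long initTimeoutMs, boolean blowUp) {
            this.producer = new KafkaProducer<>(props);
            ExecutorService pool = Executors.newSingleThreadExecutor();
            try {
                for (final String topic : expectedTopics) {
                    // partitionsFor() blocks until metadata is available,
                    // so bound it with our own timeout.
                    Future<?> f = pool.submit(new Runnable() {
                        public void run() { producer.partitionsFor(topic); }
                    });
                    try {
                        f.get(initTimeoutMs, TimeUnit.MILLISECONDS);
                    } catch (Exception e) {
                        if (blowUp) {
                            throw new RuntimeException("No metadata for " + topic, e);
                        }
                    }
                }
            } finally {
                pool.shutdownNow();
            }
        }
    }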
Also, if log.cleaner.enable is true in your broker config, that enables the
log-compaction retention strategy.
Then, for topics with the per-topic "cleanup.policy=compact" config
parameter set, Kafka will scan the topic periodically, nuking old versions of
the data with the same key.
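For instance (illustrative names and values, not from the thread):

    # broker config (server.properties)
    log.cleaner.enable=true

    # per-topic override, e.g. when creating the topic:
    # bin/kafka-topics.sh --zookeeper zk:2181 --create --topic latest-values \
    #   --replication-factor 2 --partitions 8 --config cleanup.policy=compact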
Yeah, if you want to file a JIRA and post a patch for a new option, it's
possible others would want it. Maybe something like:
pre.initialize.topics=x,y,z
pre.initialize.timeout=x
The metadata fetch timeout is a bug...that behavior is inherited from
Object.wait, which defines zero to mean infinite.
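For anyone unfamiliar with that corner of the JDK, the contract he means:

    Object lock = new Object();
    synchronized (lock) {
        try {
            // wait(0) does NOT mean "return immediately": per
            // java.lang.Object, a timeout of zero waits indefinitely
            // (until notify/interrupt), which is the behavior the
            // producer's metadata wait inherited.
            lock.wait(0);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }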