Hi,

Kafka's performance all comes from batching. There's going to be a huge
perf impact from limiting your batching like that, and that's likely the
issue. I'd recommend designing your system around Kafka's batching model,
which involves large numbers of messages per fetch request.
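A minimal sketch of consumer settings that work with, rather than against, that batching model (the broker address, group id, and concrete values here are illustrative placeholders, not taken from this thread):

```java
import java.util.Properties;

public class BatchingConsumerConfig {
    // Builds consumer properties that let each fetch return a large batch
    // of records, instead of capping the fetch at one record's worth of bytes.
    public static Properties batchedConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "gid");
        // Batching-related knobs: wait for at least 1 KB of data, or 500 ms,
        // whichever comes first, and allow up to 1 MB per partition per fetch.
        props.put("fetch.min.bytes", "1024");
        props.put("fetch.max.wait.ms", "500");
        props.put("max.partition.fetch.bytes", "1048576");
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchedConfig();
        System.out.println(props.getProperty("max.partition.fetch.bytes"));
    }
}
```

With settings like these, a single poll can return many records; the application then iterates over the returned batch in memory, which is far cheaper than issuing one fetch request per record.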

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 25 May 2016, Yazeed Alabdulkarim <y.alabdulka...@gmail.com>
wrote:

> Hi Tom,
> Thank you for your help. I have only one broker. I used the Kafka production
> server configuration listed in Kafka's documentation page:
> http://kafka.apache.org/documentation.html#prodconfig . I have increased
> the flush interval and number of messages to prevent the disk from becoming
> the bottleneck. For the consumers, I used the following configurations:
> Properties props = new Properties();
> props.put("enable.auto.commit", "true");
> props.put("request.timeout.ms", "500000000");
> props.put("session.timeout.ms", "50000000");
> props.put("connections.max.idle.ms", "50000000");
> props.put("fetch.min.bytes", 1);
> props.put("fetch.max.wait.ms", "500");
> props.put("group.id", "gid");
> props.put("key.deserializer", StringDeserializer.class.getName());
> props.put("value.deserializer", StringDeserializer.class.getName());
> props.put("max.partition.fetch.bytes", "128");
> consumer = new KafkaConsumer<String, String>(props);
>
> I am setting max.partition.fetch.bytes to 128 because I only want to
> process one record for each poll.
>
> Thanks a lot for your help. I really appreciate it.
>
> On Tue, May 24, 2016 at 7:51 AM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
>
> > What's your server setup for the brokers and consumers? Generally I'd
> > expect something to be exhausted here and that to end up being the
> > bottleneck.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Mon, May 23, 2016 at 7:32 PM, Yazeed Alabdulkarim <
> > y.alabdulka...@gmail.com> wrote:
> >
> > > Hi,
> > > I am running simple experiments to evaluate the scalability of Kafka
> > > consumers with respect to the number of partitions. I assign every
> > > consumer to a specific partition. Each consumer polls the records in
> > > its assigned partition and prints the first one, then polls again from
> > > the offset of the printed record until all records are printed. Prior
> > > to running the test, I produce 10 million records evenly among the
> > > partitions. After running the test, I measure the time it took for the
> > > consumers to print all the records. I was expecting Kafka to scale as
> > > I increase the number of consumers/partitions. However, the
> > > scalability diminishes beyond a certain number of
> > > partitions/consumers. Going from 1 to 2, 4, and 8, the scalability is
> > > great, as the duration of the test is reduced by the same factor as
> > > the increase in the number of partitions/consumers. Beyond 8
> > > consumers/partitions, however, the duration of the test reaches a
> > > steady state. I am monitoring the resources of my server and didn't
> > > see any bottleneck. Am I missing something here? Shouldn't Kafka
> > > consumers scale with the number of partitions?
> > > --
> > > Best Regards,
> > > Yazeed Alabdulkarim
> > >
> >
>
>
>
> --
> Best Regards,
> Yazeed Alabdulkarim
>