Yes I am specifying a key for each message.

The same producer code runs much more slowly when sending messages to a topic
with multiple partitions compared to a topic with a single partition. This
doesn't make any sense to me at all.

If I understand correctly I need multiple partitions in order to scale the
consumers.

Could it be because the async producer creates a connection per broker
(or per partition), and does this serially once the producer needs
to send the messages? Maybe when using a single partition the producer is
doing it in one batch.
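For what it's worth, following the larger-batch suggestion, these are the async batching settings I would try tuning first (the values below are illustrative guesses, not tuned numbers):

```
# larger batches to amortize the single send thread (illustrative values)
producer.type=async
batch.num.messages=2000
queue.buffering.max.messages=10000
queue.buffering.max.ms=2000
```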

BTW, I have tried using multiple Producer instances, but I still get poor
performance when using a topic with multiple partitions (by multiple
partitions I mean 12, which is exactly the number of broker machines
multiplied by the number of disks on each machine, which sounds
reasonable to me).
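For reference, the way I spread messages across producer instances is roughly the following (a simplified, self-contained sketch; in the real code each slot holds a kafka.javaapi.producer.Producer, and all names here are placeholders):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of round-robin dispatch over several producer instances, so
// that no single ProducerSendThread serializes all sends to every
// broker. Each pool slot stands in for a real Kafka 0.8 Producer.
public class ProducerPool {
    private final List<String> producers = new ArrayList<>();
    private final AtomicLong counter = new AtomicLong();

    public ProducerPool(int size) {
        for (int i = 0; i < size; i++) {
            // placeholder; the real code would construct a
            // kafka.javaapi.producer.Producer here
            producers.add("producer-" + i);
        }
    }

    // Pick the next producer instance round-robin.
    public String next() {
        int idx = (int) (counter.getAndIncrement() % producers.size());
        return producers.get(idx);
    }

    public static void main(String[] args) {
        ProducerPool pool = new ProducerPool(3);
        for (int i = 0; i < 6; i++) {
            System.out.println(pool.next());
        }
    }
}
```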

Is there any solution anyone can think of?


Yosi



On Wed, Jan 1, 2014 at 7:57 PM, Jun Rao <jun...@gmail.com> wrote:

> In 0.7, we have 1 producer send thread per broker. This is changed in 0.8,
> where there is only 1 producer send thread per producer. If a producer
> needs to send messages to multiple brokers, the send thread will do that
> serially, which will reduce the throughput. We plan to improve that in 0.9
> through client rewrites. For now, you can improve the throughput by either
> using a larger batch size or using more producer instances.
>
> As for degraded performance with more partitions, are you specifying a key
> for each message?
>
> Thanks,
>
> Jun
>
> On Wed, Jan 1, 2014 at 4:17 AM, yosi botzer <yosi.bot...@gmail.com> wrote:
>
> > Hi,
> >
> > I am using kafka 0.8. I have 3 machines each running kafka broker.
> >
> > I am using async mode of my Producer. I expected to see 3 different
> threads
> > with names starting with ProducerSendThread- (according to this article:
> > http://engineering.gnip.com/kafka-async-producer/)
> >
> > However I can see only one thread with the name *ProducerSendThread-*
> >
> > This is my producer configuration:
> >
> > server=1
> > topic=dat7
> > metadata.broker.list=
> > ec2-54-245-111-112.us-west-2.compute.amazonaws.com:9092
> > ,ec2-54-245-111-69.us-west-2.compute.amazonaws.com:9092,
> > ec2-54-218-183-14.us-west-2.compute.amazonaws.com:9092
> > serializer.class=kafka.serializer.DefaultEncoder
> > request.required.acks=1
> > compression.codec=snappy
> > producer.type=async
> > queue.buffering.max.ms=2000
> > queue.buffering.max.messages=1000
> > batch.num.messages=500
> >
> >
> > *What am I missing here?*
> >
> >
> > BTW, I have also experienced very strange behavior regarding my producer
> > performance (which may or may not be related to the issue above).
> >
> > When I defined a topic with 1 partition I got much better throughput
> > compared to a topic with 3 partitions. A producer sending messages to a
> > topic with 3 partitions had much better throughput compared to a topic
> > with 12 partitions.
> >
> > I would expect the best performance for the topic with 12 partitions,
> > since I have 3 machines each running a broker, each with 4 disks (the
> > broker is configured to use all 4 disks).
> >
> > *Is there any logical explanation for this behavior?*
> >
>