Hey Jordan,

I've put up a perf test on:

  https://issues.apache.org/jira/browse/SAMZA-548

The JIRA describes the test implementation, the observed performance, and
the known deficiencies in the test. I'm getting much more than 2MB/s.

Cheers,
Chris

On Fri, Feb 6, 2015 at 8:34 AM, Chris Riccomini <criccom...@apache.org>
wrote:

> Hey Jordan,
>
> > I peaked out a single Samza container's consumer at around 2MB/s.
>
> Could you post your configs, and version of Samza that you're running?
>
> > Running a Kafka Consumer Perf test though on the same machine I can do
> 100's of MB/s.
>
> How many threads were you running? Also, you're saying "consumer perf"
> here. Consumer and producer exhibit very different throughput
> characteristics. Can you describe (or post) the two tests that you did?
>
> > It seems like most of the bottleneck exists in the Kafka async client.
>
> Yes, this is what we've observed as well.
>
> > A reasonable solution might be to just add partitions and increase
> container count with the partition count.
>
> This is usually the guidance that we give. If you have 8 cores, and want
> to max out your machine, you should run 8 containers.
>
> > Have there been any design discussions about allowing multiple cores on
> a single container to allow better pipelining within the container?
>
> The discussion is pretty much what you've just described. We never felt
> that the increase in code complexity, configs, and mental model was worth
> the trade-off. My argument is that we should make the Kafka producer go
> faster (see comments below), rather than increasing complexity in Samza to
> get around it.
>
> > I also know that Kafka has plans to rework their producer but I haven't
> been able to find if this includes introducing a thread pool to allow
> multiple async produces.
>
> We have upgraded Samza to the new producer in SAMZA-227. The code changes
> are on master now. You should definitely check that out.
>
> The new Kafka producer works as follows: there is one "sender" thread.
> When you send messages, the messages get queued up, and the sender thread
> takes them off the queue, and sends them to Kafka. One trick with the new
> producer is that they are using NIO, and allow for pipelining. This is
> *specifically* to address the point you made about those that care more
> about throughput than ordering guarantees. The config of interest to you is:
>
>   max.in.flight.requests.per.connection
>
> This defines how many parallel sends can be pipelined (over one socket, in
> the sender thread) before the sender thread blocks. Samza forces this to 1
> right now (because we wanted to guarantee ordering). It seems like a
> reasonable request to allow users to override this with their own setting
> if they want more parallelism. Could you open a JIRA for that?
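
To illustrate why a max-in-flight of 1 preserves ordering, here is a minimal
toy model in Python. It is not Kafka or Samza code: the deliver function, the
batch ids, and the "fail once, then retry" behavior are all assumptions made
up for the sketch. It only shows that when more than one batch is outstanding,
a retried batch can land after a later one.

```python
from collections import deque


def deliver(batches, max_in_flight, fail_once):
    """Return the order in which a toy broker commits batches.

    batches: batch ids in the order the producer sends them.
    max_in_flight: how many unacknowledged batches may be outstanding.
    fail_once: batch ids whose first attempt fails and is retried.
    (All three parameters are hypothetical, for illustration only.)
    """
    pending = deque(batches)
    failed_already = set()
    committed = []
    while pending:
        # Send up to max_in_flight batches before waiting for any response.
        window = [pending.popleft()
                  for _ in range(min(max_in_flight, len(pending)))]
        retries = []
        for batch in window:
            if batch in fail_once and batch not in failed_already:
                failed_already.add(batch)
                retries.append(batch)      # first attempt fails
            else:
                committed.append(batch)    # broker appends the batch
        # Failed batches go back to the front of the queue for a retry.
        pending.extendleft(reversed(retries))
    return committed


print(deliver([1, 2, 3, 4], max_in_flight=1, fail_once={1}))  # [1, 2, 3, 4]
print(deliver([1, 2, 3, 4], max_in_flight=2, fail_once={1}))  # [2, 1, 3, 4]
```

With one request in flight, the failed batch is retried before anything else
is sent. With two in flight, batch 2 is already committed by the time batch 1
is retried, which is the reordering that the setting of 1 avoids.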
>
> I should note that in smoke tests, with max-in-flight set to one in Samza,
> the perf seemed roughly on par with Samza running the old Kafka producer. I
> also spoke to Jay at the last Kafka meetup, and he mentioned that they
> don't see much of a performance boost when running max-in-flight > 1. Jun
> did some perf comparison between the old and new Kafka producer, and put
> the information on some slides that he presented at the meetup. If you're
> interested, you should ping them on the Kafka mailing list.
>
> > Lastly, has anyone been able to get more MB/s out of a container than
> what I have?
>
> Thus far, I (personally) haven't spent much time on producer-side
> optimization, so I don't have hard numbers on it. Our producer code is
> pretty thin, so we're pretty much bound to what the Kafka producer can
> do. If you're up for it, you might want to contribute something to:
>
>   https://issues.apache.org/jira/browse/SAMZA-6
>
> Here's what I'd recommend:
>
> 0. Write something reproducible and post it on SAMZA-6. For bonus points,
> write an equivalent raw-Kafka-producer test (no Samza) so we can compare
> them.
> 1. Check out master.
> 2. Modify master to allow you to configure max-in-flight > 1 (line 185 of
> KafkaConfig.scala).
> 3. Try setting acks to 0 (it's 1 by default).
>
> Try running your tests at every one of these steps, and see how it affects
> performance. If you get to 3, and things are still slow, we can loop in
> some Kafka-dev folks.
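
The step-by-step comparison above could be driven by a small harness along
these lines. This is only a sketch in Python: the make_sender factory, the
payload size, and the message count are hypothetical, and the value 5 is an
arbitrary stand-in for "max-in-flight greater than 1". The two setting names
are the Kafka producer configs mentioned in this thread.

```python
import time

# Producer settings for steps 1-3 above. The value 5 is an arbitrary
# choice for "greater than 1", not a recommendation.
STEPS = [
    ("1: master as-is",
     {"max.in.flight.requests.per.connection": 1, "acks": "1"}),
    ("2: max-in-flight > 1",
     {"max.in.flight.requests.per.connection": 5, "acks": "1"}),
    ("3: max-in-flight > 1, acks=0",
     {"max.in.flight.requests.per.connection": 5, "acks": "0"}),
]


def measure_mb_per_s(send, payload, count):
    """Time `count` calls to send(payload) and return throughput in MB/s."""
    start = time.perf_counter()
    for _ in range(count):
        send(payload)
    # Guard against a zero interval on very fast stand-in senders.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(payload) * count / (1024 * 1024) / elapsed


def run_all(make_sender, payload=b"x" * 1024, count=10_000):
    """Run the same workload once per step; make_sender is hypothetical
    and should return a send(payload) callable built from the settings."""
    return {name: measure_mb_per_s(make_sender(settings), payload, count)
            for name, settings in STEPS}


if __name__ == "__main__":
    # Stand-in sender that discards the payload; replace with a real one.
    def make_noop_sender(settings):
        return lambda payload: None

    for name, rate in run_all(make_noop_sender).items():
        print(f"{name}: {rate:.1f} MB/s")
```

Since the producer is asynchronous, a real sender would need to wait for all
outstanding sends to complete before the clock stops; otherwise the numbers
measure queueing rather than delivery.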
>
> Cheers,
> Chris
>
> On Fri, Feb 6, 2015 at 12:00 AM, Jordan Shaw <jor...@pubnub.com> wrote:
>
>> Hi everyone,
>> I've done some raw disk, Kafka, and Samza benchmarking. I peaked out a
>> single Samza container's consumer at around 2MB/s. Running a Kafka
>> Consumer Perf test on the same machine, though, I can do 100's of MB/s. It
>> seems like most of the bottleneck exists in the Kafka async client. There
>> appears to be only 1 thread in the Kafka client rather than a thread pool,
>> and because a container can't run on multiple cores, I assume this thread
>> gets scheduled on the same core as the consumer and process call.
>>
>> I know a lot of thought has been put into the design of maintaining parity
>> between task instances and partitions, and into preventing unpredictable
>> behavior from a threaded system. A reasonable solution might be to just add
>> partitions and increase container count with the partition count. This
>> necessarily comes at the cost of increased memory usage on the node
>> managers, due to the increased container count.
>>
>> Have there been any design discussions about allowing multiple cores on a
>> single container, to allow better pipelining within the container and get
>> better throughput, and about introducing a thread pool outside of Kafka's
>> client to allow concurrent produces to Kafka within the same container? I
>> understand there are ordering concerns with this concurrency. For those
>> sensitive use cases the thread pool size could be 1, but use cases where
>> ordering is less important and raw throughput is more of a concern could
>> allow concurrent async produces. I also know that Kafka has plans to
>> rework their producer, but I haven't been able to find out whether this
>> includes introducing a thread pool to allow multiple async produces.
>> Lastly, has anyone been able to get more MB/s out of a container than what
>> I have? Thanks!
>>
>> --
>> Jordan
>>
>
>
