Roshan,

The client allocates a batch per partition and has a hard cap on memory
usage (default 32MB). When it hits that cap, it waits for in-flight requests
to complete so that it can reuse their memory. Setting the batch size to
20MB is not good: that means each partition has a 20MB array allocated for
it. This is not needed.
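
To make the arithmetic concrete, here is a rough sketch (not the client's
actual accounting, which is more involved) of how many batch-sized buffers
fit under the memory cap at once. The class and method names are made up for
illustration; the 32MB cap and 16k batch size are the defaults mentioned in
this thread.

```java
public class BatchMemorySketch {
    // Defaults as stated in this thread: 32MB memory cap, 16k batch size.
    static final long BUFFER_MEMORY_BYTES = 32L * 1024 * 1024;
    static final long DEFAULT_BATCH_BYTES = 16L * 1024;

    // Simplified model: how many batch-sized buffers fit under the cap at once.
    static long batchesThatFit(long bufferMemoryBytes, long batchSizeBytes) {
        return bufferMemoryBytes / batchSizeBytes;
    }

    public static void main(String[] args) {
        long twentyMb = 20L * 1024 * 1024;
        System.out.println("20MB batches under the cap: "
                + batchesThatFit(BUFFER_MEMORY_BYTES, twentyMb));
        System.out.println("16k batches under the cap:  "
                + batchesThatFit(BUFFER_MEMORY_BYTES, DEFAULT_BATCH_BYTES));
    }
}
```

Under this simplified model only one 20MB batch fits under a 32MB cap, so
with more than one partition a second allocation would have to wait for
memory to be released.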

If you just want good performance, I would leave batch.size alone; the
16k default is probably fine, and more than 64k is unlikely to help at all.
Instead, make sure you set linger.ms > 0 so you actually wait for the buffer
to fill up (or for flush to be called) instead of sending requests as soon
as there is some data to send.
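
As a minimal sketch of those settings, assuming they are passed to the
producer as plain properties: the class name is hypothetical, the linger.ms
value of 5 is only an illustrative choice, and a real producer would also
need bootstrap.servers and serializer settings, which are omitted here.

```java
import java.util.Properties;

public class ProducerSettingsSketch {
    // Builds only the batching-related settings discussed in this thread.
    static Properties build() {
        Properties props = new Properties();
        // Leave the batch size at the 16k default (value is in bytes).
        props.setProperty("batch.size", "16384");
        // Any value > 0 lets the buffer fill before sending; 5 is illustrative.
        props.setProperty("linger.ms", "5");
        // The 32MB memory cap mentioned above, in bytes.
        props.setProperty("buffer.memory", "33554432");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        for (String key : new String[] {"batch.size", "linger.ms", "buffer.memory"}) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```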

-Jay

On Wed, Apr 29, 2015 at 8:32 PM, Roshan Naik <ros...@hortonworks.com> wrote:

>
> @Jay,
> My bad. I mistook batch.size to be a number of messages instead of
> bytes. Below are revised measurements based on computing batch.size in
> bytes.
>
> @Jun,
>
>    With an explicit flush(), linger should have no impact, should it?
>
> @Wang,
>    Larger batches do not necessarily give better numbers, as you can
> see below.
>
>
> The 2 problems I noted earlier still exist in the batched sync mode (using
> flush()).
>
>   *   batch.size still seems to be a factor even when set to a larger
> value than the bytes generated by the client
>   *   4 and 8 partitions see a big slowdown
>
>
>
> Revised measurements for new Producer API:
>
> - All cases: single-threaded, 1k event size
>
>
> Batched SYNC using flush(), acks=1
>
> 1 partition
>
>                                Batch=4k   Batch=8k   Batch=16k
>   batch.size == clientBatch    140        124
>   batch.size = 10MB            140        123        124
>   batch.size = 20MB            31         30         42
>
> 4 partitions
>
>                                Batch=4k   Batch=8k   Batch=16k
>   batch.size == clientBatch    60         8          6
>   batch.size = 10MB            7          7          7
>   batch.size = 20MB            6          6          5
>
> 8 partitions
>
>                                Batch=4k   Batch=8k   Batch=16k
>   batch.size == clientBatch    7          8          8
>   batch.size = 10MB            7          8          7
>   batch.size = 20MB            6          6          6
>
> Just for reference, I also took the numbers for the default ASYNC mode
> with acks=1:
>
>                 batch.size=default  batch.size=4MB  batch.size=8MB  batch.size=16MB
> 1 partition     53                  130             113             76
> 4 partitions    84                  126             9               7
> 8 partitions    9                   12              10              5