Just want to confirm that when you say batch.size and the number of records
will be equal, you don't mean that literally. batch.size is in bytes, so if
you wanted a batch of sixteen 1 KB messages for a single partition you would
set batch.size=16*1024.
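
For example, a minimal sketch of the relevant producer config (the broker
address and serializer choices here are just placeholders):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
    props.put("key.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
    // sixteen 1 KB records per partition batch -> batch.size is in bytes
    props.put("batch.size", Integer.toString(16 * 1024));
    Producer<byte[], byte[]> producer = new KafkaProducer<>(props);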

-Jay

On Tue, Apr 28, 2015 at 5:58 PM, Roshan Naik <ros...@hortonworks.com> wrote:

> Based on a recent suggestion by Joel, I am experimenting with using flush()
> to simulate batched-sync behavior.
> The essence of my single-threaded producer code is:
>
>     List<Future<RecordMetadata>> futureList = new ArrayList<>();
>     for (int i = 0; i < numRecords;) {
>         // 1- Send a batch
>         for (int batchCounter = 0; batchCounter < batchSz; ++batchCounter) {
>             Future<RecordMetadata> f = producer.send(record, null);
>             futureList.add(f);
>             i++;
>         }
>         // 2- Flush after sending the batch
>         producer.flush();
>
>         // 3- Ensure all msgs in the batch were sent
>         for (Future<RecordMetadata> f : futureList) {
>             f.get();
>         }
>         futureList.clear();   // reset for the next batch
>     }
>
> There are actually two batch sizes in play here. One is the number of
> messages between every flush() call made by the client. The other is the
> batch.size setting, which controls the batching done internally by the
> underlying async API.
>
> Intuitively, we either want to
>   A) Set both batch sizes to be equal, OR
>   B) Set the underlying batch.size to a sufficiently large number so as to
> effectively disable internal batch management (rough config sketch below).
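>
> As a sketch only (the ~1 KB record size and the exact values are
> assumptions), the two options would look something like:
>
>     Properties props = new Properties();
>     // (A) size the internal batch to hold one per-flush batch:
>     //     batchSz records of ~1 KB each (batch.size is in bytes)
>     props.put("batch.size", Integer.toString(batchSz * 1024));
>
>     // (B) alternatively, make batch.size large enough to effectively
>     //     disable internal batch management, e.g. 64 MB:
>     // props.put("batch.size", Integer.toString(64 * 1024 * 1024));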
>
>
> The numbers below are in MB/s. The 'Batch' columns indicate the number of
> events between each explicit client flush().
> Setup is a 1-node broker with acks=1.
>
>                          1 partition
>                          Batch=4k    Batch=8k    Batch=16k
> Equal batch sizes (A)    16          32          52
> Large batch.size  (B)    140         123         124
>
>                          4 partitions
>                          Batch=4k    Batch=8k    Batch=16k
> Equal batch sizes (A)    35          61          82
> Large batch.size  (B)    7           7           7
>
>                          8 partitions
>                          Batch=4k    Batch=8k    Batch=16k
> Equal batch sizes (A)    49          70          99
> Large batch.size  (B)    7           8           7
>
>
> There are two issues noticeable in these numbers:
> 1 - Case A is much faster than case B for 4 and 8 partitions.
> 2 - Single-partition mode outperforms all others, and there case B is faster
> than case A.
>
>
>
>
> Side note: I used the client APIs from trunk while the broker is running
> 0.8.2 (I don't think it matters, but nevertheless wanted to point it out).
>
>
