Note that the batch size is per partition. The more partitions you have, the longer it takes to fill a batch on every partition at the same batch size. So you probably need to increase the linger time so that, independent of the number of partitions, the configured batch size can be reached. There is a JMX metric in the producer that reports the average batch size.
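To make that concrete, here is a rough back-of-the-envelope sketch in plain Java (no Kafka dependency). It assumes records are spread evenly across partitions and arrive at a steady rate, which a real workload may not do. The class and method names (`LingerEstimate`, `estimateLingerMs`, `producerProps`) and the throughput figures are hypothetical, chosen only for illustration; `batch.size` and `linger.ms` are the actual producer config keys.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the Kafka client.
class LingerEstimate {

    /**
     * Estimates how long (in ms) one partition needs to accumulate
     * batchSizeBytes, ASSUMING records are spread evenly over all
     * partitions and arrive at a steady rate.
     */
    static long estimateLingerMs(int batchSizeBytes, int numPartitions,
                                 double recordsPerSec, int recordSizeBytes) {
        // Each partition only sees 1/numPartitions of the total byte rate.
        double bytesPerSecPerPartition =
                recordsPerSec * recordSizeBytes / numPartitions;
        return (long) Math.ceil(1000.0 * batchSizeBytes / bytesPerSecPerPartition);
    }

    /** Builds producer properties with the two batching-related settings. */
    static Properties producerProps(int batchSizeBytes, long lingerMs) {
        Properties props = new Properties();
        // batch.size is measured in bytes and applies per partition.
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        props.setProperty("linger.ms", Long.toString(lingerMs));
        return props;
    }

    public static void main(String[] args) {
        int batchSizeBytes = 16384;     // assumed batch size
        double recordsPerSec = 100000;  // assumed total send rate
        int recordSizeBytes = 100;      // assumed record size
        for (int partitions : new int[] {1, 4, 8}) {
            long lingerMs = estimateLingerMs(
                    batchSizeBytes, partitions, recordsPerSec, recordSizeBytes);
            System.out.println(partitions + " partition(s): linger.ms >= " + lingerMs
                    + " " + producerProps(batchSizeBytes, lingerMs));
        }
    }
}
```

The estimate grows linearly with the partition count, so a linger time that fills batches with one partition will leave them mostly empty with eight. Treat the result as a starting point and check it against the average batch size the producer actually reports.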
Thanks,

Jun

On Tue, Apr 28, 2015 at 7:58 PM, Roshan Naik <ros...@hortonworks.com> wrote:

> Based on a recent suggestion by Joel, I am experimenting with using flush()
> to simulate batched-sync behavior.
> The essence of my single-threaded producer code is:
>
>     for (int i = 0; i < numRecords;) {
>         // 1- Send a batch
>         for (int batchCounter = 0; batchCounter < batchSz; ++batchCounter) {
>             Future<RecordMetadata> f = producer.send(record, null);
>             futureList.add(f);
>             i++;
>         }
>         // 2- Flush after sending batch
>         producer.flush();
>
>         // 3- Ensure all msgs were sent
>         for (Future<RecordMetadata> f : futureList) {
>             f.get();
>         }
>     }
>
> There are actually two batch sizes in play here. One is the number of
> messages between every flush() call made by the client. The other is the
> batch.size setting, which impacts the batching done internally by the
> underlying async API.
>
> Intuitively, we either want to
>   A) Set both batch sizes to be equal, OR
>   B) Set the underlying batch.size to a sufficiently large number so as to
>      effectively disable internal batch management
>
> The numbers below are in MB/s. The 'Batch' column indicates the number of
> events between each explicit client flush().
> Setup is a 1-node broker and acks=1.
>
> 1 partition
>                           Batch=4k   Batch=8k   Batch=16k
> Equal batch sizes (A)        16         32         52
> Large batch.size (B)        140        123        124
>
> 4 partitions
>                           Batch=4k   Batch=8k   Batch=16k
> Equal batch sizes (A)        35         61         82
> Large batch.size (B)          7          7          7
>
> 8 partitions
>                           Batch=4k   Batch=8k   Batch=16k
> Equal batch sizes (A)        49         70         99
> Large batch.size (B)          7          8          7
>
> There are two issues noticeable in these numbers:
> 1 - Case A is much faster than case B for 4 and 8 partitions.
> 2 - Single-partition mode outperforms all others, and here case B is faster
>     than case A.
>
> Side note: I used the client APIs from trunk while the broker is
> running 0.8.2 (I don't think it matters, but nevertheless wanted to point
> it out).