Thanks @Jay for suggesting changes to batch.size and linger.ms.  I tried
them out. It appears one can do better than the default batch.size for
this synchronous batch mode with flush().

These new measurements give more "rational" numbers, which I can reason
about and use to infer some rules of thumb (for batch-sync mode using flush).


Here are my observations:
   - The new producer API does much better than the older one for a *single
threaded* producer (the best I saw with the old API is ~68 MB/s, with the
new one ~140 MB/s).
   - Higher linger.ms sometimes helps perf and at other times hurts. No
simple rule here. Best to try it out and decide whether default is good
for your case or not.
   - For single threaded producer: To get the most throughput, set
batch.size = (total bytes between flushes / partition count).
   - Running more single threaded producer processes helped (up to about
3 or 4 processes).
   - One producer writing to a single partition is faster than one producer
writing to multiple partitions.
   - The number of bytes between two explicit flushes (i.e. the flush interval)
made a much smaller impact than batch.size. Something to be learnt
here. My speculation is that with smaller flush intervals this might
change. Having two knobs (batch.size and flush interval) is a bit confusing
for end users trying to tune it; it would be good if we could find some
simple guidance.
- Other than some inconveniences previously mentioned, I feel flush()
could be used as a way to simulate sync-batch behavior.
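
To make the batch.size rule and the flush-based sync-batch pattern above
concrete, here is a minimal Python sketch. It is not the code used for these
measurements. StubProducer, send_in_sync_batches and the topic name are
hypothetical stand-ins for any producer exposing send() and flush(), and
1 MB is assumed to mean 1024 * 1024 bytes.

```python
from dataclasses import dataclass, field


def batch_size_for(flush_interval_bytes: int, partitions: int) -> int:
    """Rule of thumb: batch.size = bytes between flushes / partition count."""
    if flush_interval_bytes <= 0 or partitions <= 0:
        raise ValueError("flush interval and partition count must be positive")
    return flush_interval_bytes // partitions


@dataclass
class StubProducer:
    """Hypothetical stand-in for a producer with send() and flush()."""
    pending: list = field(default_factory=list)
    flushes: int = 0

    def send(self, topic: str, value: bytes) -> None:
        self.pending.append((topic, value))

    def flush(self) -> None:
        # A real producer would block here until all pending sends complete.
        self.pending.clear()
        self.flushes += 1


def send_in_sync_batches(producer, topic, records, flush_interval_bytes):
    """Send records, calling flush() whenever flush_interval_bytes are queued."""
    queued = 0
    for value in records:
        producer.send(topic, value)
        queued += len(value)
        if queued >= flush_interval_bytes:
            producer.flush()
            queued = 0
    if queued:
        producer.flush()  # flush the final partial batch


MB = 1024 * 1024  # assumption: binary megabytes

# Config for a 4 MB flush interval spread over 4 partitions.
producer_config = {
    "acks": "1",
    "batch.size": batch_size_for(4 * MB, partitions=4),
}

stub = StubProducer()
send_in_sync_batches(stub, "test-topic", [b"x" * 1024] * 8192, 4 * MB)
```

With a real client, producer_config would be passed to the producer
constructor and the stub replaced by the actual producer object.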

Producer Limits:
   - Able to exceed 1 GigE capacity, but not 10 GigE. Throughput does not
appear to go beyond ~460 MB/s. I verified that my test machines are able to
achieve 1 GB/s.
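
For reference, a small Python sketch of the arithmetic behind the Ethernet
comparison above. It uses raw line rates only (1 Gbit = 1000 Mbit, 8 bits per
byte) and ignores protocol overhead, which is a simplifying assumption.

```python
def link_capacity_mb_per_s(gigabits_per_s: float) -> float:
    """Raw line rate in MB/s, ignoring protocol overhead."""
    return gigabits_per_s * 1000 / 8


one_gige = link_capacity_mb_per_s(1)    # raw capacity of a 1 GigE link
ten_gige = link_capacity_mb_per_s(10)   # raw capacity of a 10 GigE link

observed = 460  # MB/s, the best aggregate throughput reported above
fraction_of_10g = observed / ten_gige

print(f"1 GigE  = {one_gige:.0f} MB/s")
print(f"10 GigE = {ten_gige:.0f} MB/s")
print(f"observed {observed} MB/s is {fraction_of_10g:.1%} of 10 GigE")
```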

Todo:
- Need to try a multi-threaded producer.
- I did some testing of the Consumer APIs as well with the 0.8.1 consumer-perf
tool. Wasn't able to push it beyond 30 MB/s. When producers ran in parallel
it fell to under 10 MB/s. Need to dig deeper. Will report back. Suggestions
welcome.



Measurements:

 - See attachment 
 - Also available on paste bin:  http://pastebin.com/p3kSAjy6


Settings: acks=1, single broker, single threaded producer (new api)
Machines: 32 cores, 256GB RAM, 10 gigE, 6x15000 rpm disks


            1 partition (throughput in MB/s)
                                        FlushInt=4MB  FlushInt=8MB  FlushInt=16MB
linger=def  batch.size = default              57            54             52
linger=1s   batch.size = default              57            61             59

linger=def  batch.size = flushInt/parts      136           125            116
linger=1s   batch.size = flushInt/parts       92            77             56

linger=def  batch.size = flushInt            140           123            124
linger=def  batch.size = 10MB                140           123            124
linger=def  batch.size = 20MB                 31            30             42


            4 partitions (throughput in MB/s)
                                        FlushInt=4MB  FlushInt=8MB  FlushInt=16MB
linger=def  batch.size = default              95            82             80
linger=1s   batch.size = default              85            83             85

linger=def  batch.size = batch/#part         127           133             90
linger=1s   batch.size = batch/#part          94           100            101

linger=def  batch.size = flushInt             60             8              6
linger=def  batch.size = 10MB                  7             7              7
linger=def  batch.size = 20MB                  6             6              5
                        
                        
            8 partitions (throughput in MB/s)
                                        FlushInt=4MB  FlushInt=8MB  FlushInt=16MB
linger=def  batch.size = default             100            89             96
linger=1s   batch.size = default             105            97             98

linger=def  batch.size = batch/#part         114           128             78
linger=1s   batch.size = batch/#part          95            94            102

linger=def  batch.size = flushInt              7             8              8
linger=def  batch.size = 10MB                  7             8              7
linger=def  batch.size = 20MB                  6             6              6
                        

With multiple producers (each single threaded)


For 1 partition:
1 process   = 136 MB/s
3 processes = 344 MB/s
4 processes = 290 MB/s


For 4 partitions:
1 process   = 127 MB/s
3 processes = 345 MB/s
4 processes = 372 MB/s


For 8 partitions:
1 process   = 128 MB/s
3 processes = 304 MB/s
4 processes = 460 MB/s
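
To see how well throughput scales with the number of producer processes, a
short Python sketch that computes the speedup relative to one process from the
numbers above. The helper name scaling is just illustrative.

```python
# Aggregate throughput in MB/s, keyed by partition count, then process count.
measurements = {
    1: {1: 136, 3: 344, 4: 290},
    4: {1: 127, 3: 345, 4: 372},
    8: {1: 128, 3: 304, 4: 460},
}


def scaling(per_process: dict) -> dict:
    """Speedup of each process count relative to the single-process run."""
    base = per_process[1]
    return {n: round(mbps / base, 2) for n, mbps in per_process.items()}


speedups = {parts: scaling(rows) for parts, rows in measurements.items()}

for parts, row in speedups.items():
    for n, factor in row.items():
        print(f"{parts} partition(s), {n} process(es): {factor}x")
```

With 1 partition, going from 3 to 4 processes lowers the aggregate throughput,
while with 4 and 8 partitions the fourth process still adds throughput.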





