Following the discussion in a separate thread, it might make sense to check
the difference between using the pooled allocator and the unpooled allocator
with the v3 protocol. It would also be worth comparing heap buffers and
direct buffers.

I suspect this might contribute to the latency.
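
For illustration, this is the kind of comparison I have in mind - just a rough
sketch of the four allocator/buffer combinations using the plain Netty 4.1 API,
not wired into the BookKeeper client (the entry size matches the bench):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.UnpooledByteBufAllocator;

public class AllocatorComparison {
    public static void main(String[] args) {
        int entrySize = 35 * 1024;
        for (boolean pooled : new boolean[]{true, false}) {
            for (boolean direct : new boolean[]{true, false}) {
                ByteBufAllocator alloc = pooled
                        ? PooledByteBufAllocator.DEFAULT
                        : UnpooledByteBufAllocator.DEFAULT;
                long start = System.nanoTime();
                for (int i = 0; i < 100_000; i++) {
                    // allocate and release, as the client does for every entry
                    ByteBuf buf = direct
                            ? alloc.directBuffer(entrySize)
                            : alloc.heapBuffer(entrySize);
                    buf.release();
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("pooled=" + pooled + " direct=" + direct
                        + " -> " + elapsedMs + " ms");
            }
        }
    }
}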

- Sijie

On Thu, Jul 20, 2017 at 4:49 AM, Enrico Olivelli <[email protected]>
wrote:

> Kishore, do you have news?
>
> Il ven 14 lug 2017, 09:05 Enrico Olivelli <[email protected]> ha
> scritto:
>
> > At the meeting we said that Kishore will perform some benchmarks on his
> > side.
> > He will take a look at my code, and we are going to share the results.
> > Maybe it will also be possible to share the results of the benchmarks done
> > by Kishore at Salesforce.
> >
> > The primary goal is to understand the differences between 4.4 and 4.5, for
> > instance whether there is a need to change the JVM/BK configuration in
> > order to make 4.5 perform like 4.4.
> >
> > @Sijie I hope I have answered your questions.
> >
> >
> > -- Enrico
> >
> >
> > 2017-07-13 9:29 GMT+02:00 Enrico Olivelli <[email protected]>:
> >
> >>
> >>
> >> 2017-07-13 4:11 GMT+02:00 Sijie Guo <[email protected]>:
> >>
> >>> On Wed, Jul 12, 2017 at 10:35 PM, Enrico Olivelli <[email protected]
> >
> >>> wrote:
> >>>
> >>> > Sijie, JV, just a recap of my point of view:
> >>> > - considering latency = "time for asyncAddEntry to complete"
> >>> > - there is some difference between 4.4 and 4.5 in memory usage, but it
> >>> > is not so clear
> >>> > - the type of GC (parallel vs G1) does not have a big impact, but with
> >>> > G1 you achieve the best latency
> >>> >
> >>>
> >>> Did you try CMS for v3 protocol?
> >>>
> >>
> >> BK 4.5 + v3 protocol + CMS + throttle = 0
> >>
> >> #0 Total wall clock time: 309.0 ms, total callbacks time: 249753 ms,
> >> entry size 0,034 MB -> 249,75 ms per entry (latency),0,3 ms per entry
> >> (throughput) 110,6 MB/s throughput
> >> #1 Total wall clock time: 192.0 ms, total callbacks time: 156268 ms,
> >> entry size 0,034 MB -> 156,27 ms per entry (latency),0,2 ms per entry
> >> (throughput) 178,0 MB/s throughput
> >> #2 Total wall clock time: 129.0 ms, total callbacks time: 95279 ms,
> entry
> >> size 0,034 MB -> 95,28 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 265,0 MB/s throughput
> >> #3 Total wall clock time: 125.0 ms, total callbacks time: 94661 ms,
> entry
> >> size 0,034 MB -> 94,66 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 273,4 MB/s throughput
> >> #4 Total wall clock time: 112.0 ms, total callbacks time: 83015 ms,
> entry
> >> size 0,034 MB -> 83,02 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 305,2 MB/s throughput
> >> #5 Total wall clock time: 111.0 ms, total callbacks time: 82839 ms,
> entry
> >> size 0,034 MB -> 82,84 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 307,9 MB/s throughput
> >> #6 Total wall clock time: 114.0 ms, total callbacks time: 82562 ms,
> entry
> >> size 0,034 MB -> 82,56 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 299,8 MB/s throughput
> >> #7 Total wall clock time: 104.0 ms, total callbacks time: 74356 ms,
> entry
> >> size 0,034 MB -> 74,36 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 328,7 MB/s throughput
> >> #8 Total wall clock time: 110.0 ms, total callbacks time: 83488 ms,
> entry
> >> size 0,034 MB -> 83,49 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 310,7 MB/s throughput
> >> #9 Total wall clock time: 102.0 ms, total callbacks time: 68787 ms,
> entry
> >> size 0,034 MB -> 68,79 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 335,1 MB/s throughput
> >>
> >> gc.log
> >> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> 4.3.0
> >> 20080428 (Red Hat 4.3.0-8)
> >> Memory: 4k page, physical 32870636k(3276148k free), swap
> >> 15622140k(15460604k free)
> >> CommandLine flags: -XX:+CMSIncrementalMode
> -XX:InitialHeapSize=2147483648
> >> -XX:MaxHeapSize=2147483648
> >> -XX:MaxNewSize=697933824 -XX:MaxTenuringThreshold=6
> -XX:NewSize=697933824
> >> -XX:OldPLABSize=16 -XX:OldSize=1395867648 -XX:+PrintGC
> >> -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers
> >> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >> 0,630: [GC (Allocation Failure)  545344K->8508K(2029056K), 0,0065802
> secs]
> >> 1,196: [GC (Allocation Failure)  553852K->117940K(2029056K), 0,0385576
> >> secs]
> >> 1,235: [GC (CMS Initial Mark)  119486K(2029056K), 0,0022661 secs]
> >> 1,544: [GC (Allocation Failure)  663284K->207833K(2029056K), 0,0330904
> >> secs]
> >> 1,731: [GC (CMS Final Remark)  560308K(2029056K), 0,0105624 secs]
> >> 1,837: [GC (Allocation Failure)  753177K->214147K(2029056K), 0,0105574
> >> secs]
> >> 2,115: [GC (Allocation Failure)  759491K->241263K(2029056K), 0,0166134
> >> secs]
> >>
> >>
> >> BK 4.5 + v2 protocol + CMS + throttle = 0
> >>
> >> #0 Total wall clock time: 198.0 ms, total callbacks time: 169954 ms,
> >> entry size 0,034 MB -> 169,95 ms per entry (latency),0,2 ms per entry
> >> (throughput) 172,6 MB/s throughput
> >> #1 Total wall clock time: 106.0 ms, total callbacks time: 70885 ms,
> entry
> >> size 0,034 MB -> 70,89 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 322,4 MB/s throughput
> >> #2 Total wall clock time: 109.0 ms, total callbacks time: 71848 ms,
> entry
> >> size 0,034 MB -> 71,85 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 313,6 MB/s throughput
> >> #3 Total wall clock time: 97.0 ms, total callbacks time: 73583 ms, entry
> >> size 0,034 MB -> 73,58 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 352,4 MB/s throughput
> >> #4 Total wall clock time: 91.0 ms, total callbacks time: 61293 ms, entry
> >> size 0,034 MB -> 61,29 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 375,6 MB/s throughput
> >> #5 Total wall clock time: 98.0 ms, total callbacks time: 73006 ms, entry
> >> size 0,034 MB -> 73,01 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 348,8 MB/s throughput
> >> #6 Total wall clock time: 104.0 ms, total callbacks time: 75696 ms,
> entry
> >> size 0,034 MB -> 75,70 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 328,7 MB/s throughput
> >> #7 Total wall clock time: 103.0 ms, total callbacks time: 79260 ms,
> entry
> >> size 0,034 MB -> 79,26 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 331,8 MB/s throughput
> >> #8 Total wall clock time: 92.0 ms, total callbacks time: 62232 ms, entry
> >> size 0,034 MB -> 62,23 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 371,5 MB/s throughput
> >> #9 Total wall clock time: 98.0 ms, total callbacks time: 65359 ms, entry
> >> size 0,034 MB -> 65,36 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 348,8 MB/s throughput
> >>
> >>
> >> gc.log
> >> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> 4.3.0
> >> 20080428 (Red Hat 4.3.0-8)
> >> Memory: 4k page, physical 32870636k(3154408k free), swap
> >> 15622140k(15460604k free)
> >> CommandLine flags: -XX:+CMSIncrementalMode
> -XX:InitialHeapSize=2147483648
> >> -XX:MaxHeapSize=2147483648
> >> -XX:MaxNewSize=697933824 -XX:MaxTenuringThreshold=6
> -XX:NewSize=697933824
> >> -XX:OldPLABSize=16 -XX:OldSize=1395867648 -XX:+PrintGC
> >> -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers
> >> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >> 0,554: [GC (Allocation Failure)  545344K->8624K(2029056K), 0,0060763
> secs]
> >> 1,478: [GC (Allocation Failure)  552028K->79872K(2029056K), 0,0252399
> >> secs]
> >> 1,504: [GC (CMS Initial Mark)  84227K(2029056K), 0,0017813 secs]
> >>
> >>
> >> BK 4.4 + v3 protocol + CMS + throttle = 0
> >>
> >> #0 Total wall clock time: 214.0 ms, total callbacks time: 207369 ms,
> >> entry size 0,034 MB -> 207,37 ms per entry (latency),0,2 ms per entry
> >> (throughput) 159,7 MB/s throughput
> >> #1 Total wall clock time: 153.0 ms, total callbacks time: 140661 ms,
> >> entry size 0,034 MB -> 140,66 ms per entry (latency),0,2 ms per entry
> >> (throughput) 223,4 MB/s throughput
> >> #2 Total wall clock time: 159.0 ms, total callbacks time: 145131 ms,
> >> entry size 0,034 MB -> 145,13 ms per entry (latency),0,2 ms per entry
> >> (throughput) 215,0 MB/s throughput
> >> #3 Total wall clock time: 109.0 ms, total callbacks time: 105018 ms,
> >> entry size 0,034 MB -> 105,02 ms per entry (latency),0,1 ms per entry
> >> (throughput) 313,6 MB/s throughput
> >> #4 Total wall clock time: 114.0 ms, total callbacks time: 110111 ms,
> >> entry size 0,034 MB -> 110,11 ms per entry (latency),0,1 ms per entry
> >> (throughput) 299,8 MB/s throughput
> >> #5 Total wall clock time: 118.0 ms, total callbacks time: 111534 ms,
> >> entry size 0,034 MB -> 111,53 ms per entry (latency),0,1 ms per entry
> >> (throughput) 289,7 MB/s throughput
> >> #6 Total wall clock time: 98.0 ms, total callbacks time: 87425 ms, entry
> >> size 0,034 MB -> 87,43 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 348,8 MB/s throughput
> >> #7 Total wall clock time: 135.0 ms, total callbacks time: 129583 ms,
> >> entry size 0,034 MB -> 129,58 ms per entry (latency),0,1 ms per entry
> >> (throughput) 253,2 MB/s throughput
> >> #8 Total wall clock time: 99.0 ms, total callbacks time: 84986 ms, entry
> >> size 0,034 MB -> 84,99 ms per entry (latency),0,1 ms per entry
> (throughput)
> >> 345,2 MB/s throughput
> >> #9 Total wall clock time: 125.0 ms, total callbacks time: 117492 ms,
> >> entry size 0,034 MB -> 117,49 ms per entry (latency),0,1 ms per entry
> >> (throughput) 273,4 MB/s throughput
> >>
> >> gc.log
> >> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> 4.3.0
> >> 20080428 (Red Hat 4.3.0-8)
> >> Memory: 4k page, physical 32870636k(3212484k free), swap
> >> 15622140k(15460604k free)
> >> CommandLine flags: -XX:+CMSIncrementalMode
> -XX:InitialHeapSize=2147483648
> >> -XX:MaxHeapSize=2147483648
> >> -XX:MaxNewSize=697933824 -XX:MaxTenuringThreshold=6
> -XX:NewSize=697933824
> >> -XX:OldPLABSize=16 -XX:OldSize=1395867648 -XX:+PrintGC
> >> -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers
> >> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >> 0,572: [GC (Allocation Failure)  545344K->8016K(2029056K), 0,0057737
> secs]
> >> 0,882: [GC (Allocation Failure)  553360K->88089K(2029056K), 0,0205542
> >> secs]
> >> 0,903: [GC (CMS Initial Mark)  93974K(2029056K), 0,0010542 secs]
> >> 1,100: [GC (Allocation Failure)  633433K->94935K(2029056K), 0,0359948
> >> secs]
> >> 1,221: [GC (CMS Final Remark)  384627K(2029056K), 0,0079453 secs]
> >> 1,313: [GC (Allocation Failure)  640279K->98309K(2029056K), 0,0157774
> >> secs]
> >> 1,481: [GC (Allocation Failure)  643653K->154062K(2029056K), 0,0179418
> >> secs]
> >> 1,677: [GC (Allocation Failure)  699406K->210685K(2029056K), 0,0312668
> >> secs]
> >> 1,709: [GC (CMS Initial Mark)  211186K(2029056K), 0,0006615 secs]
> >> 1,894: [GC (Allocation Failure)  756029K->263420K(2029056K), 0,0168112
> >> secs]
> >>
> >> BK 4.5 + throttle = 5000 + CMS + v3 protocol
> >>
> >> #0 Total wall clock time: 248.0 ms, total callbacks time: 102307 ms,
> >> entry size 0,034 MB -> 102,31 ms per entry (latency),0,2 ms per entry
> >> (throughput) 137,8 MB/s throughput
> >> #1 Total wall clock time: 205.0 ms, total callbacks time: 12282 ms,
> entry
> >> size 0,034 MB -> 12,28 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 166,7 MB/s throughput
> >> #2 Total wall clock time: 204.0 ms, total callbacks time: 6920 ms, entry
> >> size 0,034 MB -> 6,92 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 167,5 MB/s throughput
> >> #3 Total wall clock time: 221.0 ms, total callbacks time: 13589 ms,
> entry
> >> size 0,034 MB -> 13,59 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 154,7 MB/s throughput
> >> #4 Total wall clock time: 204.0 ms, total callbacks time: 6023 ms, entry
> >> size 0,034 MB -> 6,02 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 167,5 MB/s throughput
> >> #5 Total wall clock time: 205.0 ms, total callbacks time: 6350 ms, entry
> >> size 0,034 MB -> 6,35 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 166,7 MB/s throughput
> >> #6 Total wall clock time: 205.0 ms, total callbacks time: 7849 ms, entry
> >> size 0,034 MB -> 7,85 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 166,7 MB/s throughput
> >> #7 Total wall clock time: 204.0 ms, total callbacks time: 6443 ms, entry
> >> size 0,034 MB -> 6,44 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 167,5 MB/s throughput
> >> #8 Total wall clock time: 206.0 ms, total callbacks time: 7009 ms, entry
> >> size 0,034 MB -> 7,01 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 165,9 MB/s throughput
> >> #9 Total wall clock time: 205.0 ms, total callbacks time: 6150 ms, entry
> >> size 0,034 MB -> 6,15 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 166,7 MB/s throughput
> >>
> >> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> 4.3.0
> >> 20080428 (Red Hat 4.3.0-8)
> >> Memory: 4k page, physical 32870636k(3247600k free), swap
> >> 15622140k(15460604k free)
> >> CommandLine flags: -XX:+CMSIncrementalMode
> -XX:InitialHeapSize=2147483648
> >> -XX:MaxHeapSize=2147483648
> >> -XX:MaxNewSize=697933824 -XX:MaxTenuringThreshold=6
> -XX:NewSize=697933824
> >> -XX:OldPLABSize=16 -XX:OldSize=1395867648 -XX:+PrintGC
> >> -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers
> >> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >> 0,568: [GC (Allocation Failure)  545344K->8497K(2029056K), 0,0060302
> secs]
> >> 1,059: [GC (Allocation Failure)  553841K->112138K(2029056K), 0,0206230
> >> secs]
> >> 1,080: [GC (CMS Initial Mark)  117602K(2029056K), 0,0013282 secs]
> >> 1,561: [GC (Allocation Failure)  654247K->189231K(2029056K), 0,0379235
> >> secs]
> >> 2,079: [GC (Allocation Failure)  731215K->185468K(2029056K), 0,0107924
> >> secs]
> >> 2,320: [GC (CMS Final Remark)  431925K(2029056K), 0,0076070 secs]
> >> 2,635: [GC (Allocation Failure)  730812K->217376K(2029056K), 0,0136162
> >> secs]
> >>
> >>
> >> BK 4.4 + throttle = 5000 + CMS + v3 protocol
> >>
> >> #0 Total wall clock time: 208.0 ms, total callbacks time: 55274 ms,
> entry
> >> size 0,034 MB -> 55,27 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 164,3 MB/s throughput
> >> #1 Total wall clock time: 207.0 ms, total callbacks time: 11848 ms,
> entry
> >> size 0,034 MB -> 11,85 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 165,1 MB/s throughput
> >> #2 Total wall clock time: 208.0 ms, total callbacks time: 9442 ms, entry
> >> size 0,034 MB -> 9,44 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 164,3 MB/s throughput
> >> #3 Total wall clock time: 204.0 ms, total callbacks time: 6739 ms, entry
> >> size 0,034 MB -> 6,74 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 167,5 MB/s throughput
> >> #4 Total wall clock time: 205.0 ms, total callbacks time: 7546 ms, entry
> >> size 0,034 MB -> 7,55 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 166,7 MB/s throughput
> >> #5 Total wall clock time: 204.0 ms, total callbacks time: 6594 ms, entry
> >> size 0,034 MB -> 6,59 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 167,5 MB/s throughput
> >> #6 Total wall clock time: 205.0 ms, total callbacks time: 5885 ms, entry
> >> size 0,034 MB -> 5,89 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 166,7 MB/s throughput
> >> #7 Total wall clock time: 204.0 ms, total callbacks time: 7722 ms, entry
> >> size 0,034 MB -> 7,72 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 167,5 MB/s throughput
> >> #8 Total wall clock time: 224.0 ms, total callbacks time: 5968 ms, entry
> >> size 0,034 MB -> 5,97 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 152,6 MB/s throughput
> >> #9 Total wall clock time: 209.0 ms, total callbacks time: 6731 ms, entry
> >> size 0,034 MB -> 6,73 ms per entry (latency),0,2 ms per entry
> (throughput)
> >> 163,5 MB/s throughput
> >>
> >> gc.log
> >> Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >> (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> 4.3.0
> >> 20080428 (Red Hat 4.3.0-8)
> >> Memory: 4k page, physical 32870636k(3291888k free), swap
> >> 15622140k(15460604k free)
> >> CommandLine flags: -XX:+CMSIncrementalMode
> -XX:InitialHeapSize=2147483648
> >> -XX:MaxHeapSize=2147483648
> >> -XX:MaxNewSize=697933824 -XX:MaxTenuringThreshold=6
> -XX:NewSize=697933824
> >> -XX:OldPLABSize=16 -XX:OldSize=1395867648 -XX:+PrintGC
> >> -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers
> >> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >> 0,554: [GC (Allocation Failure)  545344K->8022K(2029056K), 0,0055210
> secs]
> >> 0,890: [GC (Allocation Failure)  553366K->93114K(2029056K), 0,0177003
> >> secs]
> >> 0,909: [GC (CMS Initial Mark)  97210K(2029056K), 0,0011810 secs]
> >> 1,204: [GC (Allocation Failure)  638458K->102278K(2029056K), 0,0181779
> >> secs]
> >> 1,536: [GC (Allocation Failure)  645878K->103955K(2029056K), 0,0088187
> >> secs]
> >> 1,679: [GC (CMS Final Remark)  353595K(2029056K), 0,0078553 secs]
> >> 1,858: [GC (Allocation Failure)  649299K->78654K(2029056K), 0,0064803
> >> secs]
> >> 2,187: [GC (Allocation Failure)  623998K->120130K(2029056K), 0,0125856
> >> secs]
> >> 2,200: [GC (CMS Initial Mark)  121879K(2029056K), 0,0003934 secs]
> >> 2,518: [GC (Allocation Failure)  665474K->193813K(2029056K), 0,0202701
> >> secs]
> >>
> >>
> >>
> >>
> >>>
> >>> > - using the v2 protocol brings huge improvements in memory usage
> >>> > - if you want good "latency" for each entry you have to set the
> >>> > throttle to at least 1000, but the throughput will be much lower (about
> >>> > 30 MB/s vs 300 MB/s on my disk)
> >>> >
> >>>
> >>> What type of disk do you have? What is your filesystem mount options?
> >>>
> >>
> >> Disk type: SSD
> >> this is the fstab line:
> >> /dev/mapper/Ssd-Home    /home                   ext4
> >> defaults,discard        1 2
> >>
> >> uname -a
> >> Linux DNA101PC193.diennea.lan 4.11.5-100.fc24.x86_64 #1 SMP Wed Jun 14
> >> 17:21:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >>
> >>
> >>>
> >>> When you did the test, how did GC activities look like when setting
> >>> throttle to be more than 1000?
> >>>
> >>
> >> I have added new results, see above
> >>
> >>
> >>>
> >>> I think the latency looked "bad" when throttle was more than 1000, is
> >>> because of GC. Did you have that test results?
> >>>
> >>
> >>
> >> I apologize, in the results below, where I did not write "throttle = 0" I
> >> meant throttle = 1000
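> >>
> >> Just for clarity, this is how the throttle is meant to be set in these
> >> runs - a minimal fragment, assuming the value maps to the standard
> >> ClientConfiguration "throttle" setting, which caps the number of
> >> outstanding operations (0 = no limit):
> >>
> >>     ClientConfiguration conf = new ClientConfiguration();
> >>     conf.setZkServers("localhost:2181"); // placeholder ZK address
> >>     conf.setThrottleValue(1000);         // at most 1000 outstanding adds
> >>     try (BookKeeper bk = new BookKeeper(conf)) {
> >>         // run the bench with this client
> >>     }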
> >>
> >> Did you take a look at the code of the bench? Maybe a double check would
> >> be useful, many times badly written benchmarks are not useful at all
> >>
> >> Thank you
> >> -- Enrico
> >>
> >>
> >>
> >>>
> >>>
> >>> > below the results of my simple bench
> >>> >
> >>> > Enrico
> >>> >
> >>> > Bench code:
> >>> >
> >>> https://github.com/eolivelli/bookkeepers-benchs/blob/
> master/src/test/java/
> >>> > BookKeeperWriteTest.java
> >>> >
> >>> > Results:
> >>> > with 4.5.0-SNAPSHOT and v2 protocol - G1 GC
> >>> >
> >>> > #0 Total wall clock time: 1000.0 ms, total callbacks time: 8460 ms,
> >>> entry
> >>> > size 0,034 MB -> 8,46 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,2 MB/s throughput
> >>> > #1 Total wall clock time: 1005.0 ms, total callbacks time: 4529 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,53 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #2 Total wall clock time: 1002.0 ms, total callbacks time: 4293 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,29 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #3 Total wall clock time: 1003.0 ms, total callbacks time: 4317 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,32 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #4 Total wall clock time: 1002.0 ms, total callbacks time: 4553 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,55 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #5 Total wall clock time: 1001.0 ms, total callbacks time: 4448 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #6 Total wall clock time: 1004.0 ms, total callbacks time: 4449 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #7 Total wall clock time: 1003.0 ms, total callbacks time: 4504 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #8 Total wall clock time: 1002.0 ms, total callbacks time: 4290 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,29 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #9 Total wall clock time: 1003.0 ms, total callbacks time: 4441 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,44 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> >
> >>> > gc.log
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(9153868k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> >>> > 1,988: [GC pause (G1 Evacuation Pause) (young) 146M->59M(2048M),
> >>> 0,0104300
> >>> > secs]
> >>> >
> >>> >
> >>> > Results, with 4.5.0-SNAPSHOT and v3 protocol - G1 GC
> >>> > #0 Total wall clock time: 1001.0 ms, total callbacks time: 11800 ms,
> >>> entry
> >>> > size 0,034 MB -> 11,80 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #1 Total wall clock time: 1001.0 ms, total callbacks time: 4585 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,59 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #2 Total wall clock time: 1002.0 ms, total callbacks time: 4458 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,46 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #3 Total wall clock time: 1002.0 ms, total callbacks time: 4471 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,47 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #4 Total wall clock time: 1004.0 ms, total callbacks time: 4473 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,47 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #5 Total wall clock time: 1003.0 ms, total callbacks time: 4477 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,48 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #6 Total wall clock time: 1007.0 ms, total callbacks time: 4424 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,42 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 33,9 MB/s throughput
> >>> > #7 Total wall clock time: 1005.0 ms, total callbacks time: 4576 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,58 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #8 Total wall clock time: 1002.0 ms, total callbacks time: 4447 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #9 Total wall clock time: 1007.0 ms, total callbacks time: 4501 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 33,9 MB/s throughput
> >>> >
> >>> > gc.log
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(9042512k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> >>> > 0,830: [GC pause (G1 Evacuation Pause) (young) 118M->33M(2048M),
> >>> 0,0116376
> >>> > secs]
> >>> > 1,372: [GC pause (G1 Evacuation Pause) (young) 162M->73M(2048M),
> >>> 0,0055103
> >>> > secs]
> >>> > 2,172: [GC pause (G1 Evacuation Pause) (young) 227M->99M(2048M),
> >>> 0,0023962
> >>> > secs]
> >>> > 10,358: [GC pause (G1 Evacuation Pause) (young) 1620M->122M(2048M),
> >>> > 0,0026519 secs]
> >>> > 11,676: [GC pause (Metadata GC Threshold) (young) (initial-mark)
> >>> > 242M->114M(2048M), 0,0021797 secs]
> >>> > 11,678: [GC concurrent-root-region-scan-start]
> >>> > 11,679: [GC concurrent-root-region-scan-end, 0,0007421 secs]
> >>> > 11,679: [GC concurrent-mark-start]
> >>> > 11,684: [GC concurrent-mark-end, 0,0046645 secs]
> >>> > 11,684: [GC remark, 0,0027234 secs]
> >>> > 11,687: [GC cleanup 115M->111M(2048M), 0,0008957 secs]
> >>> > 11,688: [GC concurrent-cleanup-start]
> >>> > 11,688: [GC concurrent-cleanup-end, 0,0000070 secs]
> >>> >
> >>> >
> >>> > With 4.4.0 - v3 protocol - G1 GC
> >>> > #0 Total wall clock time: 1001.0 ms, total callbacks time: 6723 ms,
> >>> entry
> >>> > size 0,034 MB -> 6,72 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #1 Total wall clock time: 1003.0 ms, total callbacks time: 4495 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #2 Total wall clock time: 1001.0 ms, total callbacks time: 4406 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,41 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #3 Total wall clock time: 1005.0 ms, total callbacks time: 4457 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,46 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #4 Total wall clock time: 1002.0 ms, total callbacks time: 4388 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #5 Total wall clock time: 1004.0 ms, total callbacks time: 4451 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #6 Total wall clock time: 1003.0 ms, total callbacks time: 4574 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,57 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #7 Total wall clock time: 1003.0 ms, total callbacks time: 4523 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,52 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #8 Total wall clock time: 1004.0 ms, total callbacks time: 4450 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,45 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #9 Total wall clock time: 1002.0 ms, total callbacks time: 4542 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,54 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> >
> >>> > gc.log
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(8931792k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> >>> > 0,750: [GC pause (G1 Evacuation Pause) (young) 106M->14M(2048M),
> >>> 0,0070948
> >>> > secs]
> >>> > 1,089: [GC pause (G1 Evacuation Pause) (young) 127M->21M(2048M),
> >>> 0,0038837
> >>> > secs]
> >>> > 1,608: [GC pause (G1 Evacuation Pause) (young) 193M->41M(2048M),
> >>> 0,0039903
> >>> > secs]
> >>> > 3,745: [GC pause (G1 Evacuation Pause) (young) 733M->49M(2048M),
> >>> 0,0046991
> >>> > secs]
> >>> > 7,990: [GC pause (G1 Evacuation Pause) (young) 1423M->65M(2048M),
> >>> 0,0052844
> >>> > secs]
> >>> >
> >>> >
> >>> >
> >>> > with 4.4 - v3 protocol Parallel GC
> >>> >
> >>> > #0 Total wall clock time: 999.0 ms, total callbacks time: 6007 ms,
> >>> entry
> >>> > size 0,034 MB -> 6,01 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,2 MB/s throughput
> >>> > #1 Total wall clock time: 1001.0 ms, total callbacks time: 4813 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,81 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #2 Total wall clock time: 1003.0 ms, total callbacks time: 4580 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,58 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #3 Total wall clock time: 1002.0 ms, total callbacks time: 4349 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,35 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #4 Total wall clock time: 1002.0 ms, total callbacks time: 4456 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,46 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #5 Total wall clock time: 1006.0 ms, total callbacks time: 4504 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,50 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #6 Total wall clock time: 1003.0 ms, total callbacks time: 4387 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #7 Total wall clock time: 1002.0 ms, total callbacks time: 4643 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,64 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #8 Total wall clock time: 1002.0 ms, total callbacks time: 4548 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,55 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #9 Total wall clock time: 1001.0 ms, total callbacks time: 4633 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,63 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> >
> >>> > gc.log
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(8877776k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops
> >>> -XX:+UseParallelGC
> >>> > 0,603: [GC (Allocation Failure)  524800K->8488K(2010112K), 0,0056746
> >>> secs]
> >>> > 1,996: [GC (Allocation Failure)  533268K->181534K(2010112K),
> 0,0248460
> >>> > secs]
> >>> > 3,475: [GC (Allocation Failure)  703337K->237612K(2010112K),
> 0,0192195
> >>> > secs]
> >>> > 4,963: [GC (Allocation Failure)  762412K->212913K(2010112K),
> 0,0085357
> >>> > secs]
> >>> > 6,473: [GC (Allocation Failure)  737684K->192597K(1951232K),
> 0,0065437
> >>> > secs]
> >>> > 8,019: [GC (Allocation Failure)  704597K->319617K(1863680K),
> 0,0189275
> >>> > secs]
> >>> > 9,008: [GC (Allocation Failure)  651905K->315463K(1853952K),
> 0,0121369
> >>> > secs]
> >>> > 9,997: [GC (Allocation Failure)  647729K->418132K(1848832K),
> 0,0222260
> >>> > secs]
> >>> > 10,742: [GC (Allocation Failure)  679764K->340097K(1878528K),
> 0,0111347
> >>> > secs]
> >>> >
> >>> >
> >>> > with 4.5 - v2 protocol Parallel GC
> >>> >
> >>> > #0 Total wall clock time: 1004.0 ms, total callbacks time: 8030 ms,
> >>> entry
> >>> > size 0,034 MB -> 8,03 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #1 Total wall clock time: 1003.0 ms, total callbacks time: 4394 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #2 Total wall clock time: 1002.0 ms, total callbacks time: 4417 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,42 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #3 Total wall clock time: 1005.0 ms, total callbacks time: 4362 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,36 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #4 Total wall clock time: 1003.0 ms, total callbacks time: 4435 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,44 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #5 Total wall clock time: 1002.0 ms, total callbacks time: 4489 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,49 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #6 Total wall clock time: 1002.0 ms, total callbacks time: 4346 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,35 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #7 Total wall clock time: 1004.0 ms, total callbacks time: 4474 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,47 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #8 Total wall clock time: 1005.0 ms, total callbacks time: 4436 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,44 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #9 Total wall clock time: 1001.0 ms, total callbacks time: 4322 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,32 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> >
> >>> > gc.log
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(8951980k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops
> >>> -XX:+UseParallelGC
> >>> > 0,583: [GC (Allocation Failure)  524800K->8921K(2010112K), 0,0054470
> >>> secs]
> >>> > 6,397: [GC (Allocation Failure)  533721K->178956K(2010112K),
> 0,0306493
> >>> secs
> >>> >
> >>> >
> >>> > with 4.5 - v3 protocol Parallel GC
> >>> >
> >>> > #0 Total wall clock time: 1002.0 ms, total callbacks time: 10679 ms,
> >>> entry
> >>> > size 0,034 MB -> 10,68 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #1 Total wall clock time: 1003.0 ms, total callbacks time: 4802 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,80 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #2 Total wall clock time: 1003.0 ms, total callbacks time: 4479 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,48 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #3 Total wall clock time: 1002.0 ms, total callbacks time: 4726 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,73 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #4 Total wall clock time: 1007.0 ms, total callbacks time: 4393 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,39 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 33,9 MB/s throughput
> >>> > #5 Total wall clock time: 1006.0 ms, total callbacks time: 4619 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,62 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #6 Total wall clock time: 1002.0 ms, total callbacks time: 4346 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,35 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #7 Total wall clock time: 1002.0 ms, total callbacks time: 4402 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,40 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,1 MB/s throughput
> >>> > #8 Total wall clock time: 1004.0 ms, total callbacks time: 4487 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,49 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 34,0 MB/s throughput
> >>> > #9 Total wall clock time: 1007.0 ms, total callbacks time: 4315 ms,
> >>> entry
> >>> > size 0,034 MB -> 4,32 ms per entry (latency),1,0 ms per entry
> >>> (throughput)
> >>> > 33,9 MB/s throughput
> >>> >
> >>> > gc.log
> >>> >
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(8921388k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops
> >>> -XX:+UseParallelGC
> >>> > 0,563: [GC (Allocation Failure)  524800K->8935K(2010112K), 0,0055378
> >>> secs]
> >>> > 1,936: [GC (Allocation Failure)  533735K->187494K(2010112K),
> 0,0281195
> >>> > secs]
> >>> > 3,917: [GC (Allocation Failure)  712294K->275491K(2010112K),
> 0,0298168
> >>> > secs]
> >>> > 6,385: [GC (Allocation Failure)  798141K->394430K(2010112K),
> 0,0279730
> >>> > secs]
> >>> > 8,934: [GC (Allocation Failure)  919230K->394530K(2008576K),
> 0,0127794
> >>> > secs]
> >>> > 11,634: [GC (Metadata GC Threshold)  811154K->348997K(1868288K),
> >>> 0,0080161
> >>> > secs]
> >>> > 11,642: [Full GC (Metadata GC Threshold)  348997K->39874K(1868288K),
> >>> > 0,0247462 secs]
> >>> >
> >>> >
> >>> > 4.5 v3 protocol parallel GC throttle = 0
> >>> >
> >>> > #0 Total wall clock time: 334.0 ms, total callbacks time: 267444 ms,
> >>> entry
> >>> > size 0,034 MB -> 267,44 ms per entry (latency),0,3 ms per entry
> >>> > (throughput) 102,3 MB/s throughput
> >>> > #1 Total wall clock time: 189.0 ms, total callbacks time: 154428 ms,
> >>> entry
> >>> > size 0,034 MB -> 154,43 ms per entry (latency),0,2 ms per entry
> >>> > (throughput) 180,8 MB/s throughput
> >>> > #2 Total wall clock time: 136.0 ms, total callbacks time: 103012 ms,
> >>> entry
> >>> > size 0,034 MB -> 103,01 ms per entry (latency),0,1 ms per entry
> >>> > (throughput) 251,3 MB/s throughput
> >>> > #3 Total wall clock time: 156.0 ms, total callbacks time: 113315 ms,
> >>> entry
> >>> > size 0,034 MB -> 113,32 ms per entry (latency),0,2 ms per entry
> >>> > (throughput) 219,1 MB/s throughput
> >>> > #4 Total wall clock time: 110.0 ms, total callbacks time: 79296 ms,
> >>> entry
> >>> > size 0,034 MB -> 79,30 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 310,7 MB/s throughput
> >>> > #5 Total wall clock time: 104.0 ms, total callbacks time: 71213 ms,
> >>> entry
> >>> > size 0,034 MB -> 71,21 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 328,7 MB/s throughput
> >>> > #6 Total wall clock time: 161.0 ms, total callbacks time: 72447 ms,
> >>> entry
> >>> > size 0,034 MB -> 72,45 ms per entry (latency),0,2 ms per entry
> >>> (throughput)
> >>> > 212,3 MB/s throughput
> >>> > #7 Total wall clock time: 103.0 ms, total callbacks time: 82783 ms,
> >>> entry
> >>> > size 0,034 MB -> 82,78 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 331,8 MB/s throughput
> >>> > #8 Total wall clock time: 138.0 ms, total callbacks time: 106086 ms,
> >>> entry
> >>> > size 0,034 MB -> 106,09 ms per entry (latency),0,1 ms per entry
> >>> > (throughput) 247,7 MB/s throughput
> >>> > #9 Total wall clock time: 118.0 ms, total callbacks time: 77750 ms,
> >>> entry
> >>> > size 0,034 MB -> 77,75 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 289,7 MB/s throughput
> >>> >
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(8858832k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops
> >>> -XX:+UseParallelGC
> >>> > 0,660: [GC (Allocation Failure)  524800K->8955K(2010112K), 0,0061442
> >>> secs]
> >>> > 1,288: [GC (Allocation Failure)  533726K->191538K(2010112K),
> 0,0329663
> >>> > secs]
> >>> > 1,634: [GC (Allocation Failure)  716338K->349468K(2010112K),
> 0,0508922
> >>> > secs]
> >>> > 1,942: [GC (Allocation Failure)  874268K->671717K(2010112K),
> 0,0609749
> >>> > secs]
> >>> > 2,266: [GC (Allocation Failure)  1196517K->781335K(2010112K),
> 0,0320305
> >>> > secs]
> >>> >
> >>> > 4.5 v2 protocol throttle = 0 G1 GC
> >>> >
> >>> > #0 Total wall clock time: 200.0 ms, total callbacks time: 173682 ms,
> >>> entry
> >>> > size 0,034 MB -> 173,68 ms per entry (latency),0,2 ms per entry
> >>> > (throughput) 170,9 MB/s throughput
> >>> > #1 Total wall clock time: 128.0 ms, total callbacks time: 89987 ms,
> >>> entry
> >>> > size 0,034 MB -> 89,99 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 267,0 MB/s throughput
> >>> > #2 Total wall clock time: 117.0 ms, total callbacks time: 86411 ms,
> >>> entry
> >>> > size 0,034 MB -> 86,41 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 292,1 MB/s throughput
> >>> > #3 Total wall clock time: 95.0 ms, total callbacks time: 64054 ms,
> >>> entry
> >>> > size 0,034 MB -> 64,05 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 359,8 MB/s throughput
> >>> > #4 Total wall clock time: 92.0 ms, total callbacks time: 60863 ms,
> >>> entry
> >>> > size 0,034 MB -> 60,86 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 371,5 MB/s throughput
> >>> > #5 Total wall clock time: 92.0 ms, total callbacks time: 62026 ms,
> >>> entry
> >>> > size 0,034 MB -> 62,03 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 371,5 MB/s throughput
> >>> > #6 Total wall clock time: 93.0 ms, total callbacks time: 65033 ms,
> >>> entry
> >>> > size 0,034 MB -> 65,03 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 367,5 MB/s throughput
> >>> > #7 Total wall clock time: 88.0 ms, total callbacks time: 59851 ms,
> >>> entry
> >>> > size 0,034 MB -> 59,85 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 388,4 MB/s throughput
> >>> > #8 Total wall clock time: 93.0 ms, total callbacks time: 61816 ms,
> >>> entry
> >>> > size 0,034 MB -> 61,82 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 367,5 MB/s throughput
> >>> > #9 Total wall clock time: 93.0 ms, total callbacks time: 70397 ms,
> >>> entry
> >>> > size 0,034 MB -> 70,40 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 367,5 MB/s throughput
> >>> >
> >>> >
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(8894828k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> >>> > 1,078: [GC pause (G1 Evacuation Pause) (young) 150M->65M(2048M),
> >>> 0,0138100
> >>> > secs]
> >>> >
> >>> > bk 4.5 v3 protocol G1 GC throttle = 0
> >>> >
> >>> > #0 Total wall clock time: 275.0 ms, total callbacks time: 238043 ms,
> >>> entry
> >>> > size 0,034 MB -> 238,04 ms per entry (latency),0,3 ms per entry
> >>> > (throughput) 124,3 MB/s throughput
> >>> > #1 Total wall clock time: 148.0 ms, total callbacks time: 106877 ms,
> >>> entry
> >>> > size 0,034 MB -> 106,88 ms per entry (latency),0,1 ms per entry
> >>> > (throughput) 230,9 MB/s throughput
> >>> > #2 Total wall clock time: 130.0 ms, total callbacks time: 96448 ms,
> >>> entry
> >>> > size 0,034 MB -> 96,45 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 262,9 MB/s throughput
> >>> > #3 Total wall clock time: 118.0 ms, total callbacks time: 78246 ms,
> >>> entry
> >>> > size 0,034 MB -> 78,25 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 289,7 MB/s throughput
> >>> > #4 Total wall clock time: 109.0 ms, total callbacks time: 79110 ms,
> >>> entry
> >>> > size 0,034 MB -> 79,11 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 313,6 MB/s throughput
> >>> > #5 Total wall clock time: 124.0 ms, total callbacks time: 86652 ms,
> >>> entry
> >>> > size 0,034 MB -> 86,65 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 275,6 MB/s throughput
> >>> > #6 Total wall clock time: 110.0 ms, total callbacks time: 76729 ms,
> >>> entry
> >>> > size 0,034 MB -> 76,73 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 310,7 MB/s throughput
> >>> > #7 Total wall clock time: 110.0 ms, total callbacks time: 80104 ms,
> >>> entry
> >>> > size 0,034 MB -> 80,10 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 310,7 MB/s throughput
> >>> > #8 Total wall clock time: 104.0 ms, total callbacks time: 73514 ms,
> >>> entry
> >>> > size 0,034 MB -> 73,51 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 328,7 MB/s throughput
> >>> > #9 Total wall clock time: 102.0 ms, total callbacks time: 74487 ms,
> >>> entry
> >>> > size 0,034 MB -> 74,49 ms per entry (latency),0,1 ms per entry
> >>> (throughput)
> >>> > 335,1 MB/s throughput
> >>> >
> >>> > Java HotSpot(TM) 64-Bit Server VM (25.121-b13) for linux-amd64 JRE
> >>> > (1.8.0_121-b13), built on Dec 12 2016 16:36:53 by "java_re" with gcc
> >>> 4.3.0
> >>> > 20080428 (Red Hat 4.3.0-8)
> >>> > Memory: 4k page, physical 32870636k(8958604k free), swap
> >>> > 15622140k(15464700k free)
> >>> > CommandLine flags: -XX:InitialHeapSize=2147483648
> >>> > -XX:MaxHeapSize=2147483648 -XX:+PrintGC
> >>> -XX:+PrintGCTimeStamps
> >>> > -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
> >>> > 0,876: [GC pause (G1 Evacuation Pause) (young) 102M->19M(2048M),
> >>> 0,0107143
> >>> > secs]
> >>> > 0,989: [GC pause (G1 Evacuation Pause) (young) 152M->81M(2048M),
> >>> 0,0074561
> >>> > secs]
> >>> > 1,125: [GC pause (G1 Evacuation Pause) (young) 198M->108M(2048M),
> >>> 0,0038296
> >>> > secs]
> >>> > 1,286: [GC pause (G1 Evacuation Pause) (young) 306M->146M(2048M),
> >>> 0,0034297
> >>> > secs]
> >>> >
> >>> >
> >>> >
> >>> > 2017-07-12 3:07 GMT+02:00 Sijie Guo <[email protected]>:
> >>> >
> >>> > > On Wed, Jul 12, 2017 at 2:04 AM, Venkateswara Rao Jujjuri <
> >>> > > [email protected]
> >>> > > > wrote:
> >>> > >
> >>> > > > Enrico, let me try to paraphrase the issue.
> >>> > > >
> >>> > > > - With G1GC, Netty 4.1 is giving you roughly the same perf as the
> >>> > > > previous release. Is that an accurate statement?
> >>> > > > But you are still seeing latency spikes with Netty 4.1??
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > > I did not fully understand your sleep use case.
> >>> > > > How is sleep yielding better latency?
> >>> > > >
> >>> > >
> >>> > > I think the "sleep(1)" in Enrico's benchmark is for throttling. It is
> >>> > > the same as setting the 'throttle' value to 1000 in his test.
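> >>> > >
> >>> > > Roughly, both patterns bound how many adds are in flight; an
> >>> > > illustrative sketch (not the actual bench code, exception handling
> >>> > > omitted, Semaphore is java.util.concurrent.Semaphore):
> >>> > >
> >>> > > // pacing with sleep: about one asyncAddEntry per millisecond
> >>> > > for (int i = 0; i < testsize; i++) {
> >>> > >     lh.asyncAddEntry(TEST_DATA, callback, null);
> >>> > >     Thread.sleep(1);
> >>> > > }
> >>> > >
> >>> > > // explicit throttle: never more than 1000 outstanding adds
> >>> > > Semaphore permits = new Semaphore(1000);
> >>> > > for (int i = 0; i < testsize; i++) {
> >>> > >     permits.acquire();
> >>> > >     lh.asyncAddEntry(TEST_DATA,
> >>> > >             (rc, handle, entryId, ctx) -> permits.release(), null);
> >>> > > }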
> >>> > >
> >>> > > - Sijie
> >>> > >
> >>> > >
> >>> > >
> >>> > > >
> >>> > > > Thanks,
> >>> > > > JV
> >>> > > >
> >>> > > > On Tue, Jul 11, 2017 at 8:27 AM, Enrico Olivelli <
> >>> [email protected]>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > Another interesting thing... during my profiling activity I gave
> >>> > > > > the old v2 protocol a chance and activated the gc logs; as
> >>> > > > > expected, with the v2 protocol there is almost no GC activity
> >>> > > > > during the benchmark
> >>> > > > > -- Enrico
> >>> > > > >
> >>> > > > > 2017-07-11 12:07 GMT+02:00 Enrico Olivelli <
> [email protected]
> >>> >:
> >>> > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > 2017-07-11 11:04 GMT+02:00 Sijie Guo <[email protected]>:
> >>> > > > > >
> >>> > > > > >> I think Netty4 requires more offheap memory. You might need to
> >>> > > > > >> tune the JVM settings. I doubt that the latency diff is coming
> >>> > > > > >> from the JVM gc.
> >>> > > > > >>
> >>> > > > > >> A simple thing to verify that is to dump the gc log by adding
> >>> > > > > >> the "-Xloggc:" setting and compare the gc logs between versions.
> >>> > > > > >>
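> >>> > > > > >> For example (JDK 8 flags, file names are just placeholders):
> >>> > > > > >>
> >>> > > > > >>   -Xloggc:gc-4.4.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> >>> > > > > >>
> >>> > > > > >> and the same with a different file name for the 4.5 run.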
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > Enabling G1 GC (-XX:+UseG1GC) apparently solves the
> difference
> >>> > > between
> >>> > > > > 4.4
> >>> > > > > > and 4.5.
> >>> > > > > >
> >>> > > > > > And if I loop the bench (in order to have a warm JVM) the
> >>> > > > > > numbers become more stable and similar to each other
> >>> > > > > >
> >>> > > > > > I have the "other issue" pending, the fact that latency (time
> >>> for
> >>> > > > > > asyncAddEntry to complete) is really bad and unpredictable
> >>> (from
> >>> > 100
> >>> > > > ms
> >>> > > > > to
> >>> > > > > > 2000ms)
> >>> > > > > >
> >>> > > > > > and if I introduce a Thread.sleep(1) all the callbacks always
> >>> > > > > > complete successfully in < 10 ms. This happens even with G1 GC.
> >>> > > > > >
> >>> > > > > > Without the "sleep(1)" my machine uses a lot of CPU (I have 8
> >>> > "CPUs")
> >>> > > > and
> >>> > > > > > with the "sleep(1)" the load is slightly lower
> >>> > > > > >
> >>> > > > > > Honestly for me this is still a problem and I hope that with
> >>> your
> >>> > > help
> >>> > > > I
> >>> > > > > > will be able to find the problem, wherever it is (in BK code
> >>> or in
> >>> > > the
> >>> > > > > way
> >>> > > > > > I am doing the bench)
> >>> > > > > >
> >>> > > > > > I will try to create a new more complete bench
> >>> > > > > >
> >>> > > > > > -- Enrico
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >>
> >>> > > > > >> - Sijie
> >>> > > > > >>
> >>> > > > > >> On Tue, Jul 11, 2017 at 12:16 AM, Enrico Olivelli <
> >>> > > > [email protected]>
> >>> > > > > >> wrote:
> >>> > > > > >>
> >>> > > > > >> > I did a bisect and the culprit (in my opinion) for the
> >>> > > > > >> > performance regression from 4.4 to 4.5 is the switch to
> >>> > > > > >> > netty 4
> >>> > > > > >> >
> >>> > > > > >> > at commit:
> >>> > > > > >> > commit 811ece53a1c975c4e768422f3d622ac9de6b3e41
> >>> > > BOOKKEEPER-1058:
> >>> > > > > >> Ignore
> >>> > > > > >> > already deleted ledger on replication audit
> >>> > > > > >> >
> >>> > > > > >> > Total time: 204 ms
> >>> > > > > >> > Total real time: 79 ms per entry
> >>> > > > > >> >
> >>> > > > > >> > at commit:
> >>> > > > > >> > commit 74f795136c1fff3badb29fc982d0cc2d43096b45
> >>> > BOOKKEEPER-1008:
> >>> > > > > Netty
> >>> > > > > >> 4.1
> >>> > > > > >> >
> >>> > > > > >> > Total time: 308 ms
> >>> > > > > >> > Total real time: 189 ms per entry
> >>> > > > > >> >
> >>> > > > > >> > I have tried with epoll and with local transport, the results
> >>> > > > > >> > do not change.
> >>> > > > > >> > I tried to upgrade to netty 4.1.13 too, but no change
> >>> > > > > >> >
> >>> > > > > >> > Could it be the memory allocator of netty which is
> >>> > > > > >> > overwhelmed by sudden bursts of allocation?
> >>> > > > > >> > I did some trials with UnpooledByteBufAllocator.DEFAULT and
> >>> > > > > >> > it helps a little, we get to "110 ms per entry" vs "189 ms
> >>> > > > > >> > per entry"
> >>> > > > > >> >
> >>> > > > > >> > the bench is here:
> >>> > > > > >> >
> >>> https://github.com/eolivelli/bookkeepers-benchs/blob/master/
> >>> > > > > >> src/test/java/
> >>> > > > > >> > BookKeeperWriteTest.java
> >>> > > > > >> >
> >>> > > > > >> >
> >>> > > > > >> > -- Enrico
> >>> > > > > >> >
> >>> > > > > >> >
> >>> > > > > >> >
> >>> > > > > >> > 2017-07-10 19:46 GMT+02:00 Enrico Olivelli <
> >>> [email protected]
> >>> > >:
> >>> > > > > >> >
> >>> > > > > >> > >
> >>> > > > > >> > >
> >>> > > > > >> > > Il lun 10 lug 2017, 18:21 Venkateswara Rao Jujjuri <
> >>> > > > > [email protected]
> >>> > > > > >> >
> >>> > > > > >> > ha
> >>> > > > > >> > > scritto:
> >>> > > > > >> > >
> >>> > > > > >> > >> With Netty changes, lack of native epoll() has huge
> perf
> >>> > impact
> >>> > > > as
> >>> > > > > >> per
> >>> > > > > >> > >> Kishore.
> >>> > > > > >> > >> Are you sure you are using epoll()?
> >>> > > > > >> > >>
> >>> > > > > >> > >
> >>> > > > > >> > > Yes. I tried with the netty local transport too. It does
> >>> > > > > >> > > not seem related to netty to me.
> >>> > > > > >> > > I will double check, tomorrow
> >>> > > > > >> > > Enrico
> >>> > > > > >> > >
> >>> > > > > >> > >
> >>> > > > > >> > >> On Mon, Jul 10, 2017 at 1:49 AM, Enrico Olivelli <
> >>> > > > > >> [email protected]>
> >>> > > > > >> > >> wrote:
> >>> > > > > >> > >>
> >>> > > > > >> > >> > 2017-07-10 10:40 GMT+02:00 Sijie Guo <
> >>> [email protected]>:
> >>> > > > > >> > >> >
> >>> > > > > >> > >> > > Also one other thing to check is the JVM settings.
> >>> Do you
> >>> > > > mind
> >>> > > > > >> > sharing
> >>> > > > > >> > >> > that
> >>> > > > > >> > >> > > as well?
> >>> > > > > >> > >> > >
> >>> > > > > >> > >> > >
> >>> > > > > >> > >> > this is the surefire config, I am using oracle jdk 8
> >>> > > > > >> > >> >
> >>> > > > > >> > >> >              <plugin>
> >>> > > > > >> > >> >                 <artifactId>maven-surefire-
> >>> > > plugin</artifactId>
> >>> > > > > >> > >> >                 <version>2.20</version>
> >>> > > > > >> > >> >                 <configuration>
> >>> > > > > >> > >> >                     <forkCount>1</forkCount>
> >>> > > > > >> > >> >                     <reuseForks>false</reuseForks>
> >>> > > > > >> > >> >
> >>> > > > > >> > >> > <forkedProcessTimeoutInSeconds>300</
> >>> > > > > forkedProcessTimeoutInSeconds>
> >>> > > > > >> > >> >                     <argLine>-Xmx2G
> >>> > > > > >> > >> > -Djava.io.tmpdir=${basedir}/target</argLine>
> >>> > > > > >> > >> >                 </configuration>
> >>> > > > > >> > >> >             </plugin>
> >>> > > > > >> > >> >
> >>> > > > > >> > >> > -- Enrico
> >>> > > > > >> > >> >
> >>> > > > > >> > >> >
> >>> > > > > >> > >> >
> >>> > > > > >> > >> > > Sijie
> >>> > > > > >> > >> > >
> >>> > > > > >> > >> > > On Jul 10, 2017 1:17 AM, "Sijie Guo" <
> >>> [email protected]
> >>> > >
> >>> > > > > wrote:
> >>> > > > > >> > >> > >
> >>> > >> > >> > > > I am not sure if any default values changed for the
> >>> > >> > >> > > > journal settings. I would suggest testing by setting the
> >>> > >> > >> > > > journal settings explicitly.
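> >>> > >> > >> > > >
> >>> > >> > >> > > > For example, pinning the journal group-commit settings in
> >>> > >> > >> > > > bk_server.conf on both versions (illustrative values):
> >>> > >> > >> > > >
> >>> > >> > >> > > >   journalAdaptiveGroupWrites=true
> >>> > >> > >> > > >   journalMaxGroupWaitMSec=2
> >>> > >> > >> > > >   journalFlushWhenQueueEmpty=false
> >>> > >> > >> > > >
> >>> > >> > >> > > > so that a changed default cannot explain the difference.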
> >>> > > > > >> > >> > > >
> >>> > > > > >> > >> > > > Also if you can share your benchmark, that would
> be
> >>> > good
> >>> > > > for
> >>> > > > > >> other
> >>> > > > > >> > >> > people
> >>> > > > > >> > >> > > > to verify.
> >>> > > > > >> > >> > > >
> >>> > > > > >> > >> > > > Sijie
> >>> > > > > >> > >> > > >
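For anyone reproducing this, a minimal sketch of what "setting the journal settings explicitly" could look like, using only the setters and property keys that already appear in the bookie config quoted further down the thread (the values shown are the benchmark's, not recommendations):

    // org.apache.bookkeeper.conf.ServerConfiguration - journal knobs pinned explicitly
    // instead of relying on whatever the 4.4 vs 4.5 defaults happen to be
    ServerConfiguration conf = new ServerConfiguration();
    conf.setFlushInterval(1000);                               // flush interval, in ms
    conf.setJournalFlushWhenQueueEmpty(true);                  // flush as soon as the journal queue drains
    conf.setProperty("journalMaxGroupWaitMSec", 0);            // max time to wait while grouping writes (0 = no wait)
    conf.setProperty("journalBufferedWritesThreshold", 1024);  // size threshold that triggers a group flush
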
> >>> > > > > >> > >> > > > On Jul 10, 2017 12:32 AM, "Enrico Olivelli" <[email protected]> wrote:
> >>> > > > > >> > >> > > >
> >>> > > > > >> > >> > > >> Hi,
> >>> > > > > >> > >> > > >> I am doing some benchmarks on BK, and I see that from 4.4.0 to 4.5.0 something is "slow", but I cannot understand what. I really hope that I am wrong.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> I am working on writes; I will move to reads once writes are OK.
> >>> > > > > >> > >> > > >> My problem is both latency (time for the AddComplete callback to complete) and overall throughput.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Actually I have two distinct problems, but while working on the first one I found a performance regression.
> >>> > > > > >> > >> > > >> I know that talking about "slow" things is a hard matter, so I will try to describe all the aspects that I think are relevant.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> First problem: under certain load, performance (latency + throughput) degrades too much.
> >>> > > > > >> > >> > > >> Second problem: the first problem is more evident in 4.5.0.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Let's describe my testcase and why I am worried.
> >>> > > > > >> > >> > > >> The bench issues a batch of asyncAddEntry and prints the average time for AddComplete to complete and the overall clock time.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> This is the code
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> private static final byte[] TEST_DATA = new byte[35 * 1024];
> >>> > > > > >> > >> > > >> private static final int testsize = 1000;
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> ...... (start 1 bookie, see below)
> >>> > > > > >> > >> > > >>             ClientConfiguration clientConfiguration = new ClientConfiguration();
> >>> > > > > >> > >> > > >>             clientConfiguration.setZkServers(env.getAddress());
> >>> > > > > >> > >> > > >>             try (BookKeeper bk = new BookKeeper(clientConfiguration);
> >>> > > > > >> > >> > > >>                 LedgerHandle lh = bk.createLedger(1, 1, 1, BookKeeper.DigestType.CRC32, new byte[0])) {
> >>> > > > > >> > >> > > >>                 LongAdder totalTime = new LongAdder();
> >>> > > > > >> > >> > > >>                 long _start = System.currentTimeMillis();
> >>> > > > > >> > >> > > >>                 Collection<CompletableFuture> batch = new ConcurrentLinkedQueue<>();
> >>> > > > > >> > >> > > >>                 for (int i = 0; i < testsize; i++) {
> >>> > > > > >> > >> > > >>                     CompletableFuture cf = new CompletableFuture();
> >>> > > > > >> > >> > > >>                     batch.add(cf);
> >>> > > > > >> > >> > > >>                     lh.asyncAddEntry(TEST_DATA, new AsyncCallback.AddCallback() {
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >>                         // captured when the callback object is created, i.e. just before asyncAddEntry is issued
> >>> > > > > >> > >> > > >>                         long start = System.currentTimeMillis();
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >>                         @Override
> >>> > > > > >> > >> > > >>                         public void addComplete(int rc, LedgerHandle lh, long entryId, Object ctx) {
> >>> > > > > >> > >> > > >>                             long now = System.currentTimeMillis();
> >>> > > > > >> > >> > > >>                             CompletableFuture _cf = (CompletableFuture) ctx;
> >>> > > > > >> > >> > > >>                             if (rc == BKException.Code.OK) {
> >>> > > > > >> > >> > > >>                                 _cf.complete("");
> >>> > > > > >> > >> > > >>                             } else {
> >>> > > > > >> > >> > > >>                                 _cf.completeExceptionally(BKException.create(rc));
> >>> > > > > >> > >> > > >>                             }
> >>> > > > > >> > >> > > >>                             // per-entry "latency" = submission-to-callback time
> >>> > > > > >> > >> > > >>                             totalTime.add(now - start);
> >>> > > > > >> > >> > > >>                         }
> >>> > > > > >> > >> > > >>                     }, cf);
> >>> > > > > >> > >> > > >> //                    Thread.sleep(1);      // this is the trigger!!!
> >>> > > > > >> > >> > > >>                 }
> >>> > > > > >> > >> > > >>                 assertEquals(testsize, batch.size());
> >>> > > > > >> > >> > > >>                 for (CompletableFuture f : batch) {
> >>> > > > > >> > >> > > >>                     f.get();
> >>> > > > > >> > >> > > >>                 }
> >>> > > > > >> > >> > > >>                 long _stop = System.currentTimeMillis();
> >>> > > > > >> > >> > > >>                 long delta = _stop - _start;
> >>> > > > > >> > >> > > >>                 System.out.println("Total time: " + delta + " ms");
> >>> > > > > >> > >> > > >>                 System.out.println("Total real time: " + totalTime.sum() + " ms -> " + (totalTime.sum() / testsize) + " ms per entry");
> >>> > > > > >> > >> > > >>             }
> >>> > > > > >> > >> > > >>
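As a small aside on the benchmark itself (not part of the original mail): the final wait loop can also be expressed as a single join over the whole batch; this is only an equivalent sketch against the same `batch` collection and does not change what is measured:

                // equivalent to the f.get() loop above: block until every add has completed
                CompletableFuture.allOf(batch.toArray(new CompletableFuture[0])).join();
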
> >>> > > > > >> > >> > > >> Bookie config:
> >>> > > > > >> > >> > > >>         ServerConfiguration conf = new ServerConfiguration();
> >>> > > > > >> > >> > > >>         conf.setBookiePort(5621);
> >>> > > > > >> > >> > > >>         conf.setUseHostNameAsBookieID(true);
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >>         Path targetDir = path.resolve("bookie_data");
> >>> > > > > >> > >> > > >>         conf.setZkServers("localhost:1282");
> >>> > > > > >> > >> > > >>         conf.setLedgerDirNames(new String[]{targetDir.toAbsolutePath().toString()});
> >>> > > > > >> > >> > > >>         conf.setJournalDirName(targetDir.toAbsolutePath().toString());
> >>> > > > > >> > >> > > >>         conf.setFlushInterval(1000);
> >>> > > > > >> > >> > > >>         conf.setJournalFlushWhenQueueEmpty(true);
> >>> > > > > >> > >> > > >>         conf.setProperty("journalMaxGroupWaitMSec", 0);
> >>> > > > > >> > >> > > >>         conf.setProperty("journalBufferedWritesThreshold", 1024);
> >>> > > > > >> > >> > > >>         conf.setAutoRecoveryDaemonEnabled(false);
> >>> > > > > >> > >> > > >>         conf.setEnableLocalTransport(true);
> >>> > > > > >> > >> > > >>         conf.setAllowLoopback(true);
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> The test starts one ZK server + 1 Bookie + the testcase in a JUnit test.
> >>> > > > > >> > >> > > >>
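The mail does not show how the bookie itself is brought up. Assuming it is started in-process from the JUnit fixture with the configuration above, the relevant part would look roughly like this (BookieServer is the usual entry point; its constructor and start() throw checked exceptions, omitted here for brevity):

    // assumption: in-process bookie started by the test fixture with the ServerConfiguration above
    org.apache.bookkeeper.proto.BookieServer bookie = new org.apache.bookkeeper.proto.BookieServer(conf);
    bookie.start();
    // ... run the client benchmark ...
    bookie.shutdown();
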
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Results:
> >>> > > > > >> > >> > > >> A - BK-4.4.0:
> >>> > > > > >> > >> > > >> Total time: 209 ms
> >>> > > > > >> > >> > > >> Total real time: 194337 ms -> 194 ms per entry
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> B - BK-4.5.0-SNAPSHOT:
> >>> > > > > >> > >> > > >> Total time: 269 ms
> >>> > > > > >> > >> > > >> Total real time: 239918 ms -> 239 ms per entry
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> C - BK-4.4.0 with sleep(1):
> >>> > > > > >> > >> > > >> Total time: 1113 ms (1000 ms sleep time)
> >>> > > > > >> > >> > > >> Total real time: 4238 ms  -> 4 ms per entry
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> D - BK-4.5.0-SNAPSHOT with sleep(1):
> >>> > > > > >> > >> > > >> Total time: 1121 ms (1000 ms sleep time)
> >>> > > > > >> > >> > > >> Total real time: 8018 ms -> 8 ms per entry
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Problem 1 (unexpected performance degradation):
> >>> > > > > >> > >> > > >> Times per entry (latency) are incredibly slow in cases A and B.
> >>> > > > > >> > >> > > >> If I add a sleep(1) between one call of asyncAddEntry and the next, "latency" is around 4 ms per entry.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Problem 2: worse performance on 4.5.0
> >>> > > > > >> > >> > > >> Compare A vs B and C vs D; it is self-explanatory.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> I am running the test on my laptop, with 64-bit Linux (Fedora), 12 GB RAM, no swap, on an SSD disk. The results are similar on other computers.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> It seems that if I issue too many addEntry calls, the system slows down.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Please note this fact:
> >>> > > > > >> > >> > > >> the numbers for cases A and B (without sleep) mean that all the adds got completed almost together (each of the 1000 callbacks reports a per-entry latency close to the total wall-clock time of the run).
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> For the 4.5 vs 4.4 case:
> >>> > > > > >> > >> > > >> I tried disabling all of the threadpool enhancements (separate read/write pools)... it makes no difference.
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Questions:
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Is the "grouping" logic of the journal ?
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Is there a way of making a burst of 1000 async writes on the same ledger perform with <10 ms latency? (In my real case I have bursts of concurrent writes from different threads.)
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Why is 4.5.0 slower anyway?
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> Thanks
> >>> > > > > >> > >> > > >>
> >>> > > > > >> > >> > > >> -- Enrico
> >>> > > > > >> > >> > > >>
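On the question about bursts of async writes: one client-side option, not taken from this thread's code, is to bound the number of in-flight asyncAddEntry calls with a plain java.util.concurrent.Semaphore. A sketch, reusing TEST_DATA, testsize, batch and lh from the benchmark above; the permit count of 64 is an arbitrary illustration:

    // cap the number of outstanding adds so a burst does not pile up thousands of
    // entries behind the journal at once
    final java.util.concurrent.Semaphore inFlight = new java.util.concurrent.Semaphore(64);
    for (int i = 0; i < testsize; i++) {
        inFlight.acquireUninterruptibly();              // wait for a free slot
        CompletableFuture cf = new CompletableFuture();
        batch.add(cf);
        lh.asyncAddEntry(TEST_DATA, (rc, handle, entryId, ctx) -> {
            inFlight.release();                         // slot frees as soon as the add completes
            if (rc == BKException.Code.OK) {
                cf.complete(entryId);
            } else {
                cf.completeExceptionally(BKException.create(rc));
            }
        }, null);
    }
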
> >>> > > > > >> > >> --
> >>> > > > > >> > >> Jvrao
> >>> > > > > >> > >> ---
> >>> > > > > >> > >> First they ignore you, then they laugh at you, then they fight you, then you win. - Mahatma Gandhi
> >>> > > > > >> > >>
> >>> > > > > >> > > --
> >>> > > > > >> > >
> >>> > > > > >> > >
> >>> > > > > >> > > -- Enrico Olivelli
> >>> > > > > >> > >
