Are you saying that the go1.7beta compiler generates significantly worse code than go1.6.2 for the following loop?

for val < 2 {
    runLength = (runLength << 1) | int(val)
    srcIdx++

If you can isolate the problem, I'd say it's worth opening an issue about it.

On Saturday, June 18, 2016 at 22:04:22 UTC+2, flanglet wrote:
>
> I have tested the performance of the newly released 1.7beta1 against a set
> of compression algorithms I have been working on.
> Here are the results for the compression of the Silesia Corpus.
> The tests were performed on a desktop i7-2600 @3.40GHz, Win7, 16GB RAM.
>
> In the encoding and decoding time columns, the first number is Go 1.6 and
> the second is 1.7beta1.
>
>                          Size         Ratio    Enc. (sec)     Dec. (sec)
> LZ4:                     101,631,119  47.95%   2.5 / 2.3      2.5 / 2.3
> Snappy:                  101,285,612  47.79%   2.4 / 2.3      2.6 / 2.4
> BWT+RANK+ZRLT+Huffman:   52,352,711   24.70%   38.1 / 32.5    23.0 / 22.5
> BWT+RANK+ZRLT+Range:     52,061,295   24.56%   38.5 / 32.9    24.6 / 23.7
> BWT+RANK+ZRLT+ANS:       52,061,115   24.56%   39.2 / 33.8    23.0 / 22.4
> BWT+RANK+ZRLT+FPAQ:      49,584,922   23.40%   49.0 / 41.4    36.4 / 34.0
> BWT+CM:                  46,505,288   21.94%   91.2 / 71.9    81.8 / 65.8
> BWT+PAQ:                 46,514,028   21.95%   148.5 / 121.1  140.1 / 117.2
> TPAQ:                    42,463,928   20.04%   335.3 / 264.2  329.9 / 262.7
>
> The speed improvements are consistent and rather impressive.
> Explanation of the algorithms and more performance numbers are available here:
> https://github.com/flanglet/kanzi/wiki/Compression-examples.
>
> As a data point, here are the Java results:
>
>                          Size         Ratio    Enc. (sec)  Dec. (sec)
> LZ4:                     101,631,119  47.95%   3.8         2.5
> Snappy:                  101,285,612  47.79%   3.6         2.5
> BWT+RANK+ZRLT+Huffman:   52,352,711   24.70%   33.4        21.9
> BWT+RANK+ZRLT+Range:     52,061,295   24.56%   33.3        23.7
> BWT+RANK+ZRLT+ANS:       52,061,115   24.56%   33.8        21.8
> BWT+RANK+ZRLT+FPAQ:      49,584,922   23.40%   37.2        28.3
> BWT+CM:                  46,505,288   21.94%   53.0        45.9
> BWT+PAQ:                 46,514,028   21.95%   93.0        86.5
> TPAQ:                    42,463,928   20.04%   173.0       182.3
>
> With the progress in release 1.7beta1, Go catches up with Java for the
> fast compressors (the Java numbers for LZ4 and Snappy are not that useful
> because a good percentage of the time is spent in JVM warmup). Go still
> lags behind for the Context Mixing based compressors (which require
> several function calls to estimate the probability of each bit).
>
> I found one oddity when running tests on the Zero Run Length Transform:
>
> Go 1.6
>
> ZRLT encoding [ms]: 10678    Throughput [MB/s]: 223
> ZRLT decoding [ms]:  7577    Throughput [MB/s]: 314
> ZRLT encoding [ms]: 10720    Throughput [MB/s]: 222
> ZRLT decoding [ms]:  7573    Throughput [MB/s]: 314
> ZRLT encoding [ms]: 10651    Throughput [MB/s]: 223
> ZRLT decoding [ms]:  7509    Throughput [MB/s]: 317
>
> Go 1.7beta1
>
> ZRLT encoding [ms]:  7049    Throughput [MB/s]: 338
> ZRLT decoding [ms]: 11573    Throughput [MB/s]: 206
> ZRLT encoding [ms]:  6910    Throughput [MB/s]: 345
> ZRLT decoding [ms]: 12040    Throughput [MB/s]: 198
> ZRLT encoding [ms]:  7024    Throughput [MB/s]: 339
> ZRLT decoding [ms]: 11894    Throughput [MB/s]: 200
>
> The decoding now takes much longer than the encoding.
> The culprit is a tight loop that decodes the run length (val is a byte):
>
> for val < 2 {
>     runLength = (runLength << 1) | int(val)
>     srcIdx++
>     [...]
>
> I changed the loop condition like this (it feels a bit kludgy):
>
> for val&1 == val {
>     runLength = (runLength << 1) | int(val)
>     srcIdx++
>     [...]
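For anyone who wants to reproduce the comparison, here is a minimal, self-contained sketch of the two loop variants. It is not the actual kanzi code: the function names, the seed value of runLength, and the use of a byte value >= 2 as the run terminator are illustrative assumptions based only on the snippets quoted above.

```go
package main

import "fmt"

// decodeRunLength is a sketch of the original loop: it consumes bytes that
// are 0 or 1, shifting each bit into runLength, until it hits a byte >= 2.
// Seeding runLength with 1 is an assumption, not the real kanzi behavior.
func decodeRunLength(src []byte) (runLength, srcIdx int) {
	runLength = 1
	val := src[srcIdx]
	for val < 2 {
		runLength = (runLength << 1) | int(val)
		srcIdx++
		val = src[srcIdx]
	}
	return runLength, srcIdx
}

// decodeRunLengthAlt is the same loop with the workaround condition:
// val&1 == val holds exactly when val is 0 or 1, so it is equivalent.
func decodeRunLengthAlt(src []byte) (runLength, srcIdx int) {
	runLength = 1
	val := src[srcIdx]
	for val&1 == val {
		runLength = (runLength << 1) | int(val)
		srcIdx++
		val = src[srcIdx]
	}
	return runLength, srcIdx
}

func main() {
	src := []byte{1, 0, 1, 2} // bits 1,0,1 followed by the terminator 2
	r1, _ := decodeRunLength(src)
	r2, _ := decodeRunLengthAlt(src)
	fmt.Println(r1, r2) // both variants decode the same value: 13 13
}
```

Both functions produce identical results; the only difference 1.7beta1 should see is the generated comparison code.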
>
> ZRLT encoding [ms]: 6842    Throughput [MB/s]: 348
> ZRLT decoding [ms]: 7704    Throughput [MB/s]: 309
> ZRLT encoding [ms]: 6851    Throughput [MB/s]: 347
> ZRLT decoding [ms]: 7822    Throughput [MB/s]: 304
> ZRLT encoding [ms]: 6823    Throughput [MB/s]: 349
> ZRLT decoding [ms]: 7721    Throughput [MB/s]: 308

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
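To isolate the regression well enough for an issue report, a standalone micro-benchmark comparing the two loop conditions would help. Below is one possible sketch using testing.Benchmark; the synthetic input (alternating 0/1 bytes ending in a 2) and the seed value of runLength are stand-ins, not the real ZRLT stream.

```go
package main

import (
	"fmt"
	"testing"
)

// src is a stand-in bit stream: alternating 0/1 bytes, terminated by a 2.
var src = func() []byte {
	b := make([]byte, 1<<20)
	for i := range b {
		b[i] = byte(i & 1)
	}
	b[len(b)-1] = 2
	return b
}()

// sink keeps the loop results live so the compiler cannot eliminate them.
var sink int

func main() {
	// Original condition: val < 2.
	less := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			runLength, srcIdx := 1, 0
			for val := src[srcIdx]; val < 2; val = src[srcIdx] {
				runLength = (runLength << 1) | int(val)
				srcIdx++
			}
			sink = runLength
		}
	})

	// Workaround condition: val&1 == val (true exactly for 0 and 1).
	mask := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			runLength, srcIdx := 1, 0
			for val := src[srcIdx]; val&1 == val; val = src[srcIdx] {
				runLength = (runLength << 1) | int(val)
				srcIdx++
			}
			sink = runLength
		}
	})

	fmt.Println("val < 2:     ", less)
	fmt.Println("val&1 == val:", mask)
}
```

Running this under both go1.6.2 and go1.7beta1 (or comparing the assembly with `go tool compile -S`) should show whether the slowdown really comes from how the two compilers translate the `val < 2` comparison.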