Hi all,

I'm still doing a little more investigation before opening up a formal
BIP PR, but I'm getting close.  Here are some more findings.

After moving the compression from main.cpp to streams.h (CDataStream) it
was a simple matter to add compression to transactions as well. Results
as follows:

range = transaction size range
ubytes = average size of uncompressed transactions (bytes)
cbytes = average size of compressed transactions (bytes)
cmp_ratio% = compression ratio, as percent saved (negative means the
             compressed form is larger)
datapoints = number of datapoints taken

range        ubytes  cbytes  cmp_ratio%  datapoints
0-250B          220     227       -3.16       23780
250-500B        356     354        0.68       20882
500-600B        534     505        5.29        2772
600-700B        653     608        6.95        1853
700-800B        757     649       14.22         578
800-900B        822     758        7.77         661
900B-1KB        954     862        9.69         906
1KB-10KB       2698    2222       17.64        3370
10KB-100KB    15463   12092       21.8        15429


A couple of obvious observations.  Transactions don't compress well
below 500 bytes but do very well beyond 1KB, where there are a great
many of those large spam-type transactions.  However, most transactions
happen to be in the < 500 byte range.  So the next step was to apply
bundling, that is, creating a "blob" out of those smaller transactions,
if and only if there are multiple tx's in the getdata receive queue for
a peer.  Doing that yields some very good compression ratios.  Some
examples follow:

The best one I've seen so far was the following where 175 transactions
were bundled into one blob before being compressed.  That yielded a
compression saving of about 16% (79890 to 67426 bytes), but that doesn't
take into account the savings from
the unneeded 174 message headers (24 bytes each) as well as 174 TCP
ACKs of 52 bytes each, which yields an additional 76*174=13224 bytes,
making the overall bandwidth savings 32% in this particular case.

*2015-11-18 01:09:09.002061 compressed blob from 79890 to 67426 txcount:175*
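
The arithmetic for this blob can be reproduced from the logged sizes.
What follows is only a minimal sketch in Python, using the 24-byte
message header and 52-byte TCP ACK figures quoted above.  The function
name, and the choice of the uncompressed blob size as the baseline for
the percentage, are assumptions made for illustration and are not part
of the patch.

```python
HEADER_BYTES = 24   # per-message header size quoted in the text
ACK_BYTES = 52      # per-message TCP ACK size quoted in the text


def bundling_savings(raw_bytes, compressed_bytes, txcount):
    """Return (bytes saved by compression, overhead bytes avoided, total % saved).

    The percentage is taken relative to the uncompressed blob size,
    which is an assumption about how the overall figure was derived.
    """
    saved_by_compression = raw_bytes - compressed_bytes
    # Bundling N transactions into one message avoids N-1 headers and ACKs.
    overhead_avoided = (txcount - 1) * (HEADER_BYTES + ACK_BYTES)
    total_pct = 100.0 * (saved_by_compression + overhead_avoided) / raw_bytes
    return saved_by_compression, overhead_avoided, total_pct


# Sizes taken from the log line for the 175-transaction blob.
comp, overhead, pct = bundling_savings(79890, 67426, 175)
print(comp, overhead, round(pct))   # prints: 12464 13224 32
```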

To be sure, this was an extreme example.  Most transaction blobs were in
the 2 to 10 transaction range, such as the following:

*2015-11-17 21:08:28.469313 compressed blob from 3199 to 2876 txcount:10*

But even here the compression savings are 10%, far better than the
"nothing" we would get without bundling.  Add to that the 76 bytes * 9
transactions of overhead saved and we have a total 20% savings in
bandwidth for transactions that otherwise would not be compressible.
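
The effect of bundling on small payloads can be illustrated with a
short sketch.  This is Python using the standard zlib module, not the
CDataStream code itself.  The fake "transactions" are synthetic
stand-ins (a fixed template plus random bytes), not real Bitcoin
serialization, and the length-prefixed blob layout is an assumption
made for the example; the wire format of the actual patch is not
described here.

```python
import random
import struct
import zlib

# Hypothetical stand-in for structure that transactions share
# (version fields, script opcodes and so on).
TEMPLATE = bytes(range(64, 104))


def make_fake_tx(rng):
    """Build a synthetic payload: shared template plus incompressible bytes."""
    random_part = bytes(rng.getrandbits(8) for _ in range(104))
    return TEMPLATE + random_part


def bundle(txs):
    """Concatenate payloads, each prefixed with a 4-byte length (assumed layout)."""
    return b"".join(struct.pack("<I", len(tx)) + tx for tx in txs)


def unbundle(blob):
    """Inverse of bundle(): split a blob back into its payloads."""
    txs, pos = [], 0
    while pos < len(blob):
        (size,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        txs.append(blob[pos:pos + size])
        pos += size
    return txs


rng = random.Random(2015)
txs = [make_fake_tx(rng) for _ in range(10)]

# Compress each payload on its own, then compress one bundled blob.
individual_total = sum(len(zlib.compress(tx, 6)) for tx in txs)
blob = bundle(txs)
bundled_total = len(zlib.compress(blob, 6))

print("uncompressed:", sum(len(tx) for tx in txs))
print("compressed individually:", individual_total)
print("compressed as one blob:", bundled_total)
```

Compressed one at a time, each small payload pays the fixed zlib
overhead and has nothing to match against; in the blob, the repeated
structure is found across payloads.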

The same bundling was applied to blocks and very good compression ratios
are seen when sync'ing the blockchain.

Overall, the bundling or blobbing of tx's and blocks seems to be a good
idea for improving bandwidth use.  There is also a scalability factor
here: when the system is busy, transactions are bundled more often,
compressed, and sent faster, keeping the message queue and network
chatter to a minimum.

I think I have enough information to put together a formal BIP, with the
exception of which compression library to use.  These tests were
done using zlib, but I'll also be running tests in the coming days with
LZO (Jeff Garzik's suggestion) and perhaps Snappy.  If there are any
other libraries that people would like me to get results for please let
me know and I'll pick maybe the top 2 or 3 and get results back to the
group.
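
For anyone wanting to run a comparison themselves, a harness along
these lines might be a starting point.  It is only a sketch in Python:
the function name, the result fields and the placeholder payloads are
all made up for illustration.  LZO and Snappy need third-party
bindings, so the example only plugs in zlib at three levels; other
libraries could be added as further (compress, decompress) pairs.

```python
import time
import zlib


def benchmark(codecs, payloads):
    """Run each codec over all payloads and report size and time.

    codecs maps a name to a (compress, decompress) pair of callables.
    The timing covers compression plus the round-trip decompression.
    """
    results = []
    for name, (compress, decompress) in codecs.items():
        raw = packed = 0
        start = time.perf_counter()
        for payload in payloads:
            compressed = compress(payload)
            # Verify the round trip so a broken codec is not rewarded.
            assert decompress(compressed) == payload
            raw += len(payload)
            packed += len(compressed)
        elapsed = time.perf_counter() - start
        results.append({
            "codec": name,
            "raw": raw,
            "packed": packed,
            "saved_pct": 100.0 * (raw - packed) / raw,
            "secs": elapsed,
        })
    return results


codecs = {
    "zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
}

# Placeholder data; real tests would feed in serialized blocks or tx blobs.
payloads = [b"example payload " * 64 for _ in range(100)]

results = benchmark(codecs, payloads)
for row in results:
    print(row["codec"], row["raw"], row["packed"], round(row["saved_pct"], 1))
```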



On 13/11/2015 1:58 PM, Peter Tschipper wrote:
> Some further Block Compression tests results that compare performance
> when network latency is added to the mix.
>
> Running two nodes, Windows 7, compressionlevel=6, syncing the first
> 200000 blocks from one node to another.  Running on a high-speed
> wireless LAN with no connections to the outside world.
> Network latency was added by using Netbalancer to induce the 30ms and
> 60ms latencies.
>
> From the data, not only are bandwidth savings seen but a small
> performance savings as well.  However, the overall value in
> compressing blocks appears to be in terms of saving bandwidth.
>
> I was also surprised to see that there was no real difference in
> performance when no latency was present; apparently the time it takes
> to compress is about equal to the performance savings in such a situation.
>
>
> The following results compare the tests in terms of how long it takes
> to sync the blockchain, compressed vs uncompressed and with varying
> latencies.
> uncmp = uncompressed
> cmp = compressed
>
> num blocks   uncmp    cmp      uncmp 30ms   cmp 30ms   uncmp 60ms   cmp 60ms
> sync'd       (secs)   (secs)   (secs)       (secs)     (secs)       (secs)
> 10000        264      269      265          257        274          275
> 20000        482      492      479          467        499          497
> 30000        703      717      693          676        724          724
> 40000        918      939      902          886        947          944
> 50000        1140     1157     1114         1094       1171         1167
> 60000        1362     1380     1329         1310       1400         1395
> 70000        1583     1597     1547         1526       1637         1627
> 80000        1810     1817     1767         1745       1872         1862
> 90000        2031     2036     1985         1958       2109         2098
> 100000       2257     2260     2223         2184       2385         2355
> 110000       2553     2486     2478         2422       2755         2696
> 120000       2800     2724     2849         2771       3345         3254
> 130000       3078     2994     3356         3257       4125         4006
> 140000       3442     3365     3979         3870       5032         4904
> 150000       3803     3729     4586         4464       5928         5797
> 160000       4148     4075     5168         5034       6801         6661
> 170000       4509     4479     5768         5619       7711         7557
> 180000       4947     4924     6389         6227       8653         8479
> 190000       5858     5855     7302         7107       9768         9566
> 200000       6980     6969     8469         8220       10944        10724
>
>

_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
