Discussion:
[bitcoin-dev] request BIP number for: "Support for Datastream Compression"
Peter Tschipper via bitcoin-dev
2015-11-09 19:18:10 UTC
Permalink
This is my first time through this process so please bear with me.

I opened a PR #6973 this morning for Zlib Block Compression for block
relay and at the request of @sipa this should have a BIP associated
with it. The idea is simple: compress the datastream before
sending, initially for blocks only, though it could theoretically be done
for transactions as well. Initial results show an average of 20% block
compression, with a full block taking about 90 milliseconds to compress
(on a very slow laptop). The savings will be mostly in terms of less
bandwidth used, but I would expect a small performance gain
during the transmission of blocks, particularly where network latency
is higher.
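
For reference, this is the shape of the underlying zlib call; a minimal
sketch only, where CompressBytes is an illustrative helper of mine and not
the PR's actual code (compress2 and compressBound are the standard zlib
one-shot functions):

    // Minimal sketch: one-shot zlib compression of an already-serialized block's
    // bytes. compress2() and compressBound() are standard zlib; everything around
    // them is illustrative, not the actual code in the PR.
    #include <zlib.h>

    #include <stdexcept>
    #include <vector>

    std::vector<unsigned char> CompressBytes(const std::vector<unsigned char>& in,
                                             int level = Z_DEFAULT_COMPRESSION)
    {
        uLongf destLen = compressBound(in.size());   // worst-case output size
        std::vector<unsigned char> out(destLen);
        if (compress2(out.data(), &destLen, in.data(), in.size(), level) != Z_OK)
            throw std::runtime_error("zlib compress2 failed");
        out.resize(destLen);                         // destLen now holds the compressed size
        return out;
    }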

I think the BIP title, if accepted, should be the more generic "Support
for Datastream Compression" rather than the PR title of "Zlib
Compression for block relay", since it could also be applied to
transactions at a later time.

Thanks for your time...
Johnathan Corgan via bitcoin-dev
2015-11-09 20:41:17 UTC
Permalink
On Mon, Nov 9, 2015 at 11:18 AM, Peter Tschipper via bitcoin-dev <
Post by Peter Tschipper via bitcoin-dev
I opened a PR #6973 this morning for Zlib Block Compression for block
with it. The idea is simple, to compress the datastream before
sending, initially for blocks only but it could theoretically be done
for transactions as well. Initial results show an average of 20% block
compression and taking 90 milliseconds for a full block (on a very slow
laptop) to compress. The savings will be mostly in terms of less
bandwidth used, but I would expect there to be a small performance gain
during the transmission of the blocks particularly where network latency
is higher.
The trade-off decisions among bandwidth savings, CPU performance, and
latency are local, and I think it shouldn't be assumed that any particular
node will want to support it. I recommend that if P2P message compression
is implemented, it should be negotiated via the services field at
connection time.
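
A minimal sketch of what that negotiation hook could look like; the name
NODE_COMPRESSED and its bit position are hypothetical, invented here purely
for illustration (only NODE_NETWORK is an existing service flag):

    // Hypothetical illustration of negotiating compression via the services field
    // exchanged in the version handshake. NODE_COMPRESSED and its bit position are
    // made up for this example; only NODE_NETWORK is a real flag.
    #include <cstdint>

    static const uint64_t NODE_NETWORK    = (1 << 0);  // existing flag
    static const uint64_t NODE_COMPRESSED = (1 << 5);  // hypothetical: "I accept compressed messages"

    bool PeerSupportsCompression(uint64_t nRemoteServices)
    {
        // Only send compressed payloads to peers that advertised the bit.
        return (nRemoteServices & NODE_COMPRESSED) != 0;
    }
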
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Bob McElrath via bitcoin-dev
2015-11-09 21:04:49 UTC
Permalink
I would expect that since a block contains mostly hashes and crypto signatures,
it would be almost totally incompressible. I just calculated compression ratios:

zlib -15% (file is LARGER)
gzip 28%
bzip2 25%

So zlib compression is right out. How much is ~25% bandwidth savings worth to
people? This seems not worth it to me. :-/
Post by Peter Tschipper via bitcoin-dev
This is my first time through this process so please bear with me.
I opened a PR #6973 this morning for Zlib Block Compression for block
with it. The idea is simple, to compress the datastream before
sending, initially for blocks only but it could theoretically be done
for transactions as well. Initial results show an average of 20% block
compression and taking 90 milliseconds for a full block (on a very slow
laptop) to compress. The savings will be mostly in terms of less
bandwidth used, but I would expect there to be a small performance gain
during the transmission of the blocks particularly where network latency
is higher.
I think the BIP title, if accepted should be the more generic, "Support
for Datastream Compression" rather than the PR title of "Zlib
Compression for block relay" since it could also be used for
transactions as well at a later time.
Thanks for your time...
--
Cheers, Bob McElrath

"For every complex problem, there is a solution that is simple, neat, and wrong."
-- H. L. Mencken
gladoscc via bitcoin-dev
2015-11-10 01:58:41 UTC
Permalink
I think 25% bandwidth savings is certainly considerable, especially for
people running full nodes in countries like Australia where internet
bandwidth is lower and there are data caps.

I absolutely would not dismiss 25% compression. gzip and bzip2 compression
is relatively standard, and I'd put the point where the savings justify the
implementation complexity somewhere around 5-10%.

On Tue, Nov 10, 2015 at 8:04 AM, Bob McElrath via bitcoin-dev <
Post by Bob McElrath via bitcoin-dev
I would expect that since a block contains mostly hashes and crypto signatures,
zlib -15% (file is LARGER)
gzip 28%
bzip2 25%
So zlib compression is right out. How much is ~25% bandwidth savings worth to
people? This seems not worth it to me. :-/
Post by Peter Tschipper via bitcoin-dev
This is my first time through this process so please bear with me.
I opened a PR #6973 this morning for Zlib Block Compression for block
with it. The idea is simple, to compress the datastream before
sending, initially for blocks only but it could theoretically be done
for transactions as well. Initial results show an average of 20% block
compression and taking 90 milliseconds for a full block (on a very slow
laptop) to compress. The savings will be mostly in terms of less
bandwidth used, but I would expect there to be a small performance gain
during the transmission of the blocks particularly where network latency
is higher.
I think the BIP title, if accepted should be the more generic, "Support
for Datastream Compression" rather than the PR title of "Zlib
Compression for block relay" since it could also be used for
transactions as well at a later time.
Thanks for your time...
--
Cheers, Bob McElrath
"For every complex problem, there is a solution that is simple, neat, and wrong."
-- H. L. Mencken
Johnathan Corgan via bitcoin-dev
2015-11-10 05:40:13 UTC
Permalink
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev <
Post by gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly considerable, especially for
people running full nodes in countries like Australia where internet
bandwidth is lower and there are data caps.
This reinforces the idea that such trade-off decisions should be local
and negotiated between peers, not a required feature of the network P2P.
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Tier Nolan via bitcoin-dev
2015-11-10 09:44:11 UTC
Permalink
The network protocol is not quite consensus critical, but it is important.

Two implementations of the decompressor might not be bug for bug
compatible. This (potentially) means that a block could be designed that
won't decode properly for some version of the client but would work for
another. This would fork the network.

A "raw" network library is unlikely to have the same problem.

Rather than just compress the stream, you could compress only block
messages. A new "cblock" message could be created that is a
compressed block. This shouldn't reduce efficiency by much.

If a client fails to decode a cblock, then it can ask for the block to be
re-sent as a standard "block" message.

This means that it is a pure performance improvement. If problems occur,
then the client can just switch back to uncompressed mode for that block.
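
A rough sketch of that flow; every type and helper below (the Peer/Block
stubs, DecompressToBlock, RequestBlockUncompressed, AcceptNewBlock) is a
placeholder declared only so the sketch is self-contained, not real Bitcoin
Core code:

    // Placeholder sketch of the fallback flow: if a compressed "cblock" payload
    // cannot be decoded, re-request the same block as a plain "block" message.
    #include <string>
    #include <vector>

    struct Peer {};
    struct Block { std::vector<unsigned char> data; };
    using BlockHash = std::string;

    // Stand-in helpers; bodies would live elsewhere in a real implementation.
    bool DecompressToBlock(const std::vector<unsigned char>& payload, Block& block);
    void RequestBlockUncompressed(Peer& peer, const BlockHash& hash);
    bool AcceptNewBlock(const Block& block);

    bool HandleCBlock(Peer& peer, const std::vector<unsigned char>& payload, const BlockHash& hash)
    {
        Block block;
        if (!DecompressToBlock(payload, block)) {
            // Decoding failed: fall back to the standard uncompressed path so a
            // bad or incompatible cblock can never wedge the node.
            RequestBlockUncompressed(peer, hash);
            return false;
        }
        return AcceptNewBlock(block);   // pure performance win when it works
    }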

You should look into the block relay system. This gives a larger
improvement than simply compressing the stream. The main benefit is
latency, but it also means that full blocks don't have to be sent, which
gives a potential 50% compression ratio: normally, a node has already
received all the transactions that are later included in the block.



On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via bitcoin-dev <
Post by Johnathan Corgan via bitcoin-dev
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev <
Post by gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly considerable, especially for
people running full nodes in countries like Australia where internet
bandwidth is lower and there are data caps.
​This reinforces the idea that such trade-off decisions should be be local
and negotiated between peers, not a required feature of the network P2P.​
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Peter Tschipper via bitcoin-dev
2015-11-10 16:17:40 UTC
Permalink
Post by Tier Nolan via bitcoin-dev
The network protocol is not quite consensus critical, but it is important.
Two implementations of the decompressor might not be bug for bug
compatible. This (potentially) means that a block could be designed
that won't decode properly for some version of the client but would
work for another. This would fork the network.
A "raw" network library is unlikely to have the same problem.
Rather than just compress the stream, you could compress only block
messages only. A new "cblock" message could be created that is a
compressed block. This shouldn't reduce efficiency by much.
I chose the more generic datastream compression so we could possibly
apply it to transactions as well in the future, but currently all that is
planned is to compress blocks; that was really my only original
intent until I saw that there might be some bandwidth savings for
transactions as well.
The compression however could be applied to any datastream but is not
*forced*. Basically it would just be a method call in CDataStream so
we could do ss.compress and ss.decompress and apply that to blocks and
possibly transactions if worthwhile, and only IF compression is turned
on. But there is no intent to apply this to every type of message
since most would be too small to benefit from compression.
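
To make the shape of that concrete, a rough illustration; CDataStream,
SER_NETWORK and PROTOCOL_VERSION are existing Bitcoin Core identifiers,
while compress() is the proposed new method and SerializeBlockForWire /
fCompressBlocks are names made up for the example:

    // Rough illustration of the proposed CDataStream hook. compress() does not
    // exist in stock Bitcoin Core; the wrapper function and flag are illustrative.
    #include "primitives/block.h"   // CBlock
    #include "serialize.h"          // SER_NETWORK
    #include "streams.h"            // CDataStream (where the proposal adds compress()/decompress())
    #include "version.h"            // PROTOCOL_VERSION

    CDataStream SerializeBlockForWire(const CBlock& block, bool fCompressBlocks)
    {
        CDataStream ss(SER_NETWORK, PROTOCOL_VERSION);
        ss << block;                // normal serialization, unchanged
        if (fCompressBlocks)
            ss.compress();          // opt-in: only when compression is turned on
        return ss;
    }
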
Here are some results of using the code in the PR to
compress/decompress blocks using zlib compression level = 6. This
data was taken from the first 275K blocks in the mainnet blockchain.
Clearly once we get past 10KB we get pretty decent compression but
even below that there is some benefit. I'm still collecting data and
will get the same for the whole blockchain.
range = block size range
ubytes = average size of uncompressed blocks
cbytes = average size of compressed blocks
ctime = average time to compress
dtime = average time to decompress
cmp_ratio% = compression ratio
datapoints = number of datapoints taken
range ubytes cbytes ctime dtime cmp_ratio% datapoints
0-250b 215 189 0.001 0.000 12.41 79498
250-500b 440 405 0.001 0.000 7.82 11903
500-1KB 762 702 0.001 0.000 7.83 10448
1KB-10KB 4166 3561 0.001 0.000 14.51 50572
10KB-100KB 40820 31597 0.005 0.001 22.59 75555
100KB-200KB 146238 106320 0.015 0.001 27.30 25024
200KB-300KB 242913 175482 0.025 0.002 27.76 20450
300KB-400KB 343430 251760 0.034 0.003 26.69 2069
400KB-500KB 457448 343495 0.045 0.004 24.91 1889
500KB-600KB 540736 424255 0.056 0.007 21.54 90
600KB-700KB 647851 506888 0.063 0.007 21.76 59
700KB-800KB 749513 586551 0.073 0.007 21.74 48
800KB-900KB 859439 652166 0.086 0.008 24.12 39
900KB-1MB 952333 725191 0.089 0.009 23.85 78
Post by Tier Nolan via bitcoin-dev
If a client fails to decode a cblock, then it can ask for the block
to be re-sent as a standard "block" message.
interesting idea.
Post by Tier Nolan via bitcoin-dev
This means that it is a pure performance improvement. If problems
occur, then the client can just switch back to uncompressed mode for
that block.
You should look into the block relay system. This gives a larger
improvement than simply compressing the stream. The main benefit is
latency but it means that actual blocks don't have to be sent, so
gives a potential 50% compression ratio. Normally, a node receives
all the transactions and then those transactions are included later
in the block.
There are better ways of sending new blocks, that's certainly true, but
for sending historical blocks and sending transactions I don't think
so. This PR is really designed to save bandwidth and not intended to
be a huge performance improvement in terms of time spent sending.
Post by Tier Nolan via bitcoin-dev
On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via bitcoin-dev
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly considerable,
especially for people running full nodes in countries like
Australia where internet bandwidth is lower and there are
data caps.
​This reinforces the idea that such trade-off decisions should be
be local and negotiated between peers, not a required feature of
the network P2P.​
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Jonathan Toomim via bitcoin-dev
2015-11-10 16:21:56 UTC
Permalink
Quick observation: block transmission would be compress-once, send-multiple-times, which makes the tradeoff a little better.
Tier Nolan via bitcoin-dev
2015-11-10 16:30:57 UTC
Permalink
There are better ways of sending new blocks, that's certainly true but for
sending historical blocks and seding transactions I don't think so. This
PR is really designed to save bandwidth and not intended to be a huge
performance improvement in terms of time spent sending.
If the main point is for historical data, then sticking to just blocks is
the best plan.

Since small blocks don't compress well, you could define a "cblocks"
message that handles multiple blocks (just concatenate the block messages
as payload before compression).

The sending peer could combine blocks so that each cblock is compressing at
least 10kB of block data (or whatever is optimal). It is probably worth
specifying a maximum size for network buffer reasons (either 1MB or 1 block
maximum).
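
A sketch of that bundling rule; the 10kB and 1MB thresholds are the figures
suggested above, while BuildCBlocksPayload and CompressBytes are
illustrative stand-ins rather than real APIs:

    // Illustrative only: accumulate serialized blocks until at least ~10 kB is
    // queued (without exceeding a 1 MB cap), then compress the batch as a single
    // "cblocks" payload.
    #include <cstddef>
    #include <vector>

    static const size_t MIN_BUNDLE_SIZE = 10 * 1024;    // compress at least this much
    static const size_t MAX_BUNDLE_SIZE = 1000 * 1024;  // keep network buffers bounded

    // One-shot compressor, e.g. the zlib helper sketched earlier in the thread.
    std::vector<unsigned char> CompressBytes(const std::vector<unsigned char>& in, int level);

    std::vector<unsigned char> BuildCBlocksPayload(
        const std::vector<std::vector<unsigned char>>& rawBlocks, size_t& nBlocksBundled)
    {
        std::vector<unsigned char> batch;
        nBlocksBundled = 0;
        for (const auto& raw : rawBlocks) {
            if (!batch.empty() && batch.size() + raw.size() > MAX_BUNDLE_SIZE)
                break;                                  // next block would blow the cap
            batch.insert(batch.end(), raw.begin(), raw.end());
            ++nBlocksBundled;
            if (batch.size() >= MIN_BUNDLE_SIZE)
                break;                                  // enough data to compress well
        }
        return CompressBytes(batch, 6);                 // one compression pass per bundle
    }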

Similarly, transactions could be combined and compressed into "ctxs"
messages. The inv messages could be modified so that you can request groups
of 10-20 transactions. That would depend on how much of an improvement
compressed transactions would represent.

More generally, you could define a message which is a compressed message
holder. That is probably too complex to be worth the effort though.
On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via bitcoin-dev <
Post by Johnathan Corgan via bitcoin-dev
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev <
Post by gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly considerable, especially for
people running full nodes in countries like Australia where internet
bandwidth is lower and there are data caps.
​This reinforces the idea that such trade-off decisions should be be
local and negotiated between peers, not a required feature of the network
P2P.​
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Jeff Garzik via bitcoin-dev
2015-11-10 16:46:15 UTC
Permalink
Comments:

1) cblock seems a reasonable way to extend the protocol. Further wrapping
should probably be done at the stream level.

2) zlib has a crappy security track record.

3) A fallback path to non-compressed is required, should compression fail
or crash.

4) Most blocks and transactions have runs of zeroes and/or highly common
bit-patterns, which contributes to useful compression even at smaller
sizes. Peter Ts's most recent numbers bear this out. zlib has a
dictionary (32K?) which works well with repeated patterns such as those you
see with concatenated runs of transactions.

5) LZO should provide much better compression, at a cost of CPU performance
and using a less-reviewed, less-field-tested library.





On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev <
On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper <
There are better ways of sending new blocks, that's certainly true but
for sending historical blocks and seding transactions I don't think so.
This PR is really designed to save bandwidth and not intended to be a huge
performance improvement in terms of time spent sending.
If the main point is for historical data, then sticking to just blocks is
the best plan.
Since small blocks don't compress well, you could define a "cblocks"
message that handles multiple blocks (just concatenate the block messages
as payload before compression).
The sending peer could combine blocks so that each cblock is compressing
at least 10kB of block data (or whatever is optimal). It is probably worth
specifying a maximum size for network buffer reasons (either 1MB or 1 block
maximum).
Similarly, transactions could be combined together and compressed "ctxs".
The inv messages could be modified so that you can request groups of 10-20
transactions. That would depend on how much of an improvement compressed
transactions would represent.
More generally, you could define a message which is a compressed message
holder. That is probably to complex to be worth the effort though.
On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via bitcoin-dev <
Post by Johnathan Corgan via bitcoin-dev
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev <
Post by gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly considerable, especially for
people running full nodes in countries like Australia where internet
bandwidth is lower and there are data caps.
​This reinforces the idea that such trade-off decisions should be be
local and negotiated between peers, not a required feature of the network
P2P.​
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Peter Tschipper via bitcoin-dev
2015-11-10 17:09:06 UTC
Permalink
Post by Jeff Garzik via bitcoin-dev
1) cblock seems a reasonable way to extend the protocol. Further
wrapping should probably be done at the stream level.
agreed.
Post by Jeff Garzik via bitcoin-dev
2) zlib has crappy security track record.
Zlib had a bad buffer overflow bug but that was in 2005 and it got a lot
of press at the time. It was fixed in version 1.2.3...we're on 1.2.8
now. I'm not aware of any other current issues with zlib. Do you have a
citation?
Post by Jeff Garzik via bitcoin-dev
3) A fallback path to non-compressed is required, should compression
fail or crash.
agreed.
Post by Jeff Garzik via bitcoin-dev
4) Most blocks and transactions have runs of zeroes and/or highly
common bit-patterns, which contributes to useful compression even at
smaller sizes. Peter Ts's most recent numbers bear this out. zlib
has a dictionary (32K?) which works well with repeated patterns such
as those you see with concatenated runs of transactions.
5) LZO should provide much better compression, at a cost of CPU
performance and using a less-reviewed, less-field-tested library.
I don't think LZO will give as good compression here but I will do some
benchmarking when I can.
Post by Jeff Garzik via bitcoin-dev
On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev
On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
There are better ways of sending new blocks, that's certainly
true but for sending historical blocks and seding transactions
I don't think so. This PR is really designed to save
bandwidth and not intended to be a huge performance
improvement in terms of time spent sending.
If the main point is for historical data, then sticking to just
blocks is the best plan.
Since small blocks don't compress well, you could define a
"cblocks" message that handles multiple blocks (just concatenate
the block messages as payload before compression).
The sending peer could combine blocks so that each cblock is
compressing at least 10kB of block data (or whatever is optimal).
It is probably worth specifying a maximum size for network buffer
reasons (either 1MB or 1 block maximum).
Similarly, transactions could be combined together and compressed
"ctxs". The inv messages could be modified so that you can
request groups of 10-20 transactions. That would depend on how
much of an improvement compressed transactions would represent.
More generally, you could define a message which is a compressed
message holder. That is probably to complex to be worth the
effort though.
Post by Tier Nolan via bitcoin-dev
On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly
considerable, especially for people running full
nodes in countries like Australia where internet
bandwidth is lower and there are data caps.
​ This reinforces the idea that such trade-off decisions
should be be local and negotiated between peers, not a
required feature of the network P2P.​
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Peter Tschipper via bitcoin-dev
2015-11-18 14:00:35 UTC
Permalink
Hi all,

I'm still doing a little more investigation before opening up a formal
BIP PR, but getting close. Here are some more findings.

After moving the compression from main.cpp to streams.h (CDataStream) it
was a simple matter to add compression to transactions as well. Results
as follows:

range = transaction size range
ubytes = average size of uncompressed transactions
cbytes = average size of compressed transactions
cmp_ratio% = compression ratio
datapoints = number of datapoints taken

range ubytes cbytes cmp_ratio% datapoints
0-250b 220 227 -3.16 23780
250-500b 356 354 0.68 20882
500-600 534 505 5.29 2772
600-700 653 608 6.95 1853
700-800 757 649 14.22 578
800-900 822 758 7.77 661
900-1KB 954 862 9.69 906
1KB-10KB 2698 2222 17.64 3370
10KB-100KB 15463 12092 21.8 15429


A couple of obvious observations. Transactions don't compress well
below 500 bytes but do very well beyond 1KB, where there are a great deal
of those large spam-type transactions. However, most transactions
happen to be in the < 500 byte range. So the next step was to apply
bundling, or the creation of a "blob", for those smaller transactions, if
and only if there are multiple tx's in the getdata receive queue for a
peer. Doing that yields some very good compression ratios. Some
examples as follows:

The best one I've seen so far was the following, where 175 transactions
were bundled into one blob before being compressed. That yielded a 20%
compression ratio, but that doesn't take into account the savings from
the unneeded 174 message headers (24 bytes each) as well as 174 TCP
ACKs of 52 bytes each, which yields an additional 76*174=13224 bytes,
making the overall bandwidth savings 32% in this particular case.

*2015-11-18 01:09:09.002061 compressed blob from 79890 to 67426 txcount:175*

To be sure, this was an extreme example. Most transaction blobs were in
the 2 to 10 transaction range. Such as the following:

*2015-11-17 21:08:28.469313 compressed blob from 3199 to 2876 txcount:10*

But even here the savings are 10%, far better than the "nothing" we
would get without bundling; add to that the 76 byte * 9 transaction
savings and we have a total 20% savings in bandwidth for transactions
that otherwise would not be compressible.

The same bundling was applied to blocks and very good compression ratios
are seen when sync'ing the blockchain.

Overall the bundling or blobbing of tx's and blocks seems to be a good
idea for improving bandwidth use, but there is also a scalability factor
here: when the system is busy, transactions are bundled more often,
compressed, and sent faster, keeping message queue and network chatter to a
minimum.

I think I have enough information to put together a formal BIP with the
exception of which compression library to implement. These tests were
done using ZLib but I'll also be running tests in the coming days with
LZO (Jeff Garzik's suggestion) and perhaps Snappy. If there are any
other libraries that people would like me to get results for please let
me know and I'll pick maybe the top 2 or 3 and get results back to the
group.
Some further block compression test results that compare performance
when network latency is added to the mix.
Setup: two nodes, Windows 7, compressionlevel=6, syncing the first
200000 blocks from one node to another, on a highspeed
wireless LAN with no connections to the outside world.
Network latency was added by using Netbalancer to induce the 30ms and
60ms latencies.
From the data, not only are bandwidth savings seen but also a small
performance gain. However, the overall value in
compressing blocks appears to be in terms of saving bandwidth.
I was also surprised to see that there was no real difference in
performance when no latency was present; apparently the time it takes
to compress is about equal to the transmission time saved in such a situation.
The following results compare the tests in terms of how long it takes
to sync the blockchain, compressed vs uncompressed and with varying
latencies.
uncmp = uncompressed
cmp = compressed
num blocks sync'd   uncmp (secs)   cmp (secs)   uncmp 30ms (secs)   cmp 30ms (secs)   uncmp 60ms (secs)   cmp 60ms (secs)
10000 264 269 265 257 274 275
20000 482 492 479 467 499 497
30000 703 717 693 676 724 724
40000 918 939 902 886 947 944
50000 1140 1157 1114 1094 1171 1167
60000 1362 1380 1329 1310 1400 1395
70000 1583 1597 1547 1526 1637 1627
80000 1810 1817 1767 1745 1872 1862
90000 2031 2036 1985 1958 2109 2098
100000 2257 2260 2223 2184 2385 2355
110000 2553 2486 2478 2422 2755 2696
120000 2800 2724 2849 2771 3345 3254
130000 3078 2994 3356 3257 4125 4006
140000 3442 3365 3979 3870 5032 4904
150000 3803 3729 4586 4464 5928 5797
160000 4148 4075 5168 5034 6801 6661
170000 4509 4479 5768 5619 7711 7557
180000 4947 4924 6389 6227 8653 8479
190000 5858 5855 7302 7107 9768 9566
200000 6980 6969 8469 8220 10944 10724
Peter Tschipper via bitcoin-dev
2015-11-28 14:48:41 UTC
Permalink
Hi All,

Here are some final results of testing with the reference implementation
for compressing blocks and transactions. This implementation also
concatenates blocks and transactions when possible so you'll see data
sizes in the 1-2MB ranges.

Results below show the time it takes to sync the first part of the
blockchain, comparing Zlib to the LZOx library. (LZOf was also tried
but wasn't found to be as good as LZOx). The following shows tests run
with and without latency. With latency on the network, all compression
libraries performed much better than without compression.

I don't think it's entirely obvious which is better, Zlib or LZO.
Although I prefer the higher compression of Zlib, overall I would have
to give the edge to LZO. With LZO we have the fastest, most scalable
option at the lowest compression setting, which will be a boost in
performance for users that want performance over compression; at
the high end LZO provides decent compression which approaches Zlib
(although at a higher cost), good for those that want to save more
bandwidth.
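
For reference, the two LZO variants compared here are the library's fast
and high-compression entry points; a minimal usage sketch, with LzoCompress
being my own wrapper name and the output-buffer sizing following LZO's
documented worst-case formula:

    // Minimal sketch of the two LZO variants discussed above: lzo1x_1_compress
    // (fast, "LZOx-1") and lzo1x_999_compress (high compression, "LZOx-999").
    #include <lzo/lzo1x.h>

    #include <stdexcept>
    #include <vector>

    std::vector<unsigned char> LzoCompress(const std::vector<unsigned char>& in,
                                           bool fHighCompression)
    {
        if (lzo_init() != LZO_E_OK)
            throw std::runtime_error("lzo_init failed");

        // Worst-case output size per the LZO documentation.
        std::vector<unsigned char> out(in.size() + in.size() / 16 + 64 + 3);
        lzo_uint outLen = out.size();

        // Each variant requires its own scratch-memory size.
        std::vector<unsigned char> wrkmem(fHighCompression ? LZO1X_999_MEM_COMPRESS
                                                           : LZO1X_1_MEM_COMPRESS);

        // const_cast only to stay compatible with LZO headers whose pointer
        // typedefs are not const-qualified; the input is never modified.
        unsigned char* src = const_cast<unsigned char*>(in.data());
        int rc = fHighCompression
            ? lzo1x_999_compress(src, in.size(), out.data(), &outLen, wrkmem.data())
            : lzo1x_1_compress(src, in.size(), out.data(), &outLen, wrkmem.data());
        if (rc != LZO_E_OK)
            throw std::runtime_error("LZO compression failed");

        out.resize(outLen);
        return out;
    }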

Uncompressed (60ms)   Zlib-1 (60ms)   Zlib-6 (60ms)   LZOx-1 (60ms)   LZOx-999 (60ms)
219 299 296 294 291
432 568 565 558 548
652 835 836 819 811
866 1106 1107 1081 1071
1082 1372 1381 1341 1333
1309 1644 1654 1605 1600
1535 1917 1936 1873 1875
1762 2191 2210 2141 2141
1992 2463 2486 2411 2411
2257 2748 2780 2694 2697
2627 3034 3076 2970 2983
3226 3416 3397 3266 3302
4010 3983 3773 3625 3703
4914 4503 4292 4127 4287
5806 4928 4719 4529 4821
6674 5249 5164 4840 5314
7563 5603 5669 5289 6002
8477 6054 6268 5858 6638
9843 7085 7278 6868 7679
11338 8215 8433 8044 8795



These results are from testing on a highspeed wireless LAN (very small latency).

Results in seconds:




Num blocks sync'd Uncompressed Zlib-1 Zlib-6 LZOx-1 LZOx-999
10000 255 232 233 231 257
20000 464 414 420 407 453
30000 677 594 611 585 650
40000 887 782 795 760 849
50000 1099 961 977 933 1048
60000 1310 1145 1167 1110 1259
70000 1512 1330 1362 1291 1470
80000 1714 1519 1552 1469 1679
90000 1917 1707 1747 1650 1882
100000 2122 1905 1950 1843 2111
110000 2333 2107 2151 2038 2329
120000 2560 2333 2376 2256 2580
130000 2835 2656 2679 2558 2921
140000 3274 3259 3161 3051 3466
150000 3662 3793 3547 3440 3919
160000 4040 4172 3937 3767 4416
170000 4425 4625 4379 4215 4958
180000 4860 5149 4895 4781 5560
190000 5855 6160 5898 5805 6557
200000 7004 7234 7051 6983 7770



The following shows the compression ratio achieved for various sizes of
data. Zlib is the clear winner for compressibility, with LZOx-999 coming
close but at a cost.

range   Zlib-1 cmp%   Zlib-6 cmp%   LZOx-1 cmp%   LZOx-999 cmp%
0-250b 12.44 12.86 10.79 14.34
250-500b 19.33 12.97 10.34 11.11
600-700 16.72 n/a 12.91 17.25
700-800 6.37 7.65 4.83 8.07
900-1KB 6.54 6.95 5.64 7.9
1KB-10KB 25.08 25.65 21.21 22.65
10KB-100KB 19.77 21.57 14.37 19.02
100KB-200KB 21.49 23.56 15.37 21.55
200KB-300KB 23.66 24.18 16.91 22.76
300KB-400KB 23.4 23.7 16.5 21.38
400KB-500KB 24.6 24.85 17.56 22.43
500KB-600KB 25.51 26.55 18.51 23.4
600KB-700KB 27.25 28.41 19.91 25.46
700KB-800KB 27.58 29.18 20.26 27.17
800KB-900KB 27 29.11 20 27.4
900KB-1MB 28.19 29.38 21.15 26.43
1MB -2MB 27.41 29.46 21.33 27.73


The following shows the time in seconds to compress data of various
sizes. LZO1x is the fastest, and as file sizes increase, LZO1x time hardly
increases at all. It's interesting to note that as compression ratios
increase, LZOx-999 performs much worse than Zlib. So LZO is faster on the
low end and slower (5 to 6 times slower) on the high end.

range   Zlib-1 (secs)   Zlib-6 (secs)   LZOx-1 (secs)   LZOx-999 (secs)
0-250b 0.001 0 0 0
250-500b 0 0 0 0.001
500-1KB 0 0 0 0.001
1KB-10KB 0.001 0.001 0 0.002
10KB-100KB 0.004 0.006 0.001 0.017
100KB-200KB 0.012 0.017 0.002 0.054
200KB-300KB 0.018 0.024 0.003 0.087
300KB-400KB 0.022 0.03 0.003 0.121
400KB-500KB 0.027 0.037 0.004 0.151
500KB-600KB 0.031 0.044 0.004 0.184
600KB-700KB 0.035 0.051 0.006 0.211
700KB-800KB 0.039 0.057 0.006 0.243
800KB-900KB 0.045 0.064 0.006 0.27
900KB-1MB 0.049 0.072 0.006 0.307
Post by Jeff Garzik via bitcoin-dev
1) cblock seems a reasonable way to extend the protocol. Further
wrapping should probably be done at the stream level.
2) zlib has crappy security track record.
3) A fallback path to non-compressed is required, should compression
fail or crash.
4) Most blocks and transactions have runs of zeroes and/or highly
common bit-patterns, which contributes to useful compression even at
smaller sizes. Peter Ts's most recent numbers bear this out. zlib
has a dictionary (32K?) which works well with repeated patterns such
as those you see with concatenated runs of transactions.
5) LZO should provide much better compression, at a cost of CPU
performance and using a less-reviewed, less-field-tested library.
On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev
On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
There are better ways of sending new blocks, that's certainly
true but for sending historical blocks and seding transactions
I don't think so. This PR is really designed to save
bandwidth and not intended to be a huge performance
improvement in terms of time spent sending.
If the main point is for historical data, then sticking to just
blocks is the best plan.
Since small blocks don't compress well, you could define a
"cblocks" message that handles multiple blocks (just concatenate
the block messages as payload before compression).
The sending peer could combine blocks so that each cblock is
compressing at least 10kB of block data (or whatever is optimal).
It is probably worth specifying a maximum size for network buffer
reasons (either 1MB or 1 block maximum).
Similarly, transactions could be combined together and compressed
"ctxs". The inv messages could be modified so that you can
request groups of 10-20 transactions. That would depend on how
much of an improvement compressed transactions would represent.
More generally, you could define a message which is a compressed
message holder. That is probably to complex to be worth the
effort though.
Post by Tier Nolan via bitcoin-dev
On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly
considerable, especially for people running full
nodes in countries like Australia where internet
bandwidth is lower and there are data caps.
​ This reinforces the idea that such trade-off decisions
should be be local and negotiated between peers, not a
required feature of the network P2P.​
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com
Jonathan Toomim via bitcoin-dev
2015-11-29 00:30:20 UTC
Permalink
It appears you're using the term "compression ratio" to mean "size reduction". A compression ratio is the ratio (compressed / uncompressed). A 1 kB file compressed with a 10% compression ratio would be 0.1 kB. It seems you're using (1 - compressed/uncompressed), meaning that the compressed file would be 0.9 kB.
The following show the compression ratio acheived for various sizes of data. Zlib is the clear
winner for compressibility, with LZOx-999 coming close but at a cost.
range Zlib-1 cmp%
Zlib-6 cmp% LZOx-1 cmp% LZOx-999 cmp%
0-250b 12.44 12.86 10.79 14.34
250-500b 19.33 12.97 10.34 11.11
Peter Tschipper via bitcoin-dev
2015-11-29 05:15:32 UTC
Permalink
yes, you're right, it's just the percentage compressed (size reduction)
Post by Jonathan Toomim via bitcoin-dev
It appears you're using the term "compression ratio" to mean "size
reduction". A compression ratio is the ratio (compressed /
uncompressed). A 1 kB file compressed with a 10% compression ratio
would be 0.1 kB. It seems you're using (1 - compressed/uncompressed),
meaning that the compressed file would be 0.9 kB.
On Nov 28, 2015, at 6:48 AM, Peter Tschipper via bitcoin-dev
Post by Peter Tschipper via bitcoin-dev
The following show the compression ratio acheived for various sizes
of data. Zlib is the clear
winner for compressibility, with LZOx-999 coming close but at a cost.
range Zlib-1 cmp%
Zlib-6 cmp% LZOx-1 cmp% LZOx-999 cmp%
0-250b 12.44 12.86 10.79 14.34
250-500b 19.33 12.97 10.34 11.11
Peter Tschipper via bitcoin-dev
2015-11-10 16:46:54 UTC
Permalink
On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
There are better ways of sending new blocks, that's certainly
true but for sending historical blocks and seding transactions I
don't think so. This PR is really designed to save bandwidth and
not intended to be a huge performance improvement in terms of
time spent sending.
If the main point is for historical data, then sticking to just
blocks is the best plan.
at the beginning yes.
Since small blocks don't compress well, you could define a "cblocks"
message that handles multiple blocks (just concatenate the block
messages as payload before compression).
Small blocks are rare these days (though there are plenty of historical
blocks), but still they get a 10% compression, which is not bad and I think
worthwhile, and the time it takes to compress small blocks is less than a
millisecond, so no loss there in time. But still, you have a good point and
something worthy of doing after getting compression to work. I think
it's wise to keep it simple at first and build on the success later.
The sending peer could combine blocks so that each cblock is
compressing at least 10kB of block data (or whatever is optimal). It
is probably worth specifying a maximum size for network buffer
reasons (either 1MB or 1 block maximum).
Good idea. Same answer as above.
Similarly, transactions could be combined together and compressed
"ctxs". The inv messages could be modified so that you can request
groups of 10-20 transactions. That would depend on how much of an
improvement compressed transactions would represent.
Good idea. Same answer as above.
More generally, you could define a message which is a compressed
message holder. That is probably to complex to be worth the effort
though.
That's actually pretty easy to do and part of the plan. Sending a
cmp_block rather than a block makes it all easier to implement. It's
just a matter of doing pnode->pushmessage("cmp_block",
compressed_block); and handling the "cmp_block" command string at the
other end.
Post by Tier Nolan via bitcoin-dev
On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly considerable,
especially for people running full nodes in countries
like Australia where internet bandwidth is lower and
there are data caps.
​ This reinforces the idea that such trade-off decisions
should be be local and negotiated between peers, not a
required feature of the network P2P.​
--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com