[bitcoin-dev] "Compressed" headers stream

Discussion:

Jim Posen via bitcoin-dev

2017-12-11 20:40:00 UTC

I want to resurrect this thread from August/September because it seems like
a significant improvement for light clients at very little cost. From the
mailing list, it seems like this got stalled in determining how many more
bytes could be save in addition to the prev_block.

The ideas I've gathered from Greg Maxwell's forwarded email are:

1. Omit nBits altogether and have the receiving node determine it from
chain context.
2. Include nBits only on headers with a height that is a multiple of 2016
since it does not change in between.
3. Compress nTime to two bytes by using the bounds on allowed values from
the consensus rules.

I propose just moving ahead with only the exclusion of the prev_block, as
IMO the other savings are not worth the added complexity.

Firstly, I don't like the idea of making the net header encoding dependent
on the specific header validation rules that Bitcoin uses (eg. the fact
that difficulty is only recalculated every 2016 blocks). This would be
coupling together the two layers, breaking net compatibility for some alts,
and possibly making consensus rule changes even more difficult for a
savings with insufficient benefit. So if you buy that argument, I'm not in
favor of #2 or #3.

Option 1 is still viable, though it has some downsides. The implementation
leaks into the validation code, whereas calculating prev_block can occur
just at the net layer (see implementation below). Also, nodes would now be
*required* to sync the header chain from the genesis block, whereas they
had the option of starting from some checkpoint before.

So switching gears, I'd like to ask what the best way to actually implement
this change is. Solutions I can think of are:

1. New headers command name like "cmpctheaders" or "headersv2".
2. Change serialization of existing headers message in a new protocol
version.
3. Change serialization of existing headers message with new service bit.

I wrote up some proof-of-concept implementations in Core a) just omitting
prev_block
<https://github.com/bitcoin/bitcoin/compare/master...jimpo:compact-headers>
and b) omitting nBits as well
<https://github.com/bitcoin/bitcoin/compare/master...jimpo:compact-headers-difficulty>.
If people think a) is reasonable, I'll write up a BIP.

Hi everyone, the Bitcoin headers are probably the most condensed and
important piece of data in the world, their demand is expected to grow.
When sending a stream of continuous block headers, a common case in IBD and
in disconnected clients, I think there is a possible optimization of the
transmitted data: The headers after the first could avoid transmitting the
previous hash cause the receiver could compute it by double hashing the
previous header (an operation he needs to do anyway to verify PoW). In a
long stream, for example 2016 headers, the savings in bandwidth are about
32/80 ~= 40% without compressed headers 2016*80=161280 bytes with
compressed headers 80+2015*48=96800 bytes What do you think? In
OpenTimestamps calendars we are going to use this compression to give
lite-client a reasonable secure proofs (a full node give higher security
but isn't feasible in all situations, for example for in-browser
verification) To speed up sync of a new client Electrum starts with the
download of a file <https://headers.electrum.org/blockchain_headers>
~36MB containing the first 477637 headers. For this kind of clients could
be useful a common http API with fixed position chunks to leverage http
caching. For example /headers/2016/0 returns the headers from the genesis
to the 2015 header included while /headers/2016/1 gives the headers from
the 2016th to the 4031. Other endpoints could have chunks of 20160 blocks
or 201600 such that with about 10 http requests a client could fast sync
the headers

Gregory Maxwell via bitcoin-dev

2017-12-11 21:04:01 UTC

Permalink

On Mon, Dec 11, 2017 at 8:40 PM, Jim Posen via bitcoin-dev

Post by Jim Posen via bitcoin-dev
Firstly, I don't like the idea of making the net header encoding dependent
on the specific header validation rules that Bitcoin uses (eg. the fact that
difficulty is only recalculated every 2016 blocks). This would be coupling

In the last proposal I recall writing up, there was a one byte flag on
each header to indicate what was included.

Nbits _never_ needs to be sent even with other consensus rules because
its more or less necessarily a strict function of the prior headers.
This still holds in every clone of Bitcoin I'm aware of; sending it
with the first header in a group probably makes sense so it can be
checked independently.

Post by Jim Posen via bitcoin-dev
with insufficient benefit.

another >18% reduction in size beyond the removal of prev. is not
insubstantial by any means. I don't think it should lightly be
ignored.

Prev omission itself is not, sadly, magically compatible: I am quite
confident that if there is a bitcoin hardfork it would recover the
nbits/4-guarenteed always-zero bits of prev to use as extra nonce for
miners. This has been proposed many times, implemented at least once,
and the current requirement for mining infrastructure to reach inside
the coinbase txn to increment a nonce has been a reliable source of
failures. So I think we'd want to have the encoding able to encode
leading prev bits.

Many altcoins also change the header structures. If the better thing
is altcoin incompatible, we should still do it. Doing otherwise would
competitively hobble Bitcoin especially considering the frequent
recklessly incompetent moves made by various altcoins and the near
total lack of useful novel development we've seen come out of the
clones.

Probably the most important change in a new header message wouldn't be
the encoding, but it would be changing the fetching mechanism so that
header sets could be pulled in parallel, etc.

I would rather not change the serialization of existing messages,
nodes are going to have to support speaking both messages for a long
time, and I think we already want a different protocol flow for
headers fetching in any case.

Jim Posen via bitcoin-dev

2017-12-11 21:56:08 UTC

Permalink

Post by Gregory Maxwell via bitcoin-dev
On Mon, Dec 11, 2017 at 8:40 PM, Jim Posen via bitcoin-dev

Post by Jim Posen via bitcoin-dev
Firstly, I don't like the idea of making the net header encoding

dependent

Post by Jim Posen via bitcoin-dev
on the specific header validation rules that Bitcoin uses (eg. the fact

that

Post by Jim Posen via bitcoin-dev
difficulty is only recalculated every 2016 blocks). This would be

coupling
In the last proposal I recall writing up, there was a one byte flag on
each header to indicate what was included.

Is there a link somewhere to that proposal? The only thing I could find was
your forwarded email
<https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-September/014904.html>
on
this thread.

Post by Gregory Maxwell via bitcoin-dev
Nbits _never_ needs to be sent even with other consensus rules because
its more or less necessarily a strict function of the prior headers.
This still holds in every clone of Bitcoin I'm aware of; sending it
with the first header in a group probably makes sense so it can be
checked independently.

Post by Jim Posen via bitcoin-dev
with insufficient benefit.

another >18% reduction in size beyond the removal of prev. is not
insubstantial by any means. I don't think it should lightly be
ignored.

Omitting nBits entirely seems reasonable, I wrote up a possible
implementation here
<https://github.com/bitcoin/bitcoin/compare/master...jimpo:compact-headers-difficulty>.
The downside is that it is more complex because it leaks into the
validation code. The extra 4 byte savings is certainly nice though.

Post by Gregory Maxwell via bitcoin-dev
Prev omission itself is not, sadly, magically compatible: I am quite
confident that if there is a bitcoin hardfork it would recover the
nbits/4-guarenteed always-zero bits of prev to use as extra nonce for
miners. This has been proposed many times, implemented at least once,
and the current requirement for mining infrastructure to reach inside
the coinbase txn to increment a nonce has been a reliable source of
failures. So I think we'd want to have the encoding able to encode
leading prev bits.
Many altcoins also change the header structures. If the better thing
is altcoin incompatible, we should still do it. Doing otherwise would
competitively hobble Bitcoin especially considering the frequent
recklessly incompetent moves made by various altcoins and the near
total lack of useful novel development we've seen come out of the
clones.
Probably the most important change in a new header message wouldn't be
the encoding, but it would be changing the fetching mechanism so that
header sets could be pulled in parallel, etc.
I would rather not change the serialization of existing messages,
nodes are going to have to support speaking both messages for a long
time, and I think we already want a different protocol flow for
headers fetching in any case.

Can you elaborate on how parallel header fetching might work? getheaders
requests could probably already be pipelined, where the node requests the
next 2,000 headers before processing the current batch (though would make
sense to check that they are all above min difficulty first).

I'm open to more ideas on how to optimize the header download or design the
serialization format to be more flexible, but I'm concerned that we forgo a
40-45% bandwidth savings on the current protocol for a long time because
something better might be possible later on or there might be a hard fork
that at some point requires another upgrade. I do recognize that supporting
multiple serialization formats simultaneously adds code complexity, but in
this case the change seems simple enough to me that the tradeoff is worth
it.

Tier Nolan via bitcoin-dev

2017-12-11 22:41:50 UTC

Permalink

On Mon, Dec 11, 2017 at 9:56 PM, Jim Posen via bitcoin-dev <

Post by Jim Posen via bitcoin-dev
Omitting nBits entirely seems reasonable, I wrote up a possible
implementation here
<https://github.com/bitcoin/bitcoin/compare/master...jimpo:compact-headers-difficulty>.
The downside is that it is more complex because it leaks into the
validation code. The extra 4 byte savings is certainly nice though.

A compromise would be to have 1 byte indicating the difference since the
last header.

Since the exponent doesn't use the full range you could steal bits from
there to indicate mode.

- no change
- mantissa offset (for small changes)
- full difficulty

This would support any nBits rule and you say 3 of the 4 bytes.

Post by Jim Posen via bitcoin-dev
Can you elaborate on how parallel header fetching might work? getheaders
requests could probably already be pipelined, where the node requests the
next 2,000 headers before processing the current batch (though would make
sense to check that they are all above min difficulty first).

I suggest adding a message where you can ask for the lowest N hashes
between 2 heights on the main chain.

The reply is an array of {height, header} pairs for the N headers with the
lowest hash in the specified range.

All peers should agree on which headers are in the array. If there is
disagreement, then you can at least narrow down on which segment there is
disagreement.

It works kind of like a cut and choose. You pick one segment of the ones
he gave you recursively.

You can ask a peer for proof for a segment between 2 headers of the form.

- first header + coinbase with merkle branch
- all headers in the segment

This proves the segment has the correct height and that all the headers
link up.

There is a method called "high hash highway" that allows compact proofs of
total POW.

Gregory Maxwell via bitcoin-dev

2017-12-11 23:11:24 UTC

Permalink

On Mon, Dec 11, 2017 at 10:41 PM, Tier Nolan via bitcoin-dev

Post by Tier Nolan via bitcoin-dev
There is a method called "high hash highway" that allows compact proofs of
total POW.

That provides no security without additional consensus enforced
commitments, so I think pretty off-topic for this discussion.

Suhas Daftuar via bitcoin-dev

2017-12-12 21:07:11 UTC

Permalink

Hi,

First, thanks for resurrecting this, I agree this is worth pursuing.

On Mon, Dec 11, 2017 at 4:04 PM, Gregory Maxwell via bitcoin-dev <

I think it would be nice, though, to not require the consensus-correct
calculation of nBits in order to process p2p messages. For instance, I
think there's a use for nBits at the p2p layer for calculating the work on
a chain, which can be used as an anti-DoS measure, even without verifying
that the difficulty adjustments are following the consensus rules.

Moreover I think it's a bit messy if the p2p layer depends on intricate
consensus rules in order to reconstruct a message -- either we'd need to
interact with existing consensus logic in a potentially new way, or we'd
reimplement the same logic in the p2p layer, neither of which is very
desirable imo.

But I think we should be able to get nearly all the benefit just by
including nBits in any messages where the value is ambiguous; ie we include
it with the first header in a message, and whenever it changes from the
previous header's nBits.

I would rather not change the serialization of existing messages,

Post by Gregory Maxwell via bitcoin-dev
nodes are going to have to support speaking both messages for a long
time, and I think we already want a different protocol flow for
headers fetching in any case.

I agree with this. Specifically the way I envisioned this working is that
we could introduce a new 'cmpctheaders'/'getcmpcthdrs' message pair for
syncing using this new message type, while leaving the existing
'headers'/'getheaders' messages unchanged. So when communicating with
upgraded peers, we'd never use 'getheaders' messages, and we'd only use
'headers' messages for potentially announcing new blocks.

Of course, we'll have to support the existing protocol for a long time.
But one downside I've discovered from deploying BIP 130, which introduced
block announcements via headers messages, is that overloading a 'headers'
message to be either a block announcement or a response to a 'getheaders'
message results in p2p-handling logic which is more complicated than it
needs to be. So splitting off the headers chain-sync functionality to a
new message pair seems like a nice side-effect benefit, in the long run.

Gregory Maxwell via bitcoin-dev

2017-12-13 00:01:32 UTC

Permalink

Post by Suhas Daftuar via bitcoin-dev
But I think we should be able to get nearly all the benefit just by
including nBits in any messages where the value is ambiguous; ie we include
it with the first header in a message, and whenever it changes from the
previous header's nBits.

Yes, that is what I was thinking last time we discussed it, just with
each header include a one byte flag that lets you express:

bit: meaning
(0) if nbits is the same as last,
(1) if timestamp is a full field or a small offset, (e.g. two bytes
defined as unsigned offset between the last time - 7200 and the new
time).
(2,3,4) if version is the same as the last distinct value .. 7th last,
or a new 32bit distinct value;
(5,6) if prev is entirely there, entirely gone, first 4 bytes
provided, or first 8 bytes provided. (if provided they override the
computed values).

That would be 7 bits in total; the 8th could be reserved or use to
signal "more headers follow" to make the encoding self-delimiting.

The downside with nbits the same as last as the optimization is that
if we ever change consensus rules to ones where difficulty management
works differently it may be the case that nbits changes every block.

Alternatively, nbits could get a differential encoding that could be
opted into for small differences-- though I haven't thought much about
it to see if a one byte difference would be that useful (e.g. can bch
differences usually be expressed with one byte?)

I'm kind of dubious of the consensus layer anti-dos separation: nbits
minimum is so low compared to the speed of a mining device, virtually
any attack that you might do with invalid headers could still be done
with headers at the minimum difficulty. But I'm fully willing to
accept that simpler is better...

Post by Suhas Daftuar via bitcoin-dev

Post by Gregory Maxwell via bitcoin-dev
I would rather not change the serialization of existing messages,
nodes are going to have to support speaking both messages for a long
time, and I think we already want a different protocol flow for
headers fetching in any case.

I agree with this. Specifically the way I envisioned this working is that
we could introduce a new 'cmpctheaders'/'getcmpcthdrs' message pair for
syncing using this new message type, while leaving the existing
'headers'/'getheaders' messages unchanged. So when communicating with
upgraded peers, we'd never use 'getheaders' messages, and we'd only use
'headers' messages for potentially announcing new blocks.
Of course, we'll have to support the existing protocol for a long time. But
one downside I've discovered from deploying BIP 130, which introduced block
announcements via headers messages, is that overloading a 'headers' message
to be either a block announcement or a response to a 'getheaders' message
results in p2p-handling logic which is more complicated than it needs to be.
So splitting off the headers chain-sync functionality to a new message pair
seems like a nice side-effect benefit, in the long run.

Gregory Maxwell via bitcoin-dev

2017-12-13 00:12:45 UTC

Permalink

Post by Suhas Daftuar via bitcoin-dev
I agree with this. Specifically the way I envisioned this working is that
we could introduce a new 'cmpctheaders'/'getcmpcthdrs' message pair for
syncing using this new message type, while leaving the existing
'headers'/'getheaders' messages unchanged. So when communicating with
upgraded peers, we'd never use 'getheaders' messages, and we'd only use
'headers' messages for potentially announcing new blocks.

The question becomes there-- how should it work.

In an ideal world, we'd decide what peers to fetch headers from based
on a compact proof of the total work in their chains... but we cannot
construct such proofs in Bitcoin today.

I think that instead of that a weak heuristic is to fetch first from
the peers with the tip at the highest difficulty. Then work backwards.

See: https://en.bitcoin.it/wiki/User:Gmaxwell/Reverse_header-fetching_sync

Which is the inspiration for the current headers first sync, but
without the reverse part because the protocol didn't permit it: you
can't request a linear wad of headers that come before a header. This
is the thing I was mostly thinking of when I mentioned that we may
want to change the interface.