Discussion:
TXO commitments do not need a soft-fork to be useful
Peter Todd via bitcoin-dev
2017-02-23 01:11:47 UTC
Something I've recently realised is that TXO commitments do not need to be
implemented as a consensus protocol change to be useful. All the benefits they
provide to full nodes with regard to allowing for old UTXO data to be pruned -
and thus solving the UTXO bloat problem - can be implemented even without
having miners commit to the TXO commitment itself. This has a significant
deployment advantage too: we can try out multiple TXO commitment schemes, in
production, without the need for consensus changes.


# Reasoning

1) Like any other merkleized data structure, a TXO commitment allows a data set
- the TXO set - to be securely provided by an untrusted third party, allowing
the data itself to be discarded. So if you have a valid TXO commitment, you can
discard the TXO data itself, and rely on untrusted entities to provide you that
data on demand.
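To make point 1 concrete, here is a minimal sketch of merkle-proof verification (Python, with SHA-256 standing in for whatever hash function an actual scheme would use; all names here are illustrative, not from any real implementation):

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash function for the commitment (SHA-256, purely for illustration)."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute the root of a perfect merkle tree over the given leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_proof(root, leaf, index, branch):
    """Check that `leaf` sits at `index` under `root`, given the sibling
    digests from the leaf level upward. An untrusted peer supplies `leaf`
    and `branch`; only `root` (the commitment) must be known in advance."""
    digest = h(leaf)
    for sibling in branch:
        if index & 1:
            digest = h(sibling + digest)
        else:
            digest = h(digest + sibling)
        index >>= 1
    return digest == root
```

The key property is that `verify_proof` needs only the commitment itself: the leaf data and branch can come from any untrusted entity, which is exactly why the node can discard the underlying TXO data.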

2) The TXO set is a super-set of the UTXO set; all data in the UTXO set is also
present in the TXO set. Thus a TXO commitment with spent TXO's pruned is
equivalent to a UTXO set, doubly so if inner nodes in the commitment tree
commit to the sum-unspent of their children.
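The sum-unspent idea in point 2 can be sketched as follows (hypothetical Python; the hash choice and serialization are placeholders): each inner node commits to its children's digests plus the count of unspent entries beneath it, so a fully-spent subtree can be pruned without losing track of how much remains unspent.

```python
import hashlib

def leaf(txout: bytes, unspent: bool):
    """Leaf: hash of the serialized txout, tagged 1 if still unspent."""
    return (hashlib.sha256(txout).digest(), 1 if unspent else 0)

def node(left, right):
    """Inner node committing to both children's digests *and* the number of
    unspent entries beneath them; a subtree whose sum-unspent is 0 can be
    pruned outright, and the sums prove nothing unspent was dropped."""
    digest = hashlib.sha256(left[0] + right[0]).digest()
    return (digest, left[1] + right[1])

# Four TXOs, two of them already spent:
leaves = [leaf(b"txid0:0", True), leaf(b"txid0:1", False),
          leaf(b"txid1:0", False), leaf(b"txid1:1", True)]
left = node(leaves[0], leaves[1])    # sum-unspent = 1
right = node(leaves[2], leaves[3])   # sum-unspent = 1
root = node(left, right)             # root commits to sum-unspent = 2
```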

3) Where an outpoint-indexed UTXO set has a uniform access pattern, an
insertion-ordered TXO set has a deliberately *non-uniform* access pattern: not
only are new entries to the TXO set always appended to the end - an operation
that requires a known, log2(n), sized set of merkle tips - but due to lost
coins alone we can guarantee that older entries in the TXO set will be less
frequently updated than newer entries.
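The append operation in point 3 can be sketched as an accumulator that keeps only the log2(n) merkle tips ("peaks") of complete subtrees; appending merges equal-height peaks much like binary addition with carries. This is a simplified MMR-style sketch under my own naming, not the actual proposed structure:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class AppendOnlyTree:
    """Append-only merkle accumulator: only the ~log2(n) digests of complete
    subtrees ("peaks") are kept, in strictly decreasing height order, so no
    old leaf data is needed to extend the set."""
    def __init__(self):
        self.peaks = []  # list of (height, digest)

    def append(self, leaf: bytes):
        carry = (0, h(leaf))
        # Merge equal-height peaks, like binary addition with carries.
        while self.peaks and self.peaks[-1][0] == carry[0]:
            height, digest = self.peaks.pop()
            carry = (height + 1, h(digest + carry[1]))
        self.peaks.append(carry)
```

After n appends the peak heights are exactly the set bits of n (e.g. 5 leaves leave peaks of heights 2 and 0), which is why appending needs only that known, log2(n)-sized state.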

4) Thus a full node that doesn't have enough local storage to maintain the full
UTXO set can instead keep track of a TXO commitment, and prune older UTXO's
from it that are unlikely to be spent. In the event those UTXO's are spent,
transactions and blocks spending them can trustlessly provide the necessary
data to temporarily fill-in the node's local TXO set database, allowing the
next commitment to be calculated.

5) By *not* committing the TXO commitment in the block itself, we obsolete my
concept of delayed TXO commitments: you don't need to have calculated the TXO
commitment digest to validate a block anyway!


# Deployment Plan

1) Implement a TXO commitment scheme with the ability to efficiently store the
last n versions of the commitment state for the purpose of reorgs (a
reference-counted scheme naturally does this).

2) Add P2P support for advertising to peers what parts of the TXO set you've
pruned.

3) Add P2P support to produce, consume, and update TXO unspentness proofs as
part of transaction and block relaying.

4) Profit.
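Purely as an illustration of what step 2's advertisement might look like on the wire - the message layout, field widths, and function names below are my own assumptions, not anything specified in the post:

```python
import struct

def encode_pruned_ranges(ranges):
    """Serialize sorted, non-overlapping [start, end) TXO-index ranges that
    a node has pruned: a count followed by little-endian u64 pairs.
    (Hypothetical encoding, for illustration only.)"""
    payload = struct.pack("<I", len(ranges))
    for start, end in ranges:
        assert start < end
        payload += struct.pack("<QQ", start, end)
    return payload

def decode_pruned_ranges(payload):
    """Inverse of encode_pruned_ranges."""
    count = struct.unpack_from("<I", payload)[0]
    offset = 4
    ranges = []
    for _ in range(count):
        start, end = struct.unpack_from("<QQ", payload, offset)
        ranges.append((start, end))
        offset += 16
    return ranges
```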


# Bootstrapping New Nodes

With a TXO commitment scheme implemented, it's also possible to produce
serialized UTXO snapshots for bootstrapping new nodes. Equally, it's obviously
possible to distribute those snapshots, and have people you trust attest to the
validity of those snapshots.

I argue that a snapshot with an attestation from known individuals that you
trust is a *better* security model than having miners attest to validity: the
latter is trusting an unknown set of unaccountable, anonymous, miners.

This security model is not unlike the recently implemented -assumevalid
scheme(1), in that auditing the validity of the assumed valid TXO commitments
is something anyone can do provided they have a full node. Similarly, we could
ship Bitcoin nodes with an assumed-valid TXO commitment, and have those nodes
fill in the UTXO data from their peers.

However it is a weaker security model, in that a false TXO commitment can more
easily be used to trick a node into accepting invalid transactions/blocks;
with assumed-valid blocks, pulling off this attack requires proof-of-work. A
compromise may be to use assumed valid TXO commitments, extending my partial
UTXO set(2) suggestion of having nodes validate the chain backwards, to
eventually validate 100% of the chain.


# References

1) https://github.com/bitcoin/bitcoin/pull/9484
2) [Bitcoin-development] SPV bitcoind? (was: Introducing BitcoinKit.framework),
Peter Todd, Jul 17th 2013, Bitcoin development mailing list,
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2013-July/002917.html
--
https://petertodd.org 'peter'[:-1]@petertodd.org
Eric Lombrozo via bitcoin-dev
2017-02-23 03:30:37 UTC
This kind of thing is long overdue!

I think it’s a great idea to attempt this without soft forking TXO
commitments yet so we can see what works best.


- E

Peter Todd via bitcoin-dev
2017-02-23 07:23:10 UTC
Peter Todd via bitcoin-dev wrote:
> 5) By *not* committing the TXO commitment in the block itself, we obsolete my
> concept of delayed TXO commitments: you don't need to have calculated the TXO
> commitment digest to validate a block anyway!
Thinking about this a bit more, by not being forced to calculate a TXO
commitment for every block, we may be able to do significantly better than
delayed TXO commitments by lazily hashing.

Suppose we have the following perfect merkle tree, which we're using as a
key-value map. We'll label each inner node whose digest we've calculated with a
number indicating the version of the tree that digest corresponds to (initially
"0"):

            0
           / \
          /   \
         /     \
        /       \
       /         \
      0           0
     / \         / \
    /   \       /   \
   0     0     0     0
  / \   / \   / \   / \
  a b   c d   e f   g h

If a value is updated, digests above it become out of date and need to be
recalculated:


            1
           / \
          /   \
         /     \
        /       \
       /         \
      0           1
     / \         / \
    /   \       /   \
   0     0     0     1
  / \   / \   / \   / \
  a b   c d   e f   g H

            2
           / \
          /   \
         /     \
        /       \
       /         \
      0           2
     / \         / \
    /   \       /   \
   0     0     2     1
  / \   / \   / \   / \
  a b   c d   e F   g H

            3
           / \
          /   \
         /     \
        /       \
       /         \
      0           3
     / \         / \
    /   \       /   \
   0     0     2     3
  / \   / \   / \   / \
  a b   c d   e F   G H

Suppose however that your implementation does lazy hashing; after the 3rd
update your state will be:

            .
           / \
          /   \
         /     \
        /       \
       /         \
      0           .
     / \         / \
    /   \       /   \
   0     0     .     .
  / \   / \   / \   / \
  a b   c d   e F   G H

Basically all the digests on the right side are out of date and need to be
recalculated. Now, first of all, it's obviously possible for your
implementation to keep updating values in the tree given their keys - you've
essentially regressed to a bog-standard binary tree.

But what happens if you discard part of your dataset? Let's suppose you've
discarded the left half:

            .
           / \
          /   \
         /     \
        /       \
       /         \
      0           .
                 / \
                /   \
               .     .
              / \   / \
              e F   G H

Note how you still have sufficient information to calculate the current merkle
tip commitment: the left side hasn't changed yet. But what happens when someone
gives you an update proof? Specifically, suppose they want to change b -> B.
That requires them to provide you with the part of the merkle tree proving that
position #1 is b. Now you might think that's this data:

            3
           / \
          /   \
         /     \
        /       \
       /         \
      0           3
     / \
    /   \
   0     0
  / \
  a b

But the inner node digests marked "3" are useless to you: you haven't
calculated those digests yet so you can't compare them to anything. What you
can compare is the following:

      0
     / \
    /   \
   0     0
  / \
  a b

With that extra data your local knowledge is now:

            .
           / \
          /   \
         /     \
        /       \
       /         \
      0           .
     / \         / \
    /   \       /   \
   0     0     .     .
  / \         / \   / \
  a b         e F   G H

Allowing you to apply the update:

            .
           / \
          /   \
         /     \
        /       \
       /         \
      .           .
     / \         / \
    /   \       /   \
   .     0     .     .
  / \         / \   / \
  a B         e F   G H

If you want to again prune that data, simply recalculate the digests so you
can verify a copy given to you by a peer in the future:

            .
           / \
          /   \
         /     \
        /       \
       /         \
      4           .
     / \         / \
    /   \       /   \
   4     0     .     .
  / \         / \   / \
  a B         e F   G H

And prune, leaving you with:

            .
           / \
          /   \
         /     \
        /       \
       /         \
      4           .
                 / \
                /   \
               .     .
              / \   / \
              e F   G H


So tl;dr: the reason this works is that we can substitute commitments for
pointers: our merkle tree can also be viewed as a binary tree. So a reasonable
real-world implementation would be to delay computation of digests for anything
we have in RAM, and only compute digests as in-RAM data is flushed to disk.
Equally, on disk we can use standard time-space tradeoffs to only store a
subset of the digests, recalculating the rest on the fly. Given that we could
effectively combine both a cryptographic data structure and a standard
pointer-based data structure in one, I suspect we can get good performance out
of this.
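A minimal sketch of that pointer/commitment duality (illustrative Python; `None` plays the role of the "." digests in the diagrams above, and digests are recomputed only when the tip commitment is actually needed):

```python
import hashlib

class Leaf:
    def __init__(self, value: bytes):
        self.value = value
    def digest(self):
        return hashlib.sha256(self.value).digest()

class Node:
    """Simultaneously a pointer-based binary tree node and a merkle node:
    the digest is cached and computed lazily, only on demand."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._digest = None          # None == "." in the diagrams above

    def invalidate(self):
        self._digest = None

    def digest(self):
        if self._digest is None:
            self._digest = hashlib.sha256(self.left.digest() +
                                          self.right.digest()).digest()
        return self._digest

def build(values):
    """Build a perfect lazy merkle tree over the given leaf values."""
    nodes = [Leaf(v) for v in values]
    while len(nodes) > 1:
        nodes = [Node(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def update(root, path: int, depth: int, value: bytes):
    """Replace the leaf at index `path` (a `depth`-bit position), marking
    every digest on the way down as out of date rather than rehashing."""
    node = root
    for level in range(depth - 1, 0, -1):
        node.invalidate()
        node = node.right if (path >> level) & 1 else node.left
    node.invalidate()
    if path & 1:
        node.right = Leaf(value)
    else:
        node.left = Leaf(value)
```

Updates only invalidate cached digests along one path; untouched subtrees keep their digests, so rehashing after a burst of updates costs proportional to the number of distinct paths touched, not the size of the tree.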

The main subtlety of this approach will be how exactly to handle the proofs:
the level of verification possible depends on what digests a given node has
calculated, and we want to avoid making network splitting attacks possible by
attackers deliberately giving nodes proofs with upper digests that are
incorrect, something only some nodes can detect. Not sure yet exactly what's
the right approach there.

Finally, notice how this entire approach depends on schemes like MMRs, where
the overall structure of the tree does not change as nodes are added and
updated; it would be much harder to implement this idea for something like a
merkleized red-black tree, where the structure changes as the tree is
rebalanced.
--
https://petertodd.org 'peter'[:-1]@petertodd.org