[bitcoin-dev] Proposal for utxo commitment format

Discussion:

Bram Cohen via bitcoin-dev

2017-02-21 22:00:23 UTC

Here is a Merkle set data structure, whose format may be useful for utxo
commitments in Bitcoin blocks. It may also be useful for any other
distributed computation which wants an audit trail:

https://github.com/bramcohen/MerkleSet

This is a fairly straightforward Patricia Trie, with a simple format and a
simple reference implementation plus a performance optimized non-reference
implementation which is much more cache coherent. It will need to be ported
to C and be properly turned before the potential performance gains can be
realized though.

The clever things which affect the format spec are:

It uses blake2s as the internal hash function. This is the fastest hash
function to use on 512 bit inputs because blake2b uses a 1024 bit block
size. It might make sense to use a hypothetical variant of blake which is
optimized for 64 bits with a 512 bit block size, but that hasn't been
specified. Sha256 would take advantage of hardware acceleration, but that
isn't available everywhere.

Two bits of security are sacrificed to include metadata inline which halves
the CPU cost of hashing.

When one side of a node is empty and the other contains exactly two things
the secure hash of the child is adopted verbatim rather than rehashing it.
This roughly halves the amount of hashing done, and makes it more resistant
to malicious data, and cleans up some implementation details, at the cost
of some extra complexity.

Peter Todd via bitcoin-dev

2017-02-23 01:26:11 UTC

Permalink

Post by Bram Cohen via bitcoin-dev
When one side of a node is empty and the other contains exactly two things
the secure hash of the child is adopted verbatim rather than rehashing it.
This roughly halves the amount of hashing done, and makes it more resistant
to malicious data, and cleans up some implementation details, at the cost
of some extra complexity.

Note that this is a use-case specific concept of an idea I'm calling a
"generalized commitment"

A commitment scheme needs only have the property that it's not feasible to find
two messages m1 and m2 that map to the same commitment; it is *not* required
that it be difficult to find m given the commitment. Equally, it's not required
that commitments always be the same size.

So a perfectly reasonable thing to do is design your scheme such that the
commitment to short messages is the message itself! This adds just a single bit
of data to the minimum serialized size(1) of the commitment, and in situations
where sub-digest-sized messages are common, may overall be a savings.

Another advantage is that the scheme becomes more user-friendly: you *want*
programmers to notice when a commitment is not effectively hiding the message!
If you need message privacy, you should implement an explicit nonce, rather
than relying on the data to not be brute-forcable.

1) The more I look at these systems, the more I'm inclined to consider
bit-granularity serialization schemes... Heck, sub-bit granularity has
advantages too in some cases, e.g. by making all possible inputs to the
deserializer be valid.

--
https://petertodd.org 'peter'[:-1]@petertodd.org

Bram Cohen via bitcoin-dev

2017-02-23 02:56:35 UTC

Permalink

Post by Peter Todd via bitcoin-dev
A commitment scheme needs only have the property that it's not feasible to find
two messages m1 and m2 that map to the same commitment; it is *not* required
that it be difficult to find m given the commitment. Equally, it's not required
that commitments always be the same size.
So a perfectly reasonable thing to do is design your scheme such that the
commitment to short messages is the message itself! This adds just a single bit
of data to the minimum serialized size(1) of the commitment, and in situations
where sub-digest-sized messages are common, may overall be a savings.

Yes I'm basically doing that but to make things be all the same size I'm
including the bit inline, sacrificing one bit of security. Actually I'm
sacrificing two bits of security, to allow for four values: terminal,
middle, empty, and invalid. Invalid is used internally when a value has yet
to be calculated lazily and in proofs to mean 'this is a middle node but
the children are not included'. One effect of this is that the root of a
set containing a single value is just that value with the two high order
bits of the first byte reset to the appropriate value.