Discussion:
Small Nodes: A Better Alternative to Pruned Nodes
Add Reply
David Vorick via bitcoin-dev
2017-04-17 06:54:49 UTC
Reply
Permalink
Raw Message
*Rationale:*

A node that stores the full blockchain (I will use the term archival node)
requires over 100GB of disk space, which I believe is one of the most
significant barriers to more people running full nodes. And I believe the
ecosystem would benefit substantially if more users were running full nodes.

The best alternative today to storing the full blockchain is to run a
pruned node, which keeps only the UTXO set and throws away already verified
blocks. The operator of the pruned node is able to enjoy the full security
benefits of a full node, but is essentially leeching the network, as they
performed a large download likely without contributing anything back.

This puts more pressure on the archival nodes, as the archival nodes need
to pick up the slack and help new nodes bootstrap to the network. As the
pressure on archival nodes grows, fewer people will be able to actually run
archival nodes, and the situation will degrade. The situation would likely
become problematic quickly if bitcoin-core were to ship with the defaults
set to a pruned node.

Even further, the people most likely to care about saving 100GB of disk
space are also the people least likely to care about some extra bandwidth
usage. For datacenter nodes, and for nodes doing lots of bandwidth, the
bandwidth is usually the biggest cost of running the node. For home users
however, as long as they stay under their bandwidth cap, the bandwidth is
actually free. Ideally, new nodes would be able to bootstrap from nodes
that do not have to pay for their bandwidth, instead of needing to rely on
a decreasing percentage of heavy-duty archival nodes.

I have (perhaps incorrectly) identified disk space consumption as the most
significant factor in your average user choosing to run a pruned node or a
lite client instead of a full node. The average user is not typically too
worried about bandwidth, and is also not typically too worried about
initial blockchain download time. But the 100GB hit to your disk space can
be a huge psychological factor, especially if your hard drive only has
500GB available in the first place, and 250+ GB is already consumed by
other files you have.

I believe that improving the disk usage situation would greatly benefit
decentralization, especially if it could be done without putting pressure
on archival nodes.

*Small Nodes Proposal:*

I propose an alternative to the pruned node that does not put undue
pressure on archival nodes, and would be acceptable and non-risky to ship
as a default in bitcoin-core. For lack of a better name, I'll call this new
type of node a 'small node'. The intention is that bitcoin-core would
eventually ship 'small nodes' by default, such that the expected amount of
disk consumption drops from today's 100+ GB to less than 30 GB.

My alternative proposal has the following properties:

+ Full nodes only need to store ~20% of the blockchain
+ With very high probability, a new node will be able to recover the entire
blockchain by connecting to 6 random small node peers.
+ An attacker that can eliminate a chosen+ 95% of the full nodes running
today will be unable to prevent new nodes from downloading the full
blockchain, even if the attacker is also able to eliminate all archival
nodes. (assuming all nodes today were small nodes instead of archival nodes)

Method:

A small node will pick an index [5, 256). This index is that node's
permanent index. When storing a block, instead of storing the full block,
the node will use Reed-Solomon coding to erasure code the block using a
5-of-256 scheme. The result will be 256 pieces that are 20% of the size of
the block each. The node picks the piece that corresponds to its index, and
stores that instead. (Indexes 0-4 are reserved for archival nodes -
explained later)

The node is now storing a fragment of every block. Alone, this fragment
cannot be used to recover any piece of the blockchain. However, when paired
with any 5 unique fragments (fragments of the same index will not be
unique), the full block can be recovered.

Nodes can optionally store more than 1 fragment each. At 5 fragments, the
node becomes a full archival node, and the chosen indexes should be 0-4.
This is advantageous for the archival node as the encoded data for the
first 5 indexes will actually be identical to the block itself - there is
no computational overhead for selecting the first indexes. There is also no
need to choose random indexes, because the full block can be recovered no
matter which indexes are chosen.

When connecting to new peers, the indexes of each peer needs to be known.
Once peers totaling 5 unique indexes are discovered, blockchain download
can begin. Connecting to just 5 small node peers provides a >95% chance of
getting 5 uniques, with exponentially improving odds of success as you
connect to more peers. Connecting to a single archive node guarantees that
any gaps can be filled.

A good encoder should be able to turn a block into a 5-of-256 piece set in
under 10 milliseconds using a single core on a standard consumer desktop.
This should not slow down initial blockchain download substantially, though
the overhead is more than a rounding error.

*DoS Prevention:*

A malicious node may provide garbage data instead of the actual piece.
Given just the garbage data and 4 other correct pieces, it is impossible
(best I know anyway) to tell which piece is the garbage piece.

One option in this case would be to seek out an archival node that could
verify the correctness of the pieces, and identify the malicious node.

Another option would be to have the small nodes store a cryptographic
checksum of each piece. Obtaining the cryptographic checksum for all 256
pieces would incur a nontrivial amount of hashing (post segwit, as much as
100MB of extra hashing per block), and would require an additional ~4kb of
storage per block. The hashing overhead here may be prohibitive.

Another solution would be to find additional pieces and brute-force
combinations of 5 until a working combination was discovered. Though this
sounds nasty, it should take less than five seconds of computation to find
the working combination given 5 correct pieces and 2 incorrect pieces. This
computation only needs to be performed once to identify the malicious peers.

I also believe that alternative erasure coding schemes exist which actually
are able to identify the bad pieces given sufficient good pieces, however I
don't know if they have the same computational performance as the best
Reed-Solomon coding implementations.

*Deployment:*

Small nodes are completely useless unless the critical mass of 5 pieces can
be obtained. The first version that supports small node block downloads
should default everyone to an archival node (meaning indexes 0-4 are used)

Once there are enough small-node-enabled archive nodes, the default can be
switched so that nodes only have a single index by default. In the first
few days, when there are only a few small nodes, the previously-deployed
archival nodes can help fill in the gaps, and the small nodes can be useful
for blockchain download right away.

----------------------------------

This represents a non-trivial amount of code, but I believe that the result
would be a non-trivial increase in the percentage of users running full
nodes, and a healthier overall network.
Danny Thorpe via bitcoin-dev
2017-04-17 07:11:07 UTC
Reply
Permalink
Raw Message
1TB HDD is now available for under $40 USD. How is the 100GB storage
requirement preventing anyone from setting up full nodes?

On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev" <
Post by David Vorick via bitcoin-dev
*Rationale:*
A node that stores the full blockchain (I will use the term archival node)
requires over 100GB of disk space, which I believe is one of the most
significant barriers to more people running full nodes. And I believe the
ecosystem would benefit substantially if more users were running full nodes.
The best alternative today to storing the full blockchain is to run a
pruned node, which keeps only the UTXO set and throws away already verified
blocks. The operator of the pruned node is able to enjoy the full security
benefits of a full node, but is essentially leeching the network, as they
performed a large download likely without contributing anything back.
This puts more pressure on the archival nodes, as the archival nodes need
to pick up the slack and help new nodes bootstrap to the network. As the
pressure on archival nodes grows, fewer people will be able to actually run
archival nodes, and the situation will degrade. The situation would likely
become problematic quickly if bitcoin-core were to ship with the defaults
set to a pruned node.
Even further, the people most likely to care about saving 100GB of disk
space are also the people least likely to care about some extra bandwidth
usage. For datacenter nodes, and for nodes doing lots of bandwidth, the
bandwidth is usually the biggest cost of running the node. For home users
however, as long as they stay under their bandwidth cap, the bandwidth is
actually free. Ideally, new nodes would be able to bootstrap from nodes
that do not have to pay for their bandwidth, instead of needing to rely on
a decreasing percentage of heavy-duty archival nodes.
I have (perhaps incorrectly) identified disk space consumption as the most
significant factor in your average user choosing to run a pruned node or a
lite client instead of a full node. The average user is not typically too
worried about bandwidth, and is also not typically too worried about
initial blockchain download time. But the 100GB hit to your disk space can
be a huge psychological factor, especially if your hard drive only has
500GB available in the first place, and 250+ GB is already consumed by
other files you have.
I believe that improving the disk usage situation would greatly benefit
decentralization, especially if it could be done without putting pressure
on archival nodes.
*Small Nodes Proposal:*
I propose an alternative to the pruned node that does not put undue
pressure on archival nodes, and would be acceptable and non-risky to ship
as a default in bitcoin-core. For lack of a better name, I'll call this new
type of node a 'small node'. The intention is that bitcoin-core would
eventually ship 'small nodes' by default, such that the expected amount of
disk consumption drops from today's 100+ GB to less than 30 GB.
+ Full nodes only need to store ~20% of the blockchain
+ With very high probability, a new node will be able to recover the
entire blockchain by connecting to 6 random small node peers.
+ An attacker that can eliminate a chosen+ 95% of the full nodes running
today will be unable to prevent new nodes from downloading the full
blockchain, even if the attacker is also able to eliminate all archival
nodes. (assuming all nodes today were small nodes instead of archival nodes)
A small node will pick an index [5, 256). This index is that node's
permanent index. When storing a block, instead of storing the full block,
the node will use Reed-Solomon coding to erasure code the block using a
5-of-256 scheme. The result will be 256 pieces that are 20% of the size of
the block each. The node picks the piece that corresponds to its index, and
stores that instead. (Indexes 0-4 are reserved for archival nodes -
explained later)
The node is now storing a fragment of every block. Alone, this fragment
cannot be used to recover any piece of the blockchain. However, when paired
with any 5 unique fragments (fragments of the same index will not be
unique), the full block can be recovered.
Nodes can optionally store more than 1 fragment each. At 5 fragments, the
node becomes a full archival node, and the chosen indexes should be 0-4.
This is advantageous for the archival node as the encoded data for the
first 5 indexes will actually be identical to the block itself - there is
no computational overhead for selecting the first indexes. There is also no
need to choose random indexes, because the full block can be recovered no
matter which indexes are chosen.
When connecting to new peers, the indexes of each peer needs to be known.
Once peers totaling 5 unique indexes are discovered, blockchain download
can begin. Connecting to just 5 small node peers provides a >95% chance of
getting 5 uniques, with exponentially improving odds of success as you
connect to more peers. Connecting to a single archive node guarantees that
any gaps can be filled.
A good encoder should be able to turn a block into a 5-of-256 piece set in
under 10 milliseconds using a single core on a standard consumer desktop.
This should not slow down initial blockchain download substantially, though
the overhead is more than a rounding error.
*DoS Prevention:*
A malicious node may provide garbage data instead of the actual piece.
Given just the garbage data and 4 other correct pieces, it is impossible
(best I know anyway) to tell which piece is the garbage piece.
One option in this case would be to seek out an archival node that could
verify the correctness of the pieces, and identify the malicious node.
Another option would be to have the small nodes store a cryptographic
checksum of each piece. Obtaining the cryptographic checksum for all 256
pieces would incur a nontrivial amount of hashing (post segwit, as much as
100MB of extra hashing per block), and would require an additional ~4kb of
storage per block. The hashing overhead here may be prohibitive.
Another solution would be to find additional pieces and brute-force
combinations of 5 until a working combination was discovered. Though this
sounds nasty, it should take less than five seconds of computation to find
the working combination given 5 correct pieces and 2 incorrect pieces. This
computation only needs to be performed once to identify the malicious peers.
I also believe that alternative erasure coding schemes exist which
actually are able to identify the bad pieces given sufficient good pieces,
however I don't know if they have the same computational performance as the
best Reed-Solomon coding implementations.
*Deployment:*
Small nodes are completely useless unless the critical mass of 5 pieces
can be obtained. The first version that supports small node block downloads
should default everyone to an archival node (meaning indexes 0-4 are used)
Once there are enough small-node-enabled archive nodes, the default can be
switched so that nodes only have a single index by default. In the first
few days, when there are only a few small nodes, the previously-deployed
archival nodes can help fill in the gaps, and the small nodes can be useful
for blockchain download right away.
----------------------------------
This represents a non-trivial amount of code, but I believe that the
result would be a non-trivial increase in the percentage of users running
full nodes, and a healthier overall network.
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
David Vorick via bitcoin-dev
2017-04-17 07:27:35 UTC
Reply
Permalink
Raw Message
Most people do not want to go out and buy new hardware to run a Bitcoin
node. The want to use the hardware that they already own, and usually that
hardware is going to have a non-generous amount of disk space. 500GB SSD
with no HDD is common in computers today.

But really, the best test is to go out and talk to people. Ask them if they
run a full node, and if they say no, ask them why not. In my experience,
the most common answer by a significant margin is that they don't want to
lose the disk space. That psychology is far more important than any example
of cheap hard drives. People don't want to go out and buy a hard drive so
that they can run Bitcoin. It's a non-starter.
Erik Aronesty via bitcoin-dev
2017-04-20 15:50:24 UTC
Reply
Permalink
Raw Message
Try to find 1TB dedicated server hosting ...

If you want to set up an ecommerce site somewhere besides your living room,
storage costs are still a concern.

On Mon, Apr 17, 2017 at 3:11 AM, Danny Thorpe via bitcoin-dev <
Post by Danny Thorpe via bitcoin-dev
1TB HDD is now available for under $40 USD. How is the 100GB storage
requirement preventing anyone from setting up full nodes?
On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev" <
Post by David Vorick via bitcoin-dev
*Rationale:*
A node that stores the full blockchain (I will use the term archival
node) requires over 100GB of disk space, which I believe is one of the most
significant barriers to more people running full nodes. And I believe the
ecosystem would benefit substantially if more users were running full nodes.
The best alternative today to storing the full blockchain is to run a
pruned node, which keeps only the UTXO set and throws away already verified
blocks. The operator of the pruned node is able to enjoy the full security
benefits of a full node, but is essentially leeching the network, as they
performed a large download likely without contributing anything back.
This puts more pressure on the archival nodes, as the archival nodes need
to pick up the slack and help new nodes bootstrap to the network. As the
pressure on archival nodes grows, fewer people will be able to actually run
archival nodes, and the situation will degrade. The situation would likely
become problematic quickly if bitcoin-core were to ship with the defaults
set to a pruned node.
Even further, the people most likely to care about saving 100GB of disk
space are also the people least likely to care about some extra bandwidth
usage. For datacenter nodes, and for nodes doing lots of bandwidth, the
bandwidth is usually the biggest cost of running the node. For home users
however, as long as they stay under their bandwidth cap, the bandwidth is
actually free. Ideally, new nodes would be able to bootstrap from nodes
that do not have to pay for their bandwidth, instead of needing to rely on
a decreasing percentage of heavy-duty archival nodes.
I have (perhaps incorrectly) identified disk space consumption as the
most significant factor in your average user choosing to run a pruned node
or a lite client instead of a full node. The average user is not typically
too worried about bandwidth, and is also not typically too worried about
initial blockchain download time. But the 100GB hit to your disk space can
be a huge psychological factor, especially if your hard drive only has
500GB available in the first place, and 250+ GB is already consumed by
other files you have.
I believe that improving the disk usage situation would greatly benefit
decentralization, especially if it could be done without putting pressure
on archival nodes.
*Small Nodes Proposal:*
I propose an alternative to the pruned node that does not put undue
pressure on archival nodes, and would be acceptable and non-risky to ship
as a default in bitcoin-core. For lack of a better name, I'll call this new
type of node a 'small node'. The intention is that bitcoin-core would
eventually ship 'small nodes' by default, such that the expected amount of
disk consumption drops from today's 100+ GB to less than 30 GB.
+ Full nodes only need to store ~20% of the blockchain
+ With very high probability, a new node will be able to recover the
entire blockchain by connecting to 6 random small node peers.
+ An attacker that can eliminate a chosen+ 95% of the full nodes running
today will be unable to prevent new nodes from downloading the full
blockchain, even if the attacker is also able to eliminate all archival
nodes. (assuming all nodes today were small nodes instead of archival nodes)
A small node will pick an index [5, 256). This index is that node's
permanent index. When storing a block, instead of storing the full block,
the node will use Reed-Solomon coding to erasure code the block using a
5-of-256 scheme. The result will be 256 pieces that are 20% of the size of
the block each. The node picks the piece that corresponds to its index, and
stores that instead. (Indexes 0-4 are reserved for archival nodes -
explained later)
The node is now storing a fragment of every block. Alone, this fragment
cannot be used to recover any piece of the blockchain. However, when paired
with any 5 unique fragments (fragments of the same index will not be
unique), the full block can be recovered.
Nodes can optionally store more than 1 fragment each. At 5 fragments, the
node becomes a full archival node, and the chosen indexes should be 0-4.
This is advantageous for the archival node as the encoded data for the
first 5 indexes will actually be identical to the block itself - there is
no computational overhead for selecting the first indexes. There is also no
need to choose random indexes, because the full block can be recovered no
matter which indexes are chosen.
When connecting to new peers, the indexes of each peer needs to be known.
Once peers totaling 5 unique indexes are discovered, blockchain download
can begin. Connecting to just 5 small node peers provides a >95% chance of
getting 5 uniques, with exponentially improving odds of success as you
connect to more peers. Connecting to a single archive node guarantees that
any gaps can be filled.
A good encoder should be able to turn a block into a 5-of-256 piece set
in under 10 milliseconds using a single core on a standard consumer
desktop. This should not slow down initial blockchain download
substantially, though the overhead is more than a rounding error.
*DoS Prevention:*
A malicious node may provide garbage data instead of the actual piece.
Given just the garbage data and 4 other correct pieces, it is impossible
(best I know anyway) to tell which piece is the garbage piece.
One option in this case would be to seek out an archival node that could
verify the correctness of the pieces, and identify the malicious node.
Another option would be to have the small nodes store a cryptographic
checksum of each piece. Obtaining the cryptographic checksum for all 256
pieces would incur a nontrivial amount of hashing (post segwit, as much as
100MB of extra hashing per block), and would require an additional ~4kb of
storage per block. The hashing overhead here may be prohibitive.
Another solution would be to find additional pieces and brute-force
combinations of 5 until a working combination was discovered. Though this
sounds nasty, it should take less than five seconds of computation to find
the working combination given 5 correct pieces and 2 incorrect pieces. This
computation only needs to be performed once to identify the malicious peers.
I also believe that alternative erasure coding schemes exist which
actually are able to identify the bad pieces given sufficient good pieces,
however I don't know if they have the same computational performance as the
best Reed-Solomon coding implementations.
*Deployment:*
Small nodes are completely useless unless the critical mass of 5 pieces
can be obtained. The first version that supports small node block downloads
should default everyone to an archival node (meaning indexes 0-4 are used)
Once there are enough small-node-enabled archive nodes, the default can
be switched so that nodes only have a single index by default. In the first
few days, when there are only a few small nodes, the previously-deployed
archival nodes can help fill in the gaps, and the small nodes can be useful
for blockchain download right away.
----------------------------------
This represents a non-trivial amount of code, but I believe that the
result would be a non-trivial increase in the percentage of users running
full nodes, and a healthier overall network.
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Aymeric Vitte via bitcoin-dev
2017-04-20 23:42:03 UTC
Reply
Permalink
Raw Message
??? what do you mean? (https://www.soyoustart.com/fr/serveurs-essential/)
Post by Erik Aronesty via bitcoin-dev
Try to find 1TB dedicated server hosting ...
If you want to set up an ecommerce site somewhere besides your living
room, storage costs are still a concern.
On Mon, Apr 17, 2017 at 3:11 AM, Danny Thorpe via bitcoin-dev
1TB HDD is now available for under $40 USD. How is the 100GB
storage requirement preventing anyone from setting up full nodes?
On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev"
*Rationale:*
A node that stores the full blockchain (I will use the term
archival node) requires over 100GB of disk space, which I
believe is one of the most significant barriers to more people
running full nodes. And I believe the ecosystem would benefit
substantially if more users were running full nodes.
The best alternative today to storing the full blockchain is
to run a pruned node, which keeps only the UTXO set and throws
away already verified blocks. The operator of the pruned node
is able to enjoy the full security benefits of a full node,
but is essentially leeching the network, as they performed a
large download likely without contributing anything back.
This puts more pressure on the archival nodes, as the archival
nodes need to pick up the slack and help new nodes bootstrap
to the network. As the pressure on archival nodes grows, fewer
people will be able to actually run archival nodes, and the
situation will degrade. The situation would likely become
problematic quickly if bitcoin-core were to ship with the
defaults set to a pruned node.
Even further, the people most likely to care about saving
100GB of disk space are also the people least likely to care
about some extra bandwidth usage. For datacenter nodes, and
for nodes doing lots of bandwidth, the bandwidth is usually
the biggest cost of running the node. For home users however,
as long as they stay under their bandwidth cap, the bandwidth
is actually free. Ideally, new nodes would be able to
bootstrap from nodes that do not have to pay for their
bandwidth, instead of needing to rely on a decreasing
percentage of heavy-duty archival nodes.
I have (perhaps incorrectly) identified disk space consumption
as the most significant factor in your average user choosing
to run a pruned node or a lite client instead of a full node.
The average user is not typically too worried about bandwidth,
and is also not typically too worried about initial blockchain
download time. But the 100GB hit to your disk space can be a
huge psychological factor, especially if your hard drive only
has 500GB available in the first place, and 250+ GB is already
consumed by other files you have.
I believe that improving the disk usage situation would
greatly benefit decentralization, especially if it could be
done without putting pressure on archival nodes.
*Small Nodes Proposal:*
I propose an alternative to the pruned node that does not put
undue pressure on archival nodes, and would be acceptable and
non-risky to ship as a default in bitcoin-core. For lack of a
better name, I'll call this new type of node a 'small node'.
The intention is that bitcoin-core would eventually ship
'small nodes' by default, such that the expected amount of
disk consumption drops from today's 100+ GB to less than 30 GB.
+ Full nodes only need to store ~20% of the blockchain
+ With very high probability, a new node will be able to
recover the entire blockchain by connecting to 6 random small
node peers.
+ An attacker that can eliminate a chosen+ 95% of the full
nodes running today will be unable to prevent new nodes from
downloading the full blockchain, even if the attacker is also
able to eliminate all archival nodes. (assuming all nodes
today were small nodes instead of archival nodes)
A small node will pick an index [5, 256). This index is that
node's permanent index. When storing a block, instead of
storing the full block, the node will use Reed-Solomon coding
to erasure code the block using a 5-of-256 scheme. The result
will be 256 pieces that are 20% of the size of the block each.
The node picks the piece that corresponds to its index, and
stores that instead. (Indexes 0-4 are reserved for archival
nodes - explained later)
The node is now storing a fragment of every block. Alone, this
fragment cannot be used to recover any piece of the
blockchain. However, when paired with any 5 unique fragments
(fragments of the same index will not be unique), the full
block can be recovered.
Nodes can optionally store more than 1 fragment each. At 5
fragments, the node becomes a full archival node, and the
chosen indexes should be 0-4. This is advantageous for the
archival node as the encoded data for the first 5 indexes will
actually be identical to the block itself - there is no
computational overhead for selecting the first indexes. There
is also no need to choose random indexes, because the full
block can be recovered no matter which indexes are chosen.
When connecting to new peers, the indexes of each peer needs
to be known. Once peers totaling 5 unique indexes are
discovered, blockchain download can begin. Connecting to just
5 small node peers provides a >95% chance of getting 5
uniques, with exponentially improving odds of success as you
connect to more peers. Connecting to a single archive node
guarantees that any gaps can be filled.
A good encoder should be able to turn a block into a 5-of-256
piece set in under 10 milliseconds using a single core on a
standard consumer desktop. This should not slow down initial
blockchain download substantially, though the overhead is more
than a rounding error.
*DoS Prevention:*
A malicious node may provide garbage data instead of the
actual piece. Given just the garbage data and 4 other correct
pieces, it is impossible (best I know anyway) to tell which
piece is the garbage piece.
One option in this case would be to seek out an archival node
that could verify the correctness of the pieces, and identify
the malicious node.
Another option would be to have the small nodes store a
cryptographic checksum of each piece. Obtaining the
cryptographic checksum for all 256 pieces would incur a
nontrivial amount of hashing (post segwit, as much as 100MB of
extra hashing per block), and would require an additional ~4kb
of storage per block. The hashing overhead here may be
prohibitive.
Another solution would be to find additional pieces and
brute-force combinations of 5 until a working combination was
discovered. Though this sounds nasty, it should take less than
five seconds of computation to find the working combination
given 5 correct pieces and 2 incorrect pieces. This
computation only needs to be performed once to identify the
malicious peers.
I also believe that alternative erasure coding schemes exist
which actually are able to identify the bad pieces given
sufficient good pieces, however I don't know if they have the
same computational performance as the best Reed-Solomon coding
implementations.
*Deployment:*
Small nodes are completely useless unless the critical mass of
5 pieces can be obtained. The first version that supports
small node block downloads should default everyone to an
archival node (meaning indexes 0-4 are used)
Once there are enough small-node-enabled archive nodes, the
default can be switched so that nodes only have a single index
by default. In the first few days, when there are only a few
small nodes, the previously-deployed archival nodes can help
fill in the gaps, and the small nodes can be useful for
blockchain download right away.
----------------------------------
This represents a non-trivial amount of code, but I believe
that the result would be a non-trivial increase in the
percentage of users running full nodes, and a healthier
overall network.
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
<https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
<https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
--
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
David Kaufman via bitcoin-dev
2017-04-21 13:35:51 UTC
Reply
Permalink
Raw Message
Hi Danny,
Post by Danny Thorpe via bitcoin-dev
1TB HDD is now available for under $40 USD. How is the 100GB storage
requirement preventing anyone from setting up full nodes?
Yeah, but that's because most people (well, using myself as the
"target market" anyway) are upgrading to SSD's for the faster boot and
response times. Modern consumer OS's run incredibly slow on
non-ssd drives! And since the vast majority of consumer laptops sold
today fall into the $400 to $700 range, a 200 - 500gb SSD is about the
most storage upgrade people can afford.

And so I think David's premise, that having to devote only 30GB to
running a full node instead of 100, would remove a major obstacle that
prevents many more people running full bitcoin nodes.

My only suggestion is, does it scale? I mean, if the bitcoin network
volume grows exponentially and in 2 years the blockchain is 500GB, can
the "small node" be adjusted down from one fifth of the blockchain to
just one-tenth, or one twentieth? Can different smalInesses
interoperate? Can I choose to store a small node with 20 - 30% of the
blockchain, while others chose to share just 5% or 10% of it? Can I run
"less small" node today that's 50GB?

Can the default install be a "small node" that requires about 30GB of
storage (if that is indeed the sweet spot for enticing many more users to
bringing nodes online), but allow the user at install time, to choose *how*
small? To, say, drag a slider anywhere up and down the range from
10GB to 100GB?

If not, then it will have to be revisited constantly as the blockchain
grows, and disk storage prices drop. I suspect the blockchain will
grow in size, at some point in the not too distant future, much faster
than storage prices drop, so making small, smaller and smallest nodes
that can be configured to store more or less of it will be necessary
to motivate most users to run nodes at all. But when that happens,
there is likely to be exponentially *more* people using bitcoin, too!
So an exponentially growing number of users running (smaller and
smaller) nodes would take up the slack.

Then, the blockchain would begin to look a lot more like a bittorrent,
right? ;-) but -- happily -- one that you never need to download fully.

-dave
Leandro Coutinho via bitcoin-dev
2017-04-21 15:58:43 UTC
Reply
Permalink
Raw Message
Maybe it already exists ...

#9484 <https://github.com/bitcoin/bitcoin/pull/9484> 812714f
<https://github.com/bitcoin/bitcoin/commit/812714f> Introduce assumevalid
setting to skip validation presumed valid scripts (gmaxwell)
https://github.com/bitcoin/bitcoin/pull/9484

..., but ...
It would be very interesting if a new node could decide to be a pruned node:
- it would need to trust one or more peers for the initial blockchain
download, because the blocks downloaded would not be validated
- it would decide a time from when to get the blocks, like a week before
- once a day a routine would run that would prune blocks older than the
chosen time

"

*The unspent transaction outputs (which is the only essential piece ofdata
necessary for validation) are already kept in a separate database,so
technically removing old blocks is perfectly possible.*" Pieter Wuille
https://bitcoin.stackexchange.com/questions/11170/why-is-pruning-not-considered-already-at-the-moment


On Fri, Apr 21, 2017 at 10:35 AM, David Kaufman via bitcoin-dev <
Post by David Kaufman via bitcoin-dev
Hi Danny,
Post by Danny Thorpe via bitcoin-dev
1TB HDD is now available for under $40 USD. How is the 100GB storage
requirement preventing anyone from setting up full nodes?
Yeah, but that's because most people (well, using myself as the
"target market" anyway) are upgrading to SSD's for the faster boot and
response times. Modern consumer OS's run incredibly slow on
non-ssd drives! And since the vast majority of consumer laptops sold
today fall into the $400 to $700 range, a 200 - 500gb SSD is about the
most storage upgrade people can afford.
And so I think David's premise, that having to devote only 30GB to
running a full node instead of 100, would remove a major obstacle that
prevents many more people running full bitcoin nodes.
My only suggestion is, does it scale? I mean, if the bitcoin network
volume grows exponentially and in 2 years the blockchain is 500GB, can
the "small node" be adjusted down from one fifth of the blockchain to
just one-tenth, or one twentieth? Can different smalInesses
interoperate? Can I choose to store a small node with 20 - 30% of the
blockchain, while others chose to share just 5% or 10% of it? Can I run
"less small" node today that's 50GB?
Can the default install be a "small node" that requires about 30GB of
storage (if that is indeed the sweet spot for enticing many more users to
bringing nodes online), but allow the user at install time, to choose *how*
small? To, say, drag a slider anywhere up and down the range from
10GB to 100GB?
If not, then it will have to be revisited constantly as the blockchain
grows, and disk storage prices drop. I suspect the blockchain will
grow in size, at some point in the not too distant future, much faster
than storage prices drop, so making small, smaller and smallest nodes
that can be configured to store more or less of it will be necessary
to motivate most users to run nodes at all. But when that happens,
there is likely to be exponentially *more* people using bitcoin, too!
So an exponentially growing number of users running (smaller and
smaller) nodes would take up the slack.
Then, the blockchain would begin to look a lot more like a bittorrent,
right? ;-) but -- happily -- one that you never need to download fully.
-dave
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Aymeric Vitte via bitcoin-dev
2017-04-17 10:14:25 UTC
Reply
Permalink
Raw Message
While I fully agree with the intent (increasing full nodes so a big
miner waking up in a bad mood can't threaten the world any longer every
day as it is now) I am not sure to get the interest of this proposal,
because:

- it's probably not a good idea to encourage the home users to run full
nodes, there are many people running servers far from their capacity
that could easily run efficient full nodes

- if someone can't allocate 100 GB today to run a full node, then we
can't expect him to allocate more in the future

- the download time is a real concern

- this proposal is a kind of reinventing torrents, while limiting the
number of connections to something not efficient at all, I don't see why
something that is proven to be super efficient (torrents) would be
needed to be reinvented, I am not saying that it should be used as the
bittorrent network is doing but the concepts can be reused

- I don't get at all the concept of "archival" nodes since it's another
useless step toward centralization

I think the only way to increase full nodes it to design an incentive
for people to run them
Post by David Vorick via bitcoin-dev
*Rationale:*
A node that stores the full blockchain (I will use the term archival
node) requires over 100GB of disk space, which I believe is one of the
most significant barriers to more people running full nodes. And I
believe the ecosystem would benefit substantially if more users were
running full nodes.
The best alternative today to storing the full blockchain is to run a
pruned node, which keeps only the UTXO set and throws away already
verified blocks. The operator of the pruned node is able to enjoy the
full security benefits of a full node, but is essentially leeching the
network, as they performed a large download likely without
contributing anything back.
This puts more pressure on the archival nodes, as the archival nodes
need to pick up the slack and help new nodes bootstrap to the network.
As the pressure on archival nodes grows, fewer people will be able to
actually run archival nodes, and the situation will degrade. The
situation would likely become problematic quickly if bitcoin-core were
to ship with the defaults set to a pruned node.
Even further, the people most likely to care about saving 100GB of
disk space are also the people least likely to care about some extra
bandwidth usage. For datacenter nodes, and for nodes doing lots of
bandwidth, the bandwidth is usually the biggest cost of running the
node. For home users however, as long as they stay under their
bandwidth cap, the bandwidth is actually free. Ideally, new nodes
would be able to bootstrap from nodes that do not have to pay for
their bandwidth, instead of needing to rely on a decreasing percentage
of heavy-duty archival nodes.
I have (perhaps incorrectly) identified disk space consumption as the
most significant factor in your average user choosing to run a pruned
node or a lite client instead of a full node. The average user is not
typically too worried about bandwidth, and is also not typically too
worried about initial blockchain download time. But the 100GB hit to
your disk space can be a huge psychological factor, especially if your
hard drive only has 500GB available in the first place, and 250+ GB is
already consumed by other files you have.
I believe that improving the disk usage situation would greatly
benefit decentralization, especially if it could be done without
putting pressure on archival nodes.
*Small Nodes Proposal:*
I propose an alternative to the pruned node that does not put undue
pressure on archival nodes, and would be acceptable and non-risky to
ship as a default in bitcoin-core. For lack of a better name, I'll
call this new type of node a 'small node'. The intention is that
bitcoin-core would eventually ship 'small nodes' by default, such that
the expected amount of disk consumption drops from today's 100+ GB to
less than 30 GB.
+ Full nodes only need to store ~20% of the blockchain
+ With very high probability, a new node will be able to recover the
entire blockchain by connecting to 6 random small node peers.
+ An attacker that can eliminate a chosen+ 95% of the full nodes
running today will be unable to prevent new nodes from downloading the
full blockchain, even if the attacker is also able to eliminate all
archival nodes. (assuming all nodes today were small nodes instead of
archival nodes)
A small node will pick an index [5, 256). This index is that node's
permanent index. When storing a block, instead of storing the full
block, the node will use Reed-Solomon coding to erasure code the block
using a 5-of-256 scheme. The result will be 256 pieces that are 20% of
the size of the block each. The node picks the piece that corresponds
to its index, and stores that instead. (Indexes 0-4 are reserved for
archival nodes - explained later)
The node is now storing a fragment of every block. Alone, this
fragment cannot be used to recover any piece of the blockchain.
However, when paired with any 5 unique fragments (fragments of the
same index will not be unique), the full block can be recovered.
Nodes can optionally store more than 1 fragment each. At 5 fragments,
the node becomes a full archival node, and the chosen indexes should
be 0-4. This is advantageous for the archival node as the encoded data
for the first 5 indexes will actually be identical to the block itself
- there is no computational overhead for selecting the first indexes.
There is also no need to choose random indexes, because the full block
can be recovered no matter which indexes are chosen.
When connecting to new peers, the indexes of each peer needs to be
known. Once peers totaling 5 unique indexes are discovered, blockchain
download can begin. Connecting to just 5 small node peers provides a
95% chance of getting 5 uniques, with exponentially improving odds of
success as you connect to more peers. Connecting to a single archive
node guarantees that any gaps can be filled.
A good encoder should be able to turn a block into a 5-of-256 piece
set in under 10 milliseconds using a single core on a standard
consumer desktop. This should not slow down initial blockchain
download substantially, though the overhead is more than a rounding error.
*DoS Prevention:*
A malicious node may provide garbage data instead of the actual piece.
Given just the garbage data and 4 other correct pieces, it is
impossible (best I know anyway) to tell which piece is the garbage piece.
One option in this case would be to seek out an archival node that
could verify the correctness of the pieces, and identify the malicious
node.
Another option would be to have the small nodes store a cryptographic
checksum of each piece. Obtaining the cryptographic checksum for all
256 pieces would incur a nontrivial amount of hashing (post segwit, as
much as 100MB of extra hashing per block), and would require an
additional ~4kb of storage per block. The hashing overhead here may be
prohibitive.
Another solution would be to find additional pieces and brute-force
combinations of 5 until a working combination was discovered. Though
this sounds nasty, it should take less than five seconds of
computation to find the working combination given 5 correct pieces and
2 incorrect pieces. This computation only needs to be performed once
to identify the malicious peers.
I also believe that alternative erasure coding schemes exist which
actually are able to identify the bad pieces given sufficient good
pieces, however I don't know if they have the same computational
performance as the best Reed-Solomon coding implementations.
*Deployment:*
Small nodes are completely useless unless the critical mass of 5
pieces can be obtained. The first version that supports small node
block downloads should default everyone to an archival node (meaning
indexes 0-4 are used)
Once there are enough small-node-enabled archive nodes, the default
can be switched so that nodes only have a single index by default. In
the first few days, when there are only a few small nodes, the
previously-deployed archival nodes can help fill in the gaps, and the
small nodes can be useful for blockchain download right away.
----------------------------------
This represents a non-trivial amount of code, but I believe that the
result would be a non-trivial increase in the percentage of users
running full nodes, and a healthier overall network.
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
--
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
David Vorick via bitcoin-dev
2017-04-19 17:30:30 UTC
Reply
Permalink
Raw Message
Post by Jonas Schnelli via bitcoin-dev
Hi Dave
*1. I agree that we need to have a way for pruned nodes to partially serve
historical blocks.*
My personal measurements told me that around ~80% of historical block
serving are between tip and -1’000 blocks.
Currently, Core nodes have only two modes of operations, „server all
historical blocks“ or „none“.
This makes little sense especially if you prune to a target size of, lets
say, 80GB (~80% of the chain).
Ideally, there would be a mode where your full node can signal a third
mode „I keep the last 1000 blocks“ (or make this more dynamic).
That probably makes sense with small nodes too. The past 1000 blocks are
such a small footprint compared to the rest of the chain.
Post by Jonas Schnelli via bitcoin-dev
*2. Bootstrapping new peers*
I’m not sure if full nodes must be the single point of historical data
storage. Full nodes provide a valuable service (verification, relay,
filtering, etc.). I’m not sure if serving historical blocks is one of them.
Historical blocks could be made available on CDN’s or other file storage
networks. You are going to verify them anyways,... the serving part is pure
data storage.
I’m also pretty sure that some users have stopping running full nodes
because their upstream bandwidth consumption (because of serving historical
blocks) was getting intolerable.
Especially „consumer“ peers must have been hit by this (little experience
in how to reduce traffic, upstream in general is bad for
consumers-connections, little resources in general).
Perhaps it is not, but I would think that it would be pretty
straightforward to configure a global bandwidth limit within Bitcoin. I
know many torrent clients, and clients for protocols like Tor and i2p
include the ability to set both speed limits and monthly bandwidth limits.
Shipping core with sane default limits is probably sufficient to solve
bandwidth issues for most users. I don't know if default limits may result
in today's archive nodes pulling less weight though - changing the defaults
to have limits may slow the network down as a whole.

In my experience (living in a city where most people have uncapped
connections), disk usage is usually the bigger issue, but certainly
bandwidth is a known problem (especially for rural users) as well.
Post by Jonas Schnelli via bitcoin-dev
Having a second option built into full nodes (or as an external bootstrap
service/app) how to download historical blocks during bootstrapping could
probably be a relieve for "small nodes“.
It could be a little daemon that downloads historical blocks from CDN’s,
etc. and feeds them into your full node over p2p/8333 and kickstarts your
bootstrapping without bothering valuable peers.
Or, the alternative download, could be built into the full nodes main
logic.
And, if it wasn’t obvious, this must not bypass the verification!
I worry about any type of CDN being a central point of failure. CDNs cost
money, which means someone is footing the bill. Torrenting typically relies
on a DHT, which is much easier to attack than Bitcoin's peer network. It's
possible that a decentralized CDN could be used, but I don't think any yet
exist (though I am building one for unrelated reasons) which are both
sufficiently secure and incentive-compatible to be considered as an
alternative to using full nodes to bootstrap.

I just don't want to end up in a situation where 90% of users are getting
their blockchain from the same 3 websites or centralized services. And I
also don't want to rely on any p2p solution which would not stand up to a
serious adversary. Right now, I think the bitcoin p2p network is by
significant margin the best we've got. The landscape for decentralized data
distribution is evolving rapidly though, perhaps in a few years there will
be a superior alternative.
Post by Jonas Schnelli via bitcoin-dev
*To your proposal:*
- Isn’t there a tiny finger-printing element if peers have to pick an
segmentation index?
- SPV bloom filter clients can’t use fragmented blocks to filter txns?
Right? How could they avoid connecting to those peers?
</jonas>
Yes, there is finger-print that happens if you have nodes pick an index.
And the fingerprint gets a lot worse if you have a node pick multiple
indexes. Though, isn't it already required that nodes have some sort of IP
address or hidden service domain? I want to say that the fingerprint
created by picking an index is not a big deal, because it can be separated
from activity like transaction relaying and mining. Though, I am not
certain and perhaps it is a problem.

To be honest, I hadn't really considered SPV nodes at the time of writing.
Small nodes would still be seeing all of the new blocks, and per your
suggestion may also be storing the 1000 or so most recent blocks, but SPV
nodes would still need some way to find all of their historical
transactions. The problem is not fetching blocks, it's figuring out which
blocks are worth fetching. It may be sufficient to have nodes store a bloom
filter for each block indicating which addresses had activity. The bloom
filter would probably only need to be about 1% of the size of the full
block. That's not too much overhead (now you are storing 21% of the block
instead of just 20%), and would retain SPV compatibility.

On Mon, Apr 17, 2017 at 12:17 PM, praxeology_guy <
Post by Jonas Schnelli via bitcoin-dev
FYI With unspent coin snapshots, needing archive data becomes less
important. People can synchronize from a later snapshot instead of the
genesis block. See https://petertodd.org/2016/delayed-txo-commitments
for my current favorite idea.
This is something that I think could help a lot too. If the build processes
included verifying the chain and then creating a utxo snapshot at say,
block 400,000, then nodes would no longer need to download, store, upload,
or share blocks prior to that height. It means that a reorg deeper than
that point would hardfork the network. But a reorg 60k blocks deep is going
to cause problems worse than a network split. Then, the only people who
would ever need to care about the early blocks are developers, and it's
more reasonable to expect developers to go through a longer process and
have more resources than everyday users.

On Mon, Apr 17, 2017 at 6:14 AM, Aymeric Vitte via bitcoin-dev <
Post by Jonas Schnelli via bitcoin-dev
While I fully agree with the intent (increasing full nodes so a big miner
waking up in a bad mood can't threaten the world any longer every day as it
- it's probably not a good idea to encourage the home users to run full
nodes, there are many people running servers far from their capacity that
could easily run efficient full nodes
Running a full node is the only way to avoid needing to trust others. It's
also how you make your opinion worthwhile for events like hard forks and
UASFs. If decentralization is the primary motivation, it absolutely makes
sense to encourage people to run their own full nodes. Without a full node,
you are at the mercy of the governance decisions by those who do have full
nodes. But if you have a full node, you can chose to opt-out of any upgrade
(example: ethereum classic nodes).
Post by Jonas Schnelli via bitcoin-dev
- if someone can't allocate 100 GB today to run a full node, then we can't
expect him to allocate more in the future
That's why I'm proposing something to decrease the storage requirements.
Post by Jonas Schnelli via bitcoin-dev
- this proposal is a kind of reinventing torrents, while limiting the
number of connections to something not efficient at all, I don't see why
something that is proven to be super efficient (torrents) would be needed
to be reinvented, I am not saying that it should be used as the bittorrent
network is doing but the concepts can be reused
It's different from torrents in that it uses specialized erasure coding to
make sure that every block is always available, even if an adversary is
running around targeting all the nodes with a particular piece.
Post by Jonas Schnelli via bitcoin-dev
- I don't get at all the concept of "archival" nodes since it's another
useless step toward centralization
"archival" nodes are simply nodes with the full blockchain. Nobody can
bootstrap on the network without them. Today, every bitcoin-core node is an
archival node by default.
Post by Jonas Schnelli via bitcoin-dev
I think the only way to increase full nodes it to design an incentive for
people to run them
The primary incentive is the sovereignty that it gives you. Running a
Bitcoin full node gives you security and protection against political
garbage that you can't get any other way. The network does currently depend
on altruism to allow people to download the blockchain, but as long as we
can keep the resource requirements of this altruism low, I think we can
expect it to continue. This proposal attempts to keep those requirements
low.
Post by Jonas Schnelli via bitcoin-dev
Post by David Vorick via bitcoin-dev
The best alternative today to storing the full blockchain is to run a
pruned node
The idea looks a little overly complex to me.
I suggested something similar which is a much simpler version;
https://zander.github.io/scaling/Pruning/
Post by David Vorick via bitcoin-dev
# Random pruning mode
There is a large gap between the two current modes of everything
(currently 75GB) and only what we need (2GB or so).
This mode would have two areas, it would keep a days worth of blocks to
make sure that any reorgs etc would not cause a re-download, but it would
have additionally have an area that can be used to store historical data
to be shared on the network. Maybe 20 or 50GB.
One main feature of Bitcoin is that we have massive replication. Each
node
Post by David Vorick via bitcoin-dev
currently holds all the same data that every other node holds. But this
doesn't have to be the case with pruned nodes. A node itself has no need
for historic data at all.
The suggestion is that a node stores a random set of blocks. Dropping
random blocks as the node runs out of disk-space. Additionally, we would
introduce a new way to download blocks from other nodes which allows the
node to say it doesn't actually have the block requested.
The effect of this setup is that many different nodes together end up
having the total amount of blocks, even though each node only has a
fraction of the total amount.
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
Your proposal has a significant disadvantage: If every peer is dropping 75%
of all blocks randomly, then you need to connect to a large number of peers
to download the whole blockchain. Any given peer has a 25% chance of
holding a block, or rather, a 75% chance of not holding a block. If you
have n peers, your probability of not being able to download a given block
is 0.75^n. If you are downloading 450,000 blocks, you will need to connect
to an expected 46 peers to download the whole blockchain.

Your proposal is also a lot less able to handle active adversaries: if
nodes are randomly dropping blocks, the probability that one block in
particular is dropped by everyone goes up significantly. And the problem
gets a lot worse in the presence of an active adversary. If there are 8000
nodes each dropping 75% of the blocks, then each block on average will only
be held by 2000 nodes. Probabilistically, some unlucky blocks will be held
by fewer than 2000 nodes. An active adversary needs only to eliminate about
2000 nodes (a chosen 2000 nodes) to knock a block off of the network. But
missing even a single block is a significant problem.

Your proposal essentially requires that archive nodes still exist and be a
part of a typical blockchain download. Given that, I don't think it would
be a sound idea to ship as a default in bitcoin core.



On Tue, Apr 18, 2017 at 9:07 AM, Tier Nolan via bitcoin-dev <
Post by Jonas Schnelli via bitcoin-dev
This has been discussed before.
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/
2015-May/008101.html
including a list of nice to have features by Maxwell
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/
2015-May/008110.html
You meet most of these rules, though you do have to download blocks from
multiple peers.
The suggestion in that thread were for a way to compactly indicate which
blocks a node has. Each node would then store a sub-set of all the
blocks. You just download the blocks you want from the node that has them.
Each node would be recommended to store the last few days worth anyway.
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
I believe that my proposal does meet all of the requirements listed by
Maxwell. Having a set of 8 random peers gives you a very high probability
of being able to recover every single block. You would need to connect to
at least 5 peers (and this is already >90% likely to be sufficient to
recover every block), but if you cannot connect to 5 random peers your node
is probably in trouble anyway. Highly parallel, high speed downloads are
just as possible with small nodes as with archive nodes. It only takes a
few bytes to indicate which part of the blockchain you have, and any 2
peers have a less than 1% chance of overlapping.
Aymeric Vitte via bitcoin-dev
2017-04-20 11:27:32 UTC
Reply
Permalink
Raw Message
Thanks but you did not answer all the points and some of your statements
look wrong, like the global ideas behind this proposal from my
standpoint, which basically is inventing strange things not reusing what
is already proven to be working well and could provide the same result,
which at the end is not the expected one, ie increasing full nodes, it
sounds like a strange workaround to prevent the centralization of the
blockchain when pruning will become the default

To answer some other comments in this thread, giving an incentive to run
full nodes does not mean that someone setting up tomorrow 10K nodes will
become rich and/or will be able to control the network, the later being
not unlikely at all to happen in the current situation, the idea is more
to motivate people that already have the resources to run full nodes,
then we mix the concepts of optimizing the resources at no additional
costs (and even decreasing costs since you get rewarded for the part
that you have already paid but don't use) with the one of running nodes
to protect its business

For example
https://gist.github.com/Ayms/aab6f8e08fef0792ab3448f542a826bf#proposal
is showing some concepts where nodes can't position themselves where
they like and are registered in the system by the others (but forget the
proof of something as written in this gist since I think the rewards
should not depend on usual miners) , so it becomes quite difficult that
they position themselves where they like to possibly get the rewards,
fake the system, freeride, cheat, collude in pools or setup plenty of nodes

Comments below
I know many torrent clients, and clients for protocols like Tor
and i2p include the ability to set both speed limits and monthly
bandwidth limits.
Yes, that's the easy part, the issue is more for the network to check
that users have sufficient bandwidth and don't cheat
I worry about any type of CDN being a central point of failure.
Of course
Torrenting typically relies on a DHT, which is much easier to attack
than Bitcoin's peer network.
Then please explain how you would attack the bittorrent DHT and why it's
"much easier" than attacking the btc network today, bittorrent is not
designed for security/privacy, including its DHT, which btw is great,
it's a common sign of misinformation to conclude that all DHTs are
necessarily insecure
I think the bitcoin p2p network is by significant margin the best
we've got.
The btc network can't be considered as a p2p network in its current form
then can't be the best one for now (and if it was then we would not be
in today's situation)
Yes, there is finger-print that happens if you have nodes pick an
index. And the fingerprint gets a lot worse if you have a node pick
multiple indexes.
This is another problem of your proposal, as well as
fingerprinting/tracking peers based on what they have
Though, isn't it already required that nodes have some sort of IP
address or hidden service domain? I want to say that the fingerprint
created by picking an index is not a big deal, because it can be
separated from activity like transaction relaying and mining. Though,
I am not certain and perhaps it is a problem.
Are you suggesting that the btc "p2p" network should be using the Tor
network, and especially the nodes hosting the/(a part of) the
blockchain? This is of course a very bad idea and you would not
eliminate the tracking issue, a simple example is that despite ot the
size of the network it's not difficult to track the peers on the
bittorrent network, you might not know who is the peer but you can
follow whatever he is doing, and hidding behind Tor or a VPN does not
prevent this
On Mon, Apr 17, 2017 at 6:14 AM, Aymeric Vitte via bitcoin-dev
While I fully agree with the intent (increasing full nodes so a
big miner waking up in a bad mood can't threaten the world any
longer every day as it is now) I am not sure to get the interest
- it's probably not a good idea to encourage the home users to run
full nodes, there are many people running servers far from their
capacity that could easily run efficient full nodes
Running a full node is the only way to avoid needing to trust others.
It's also how you make your opinion worthwhile for events like hard
forks and UASFs. If decentralization is the primary motivation, it
absolutely makes sense to encourage people to run their own full
nodes. Without a full node, you are at the mercy of the governance
decisions by those who do have full nodes. But if you have a full
node, you can chose to opt-out of any upgrade (example: ethereum
classic nodes).
If you really know the Tor network, then you know why encouraging home
users to run full nodes is probably not a good idea

"Probably" because the situation is not the same for btc and indeed UASF
for example is referring to "users" who today are not really "users" but
intermediate nodes, so the decision finally is not made by the users
- if someone can't allocate 100 GB today to run a full node, then
we can't expect him to allocate more in the future
That's why I'm proposing something to decrease the storage requirements.
This is just delaying the problem, you are just proposing to store some
parts of the blockchain not explaining how the peers will first setup
the nodes, some parts that will of course increase... then falling back
in the issue that you are trying to address
- this proposal is a kind of reinventing torrents, while limiting
the number of connections to something not efficient at all, I
don't see why something that is proven to be super efficient
(torrents) would be needed to be reinvented, I am not saying that
it should be used as the bittorrent network is doing but the
concepts can be reused
It's different from torrents in that it uses specialized erasure
coding to make sure that every block is always available, even if an
adversary is running around targeting all the nodes with a particular
piece.
You are reinventing something that would be achieved easily using the
concepts of torrents or incremental ones (ie someone would seed the
whole thing, some others some parts of it, etc)
- I don't get at all the concept of "archival" nodes since it's
another useless step toward centralization
"archival" nodes are simply nodes with the full blockchain. Nobody can
bootstrap on the network without them. Today, every bitcoin-core node
is an archival node by default.
What I meant is that you can't build a hierarchy of btc nodes: big
nodes, medium nodes, small nodes, each node is free to decide how/if it
participates to the network, so the wording of "archival" nodes for me
is not adapted since it makes immediately think to centralized entities,
big organizations hosting the blockchain, etc
I think the only way to increase full nodes it to design an
incentive for people to run them
The primary incentive is the sovereignty that it gives you. Running a
Bitcoin full node gives you security and protection against political
garbage that you can't get any other way. The network does currently
depend on altruism to allow people to download the blockchain, but as
long as we can keep the resource requirements of this altruism low, I
think we can expect it to continue. This proposal attempts to keep
those requirements low.
This is the usual answer but I don't believe it, people will rely on
others to run full nodes and secure them, and so on...
I believe that my proposal does meet all of the requirements listed by
Maxwell.
I did noy read them again, some others are listed in the link above
Having a set of 8 random peers gives you a very high probability of
being able to recover every single block. You would need to connect to
at least 5 peers (and this is already >90% likely to be sufficient to
recover every block), but if you cannot connect to 5 random peers your
node is probably in trouble anyway. Highly parallel, high speed
downloads are just as possible with small nodes as with archive nodes.
It only takes a few bytes to indicate which part of the blockchain you
have, and any 2 peers have a less than 1% chance of overlapping.
Again, you don't explain how you bootstrap the full nodes first, which
is the main issue, and if the idea is that the pruning nodes will never
desync then you should try downloading x GB to resync connecting to 5/8
peers possibly operating from home in different countries
--
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
Tom Zander via bitcoin-dev
2017-04-20 09:46:33 UTC
Reply
Permalink
Raw Message
On Wednesday, 19 April 2017 19:30:30 CEST David Vorick via bitcoin-dev
Post by David Vorick via bitcoin-dev
Post by Tom Zander via bitcoin-dev
I suggested something similar which is a much simpler version;
https://zander.github.io/scaling/Pruning/
Your proposal has a significant disadvantage: If every peer is dropping
75% of all blocks randomly, then you need to connect to a large number of
peers to download the whole blockchain.
...
Post by David Vorick via bitcoin-dev
If you are downloading 450,000 blocks, you will need to
connect to an expected 46 peers to download the whole blockchain.
I don’t really see the problem here, even if your math is a off. (Statistics
is difficult, I know). Connecting to many nodes to download faster is really
not an issue and already happens.
Post by David Vorick via bitcoin-dev
Your proposal is also a lot less able to handle active adversaries: if
nodes are randomly dropping blocks, the probability that one block in
particular is dropped by everyone goes up significantly.
You make the assumption that this new mode of pruning will be used by 100%
of the network, this is not how distributed systems work.
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
Andrew Poelstra via bitcoin-dev
2017-04-20 20:32:12 UTC
Reply
Permalink
Raw Message
Post by Tom Zander via bitcoin-dev
On Wednesday, 19 April 2017 19:30:30 CEST David Vorick via bitcoin-dev
Post by David Vorick via bitcoin-dev
Post by Tom Zander via bitcoin-dev
I suggested something similar which is a much simpler version;
https://zander.github.io/scaling/Pruning/
Your proposal has a significant disadvantage: If every peer is dropping
75% of all blocks randomly, then you need to connect to a large number of
peers to download the whole blockchain.
...
Post by David Vorick via bitcoin-dev
If you are downloading 450,000 blocks, you will need to
connect to an expected 46 peers to download the whole blockchain.
I don’t really see the problem here, even if your math is a off. (Statistics
is difficult, I know). Connecting to many nodes to download faster is really
not an issue and already happens.
I think the expected number of peers is actually ~47.75, which is pretty
close to David's estimate, which was wrong in a way that was actually
more favorable to the "everyone stores random blocks" scheme than the
truth.

Even assuming no archival nodes, and all nodes storing only one random
index between 5 and 255 inclusive, the chance of five arbitrary nodes
giving unique indices by chance is about 98.4%. To get the same probability
from a scheme where each peer has only 25% of the blocks, you need to
connect to 59.59 nodes.

This is over a ten-times increase in the number of nodes required to
download the entire chain, and requires participating nodes to use 25%
more space than David's proposal.
Post by Tom Zander via bitcoin-dev
Post by David Vorick via bitcoin-dev
Your proposal is also a lot less able to handle active adversaries: if
nodes are randomly dropping blocks, the probability that one block in
particular is dropped by everyone goes up significantly.
You make the assumption that this new mode of pruning will be used by 100%
of the network, this is not how distributed systems work.
Storing random but complete blocks requires the assumption this is _not_ the
case; David's does not make any assumptions. So on top of the performance
considerations there is this potential DoS vector.
--
Andrew Poelstra
Mathematics Department, Blockstream
Email: apoelstra at wpsoftware.net
Web: https://www.wpsoftware.net/andrew

"A goose alone, I suppose, can know the loneliness of geese
who can never find their peace,
whether north or south or west or east"
--Joanna Newsom
Tom Zander via bitcoin-dev
2017-04-21 08:27:36 UTC
Reply
Permalink
Raw Message
Post by Andrew Poelstra via bitcoin-dev
Post by Tom Zander via bitcoin-dev
Post by David Vorick via bitcoin-dev
If you are downloading 450,000 blocks, you will need to
connect to an expected 46 peers to download the whole blockchain.
I don’t really see the problem here, even if your math is a off.
(Statistics is difficult, I know). Connecting to many nodes to download
faster is really not an issue and already happens.
I think the expected number of peers is actually ~47.75
Nice to join bitcoin-dev, Andrew. Haven’t seen you post here before.

I’m not sure how you reached that strange number, but I have to point out
your number is quite useless.

The actual amount of nodes you need to be 100% sure you find all the blocks
when you know each node will have a completely random 25% of the blocks is
not a maths problem that leads to a single answer because of the randomness
involved.
The actual answer is a series of probabilities.

Same as the answer is to the age old question; how many coin flips does it
take to be 100% certain I have at least one “Heads”.

In our blocks retrieval scenario; with num-nodes < 4, probability is zero.
There is a really really small chance you will get 100% of the blocks with 4
nodes (actual number depends on the amount of total blocks you are looking
for).
And this goes up as you add more nodes, but never reaches 100%

At the other end of this question you can ask what the chance is of at least
one block being lost when there are N nodes, a block nobody has. That chance
is small with current > 6000 nodes, but not zero (a second reason why the
previous parag never reaches 100%).

Bottom line, it is silly to assume 100% of the nodes would be partial-
pruning, and if you continue on that path you will only have probabilities
to predict how many nodes it takes to have 100% coverage, exact numbers are
worse than useless, they are misleading.

As I said in my initial email, statistics is hard. Crypto is much easier in
that it is absolute. Either correct or false. Never in between.

To repeat, the goal of this pruning method is not to replace a full
“archival” node, the goal of this pruning node is to provide an improvement
over the current pruning node which stops any and all serving of historical
blocks.
Anyone that feels the need to talk about pruning modes like 100% of the full
nodes will run it are in actual fact not talking about the real world.
Distributed systems will never (and should never) end up being a mono-
culture. Diversity is the essential thing you aim for.

I would suggest we focus on the real world and not on irreleavant math
experiments that only lead to confusion.
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
Jonas Schnelli via bitcoin-dev
2017-04-18 07:43:52 UTC
Reply
Permalink
Raw Message
Hi Dave
A node that stores the full blockchain (I will use the term archival node) requires over 100GB of disk space, which I believe is one of the most significant barriers to more people running full nodes. And I believe the ecosystem would benefit substantially if more users were running full nodes.
Thanks for your proposal.

I agree that 100GB of data may be cumbersome for some systems, especially if you target end user systems (Laptops/Desktops). Though, in my opinion, for those systems, CPU consumption is the biggest UX blocker.
Bootstrapping a full node on a decent consumer system with default parameters takes days, and, during this period, you probably run at full CPU capacity and you will be disturbed by constant fan noise. Standard tasks may be impossible because your system will be slowed down to a point where even word processing may get difficult.
This is because Core (with its default settings) is made to sync as fast as possible.

Once you have verified the chain and you reach the chain tip, indeed, it will be much better (until you shutdown for a couple of days/hours and have to re-sync/catch-up).

1. I agree that we need to have a way for pruned nodes to partially serve historical blocks.
My personal measurements told me that around ~80% of historical block serving are between tip and -1’000 blocks.
Currently, Core nodes have only two modes of operations, „server all historical blocks“ or „none“.
This makes little sense especially if you prune to a target size of, lets say, 80GB (~80% of the chain).
Ideally, there would be a mode where your full node can signal a third mode „I keep the last 1000 blocks“ (or make this more dynamic).

2. Bootstrapping new peers
I’m not sure if full nodes must be the single point of historical data storage. Full nodes provide a valuable service (verification, relay, filtering, etc.). I’m not sure if serving historical blocks is one of them. Historical blocks could be made available on CDN’s or other file storage networks. You are going to verify them anyways,... the serving part is pure data storage.
I’m also pretty sure that some users have stopping running full nodes because their upstream bandwidth consumption (because of serving historical blocks) was getting intolerable.
Especially „consumer“ peers must have been hit by this (little experience in how to reduce traffic, upstream in general is bad for consumers-connections, little resources in general).

Having a second option built into full nodes (or as an external bootstrap service/app) how to download historical blocks during bootstrapping could probably be a relieve for "small nodes“.
It could be a little daemon that downloads historical blocks from CDN’s, etc. and feeds them into your full node over p2p/8333 and kickstarts your bootstrapping without bothering valuable peers.
Or, the alternative download, could be built into the full nodes main logic.
And, if it wasn’t obvious, this must not bypass the verification!

I’m also aware of the downsides of this. This can eventually reduce decentralisation of the storage of historical bitcoin blockchain data and – eventually – increase the upstream bandwidth of peers willing to serve historical blocks (especially in a transition phase to a second „download“-option).
Maybe it’s a tradeoff between reducing decentralisation by killing low resource nodes because serving historical blocks is getting too resource-intense _or_ reducing decentralisation by moving some percentage of the historical data storage away from the bitcoin p2p network.
The later seems more promising to me.


To your proposal:
- Isn’t there a tiny finger-printing element if peers have to pick an segmentation index?
- SPV bloom filter clients can’t use fragmented blocks to filter txns? Right? How could they avoid connecting to those peers?

</jonas>
Tom Zander via bitcoin-dev
2017-04-18 10:50:31 UTC
Reply
Permalink
Raw Message
Post by David Vorick via bitcoin-dev
The best alternative today to storing the full blockchain is to run a
pruned node
The idea looks a little overly complex to me.

I suggested something similar which is a much simpler version;
https://zander.github.io/scaling/Pruning/
Post by David Vorick via bitcoin-dev
# Random pruning mode
There is a large gap between the two current modes of everything
(currently 75GB) and only what we need (2GB or so).
This mode would have two areas, it would keep a days worth of blocks to
make sure that any reorgs etc would not cause a re-download, but it would
have additionally have an area that can be used to store historical data
to be shared on the network. Maybe 20 or 50GB.
One main feature of Bitcoin is that we have massive replication. Each node
currently holds all the same data that every other node holds. But this
doesn't have to be the case with pruned nodes. A node itself has no need
for historic data at all.
The suggestion is that a node stores a random set of blocks. Dropping
random blocks as the node runs out of disk-space. Additionally, we would
introduce a new way to download blocks from other nodes which allows the
node to say it doesn't actually have the block requested.
The effect of this setup is that many different nodes together end up
having the total amount of blocks, even though each node only has a
fraction of the total amount.
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
Tier Nolan via bitcoin-dev
2017-04-18 13:07:09 UTC
Reply
Permalink
Raw Message
This has been discussed before.

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-May/008101.html

including a list of nice to have features by Maxwell

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-May/008110.html

You meet most of these rules, though you do have to download blocks from
multiple peers.

The suggestion in that thread were for a way to compactly indicate which
blocks a node has. Each node would then store a sub-set of all the
blocks. You just download the blocks you want from the node that has them.

Each node would be recommended to store the last few days worth anyway.
Aymeric Vitte via bitcoin-dev
2017-04-18 23:19:04 UTC
Reply
Permalink
Raw Message
From the initial post " The situation would likely become problematic
quickly if bitcoin-core were to ship with the defaults set to a pruned
node."

Sorry to be straight, I read the (painful) thread below, and most of
what is in there is inept, wrong, obsolete... or biased, cf the first
sentence above, if the idea is to invent a workaround to the fact that
pruning might/will become the default or might/will be set by the users
as the default so full nodes might/will disappear then just say it
clearly instead of proposing this kind of non-solution as a solution to
secure the blockchain

I can't believe this is serious, people now are supposed to prune but
will be forced to host a part of the blockchain, how do you expect this
to work, why people would do this? Knowing that to start pruning they
need a full node, then since we are there, why not continuing with a
full node... but indeed, why should they continue with a full node, and
therefore why should they accept to host a part of the blockchain if
they decline the first proposal?

This is absurd, you are not addressing the first priority given the
context which is to quickly increase the full nodes and which obviously
includes an incentive for people to run them

It gives also the feeling that bitcoin wants to reinvent everything not
capitalizing on/knowing what exists, sorry again but the concepts of the
proposal and others like archival nodes are just funny
Post by Tier Nolan via bitcoin-dev
This has been discussed before.
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-May/008101.html
including a list of nice to have features by Maxwell
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-May/008110.html
You meet most of these rules, though you do have to download blocks
from multiple peers.
The suggestion in that thread were for a way to compactly indicate
which blocks a node has. Each node would then store a sub-set of all
the blocks. You just download the blocks you want from the node that
has them.
Each node would be recommended to store the last few days worth anyway.
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
--
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
udevNull via bitcoin-dev
2017-04-19 04:28:48 UTC
Reply
Permalink
Raw Message
I'd like to add to this. There is definitely a barrier of entry with regards to setting up a full node. Unless you're living in a first world country, the bandwidth requirements alone, will outright prevent you from even setting up a full node (sync since genesis).

To maintain that also becomes a sunk cost, as there is no financial incentive to run a node, only an idealogical one. Most of the people who benefit and will benefit from Bitcoin, are the un-banked. Which you will find in 3rd world countries, that don't have ISPs that provide the data packages, to cater for the requirements of running a full node. I'm sure many would like to, but simply cannot afford it.

A user may not want to run a node at home, but rather on a digital ocean or AWS server, which they cannot afford to do either considering the bandwidth and storage costs associated with it. However, I don't think they should be excluded from participating in the network (supporting proposals, voicing their opinions, running their own wallets, writing their own applications on top of Bitcoin [which I think is extremely important]).

So I would definitely be in favour of a small node of sorts. It will present us with some interesting technical challenges along the way but it's definitely worth while looking into.

Financially incentivising nodes is a really weird area because it would allow someone to essentially automate the deployment of nodes. i.e. if a node can pay for itself 100% (even at a lesser value, it just becomes cheaper overall), you could write an application that uses an AWS API or a digital ocean API to automatically deploy 100's of nodes. Which sounds great but not if that person is malicious and wants to prevent the community adopting proposals.

Just my 2 cents worth.

Sent with [ProtonMail](https://protonmail.com) Secure Email.
Angel Leon via bitcoin-dev
2017-04-19 13:47:40 UTC
Reply
Permalink
Raw Message
Post by udevNull via bitcoin-dev
Financially incentivising nodes is a really weird area because it would
allow someone to essentially automate the deployment of nodes. i.e. if a
node can pay for itself 100% (even at a lesser value, it just becomes
cheaper overall), you could write an application that uses an AWS API or a
digital ocean API to automatically deploy 100's of nodes. Which sounds
great but not if that person is malicious and wants to prevent the
community adopting proposals.

what other projects have done to avoid such attacks (while incentivizing
economically running full nodes) is to only distribute part of the block
rewards back such nodes if that node has committed/frozen a predetermined
amount of coins that can't be spent. This also leaves less liquidity for
market speculation and a incentives for long term commitments.

On Wed, Apr 19, 2017 at 5:14 AM udevNull via bitcoin-dev <
Post by udevNull via bitcoin-dev
I'd like to add to this. There is definitely a barrier of entry with
regards to setting up a full node. Unless you're living in a first world
country, the bandwidth requirements alone, will outright prevent you from
even setting up a full node (sync since genesis).
To maintain that also becomes a sunk cost, as there is no financial
incentive to run a node, only an idealogical one. Most of the people who
benefit and will benefit from Bitcoin, are the un-banked. Which you will
find in 3rd world countries, that don't have ISPs that provide the data
packages, to cater for the requirements of running a full node. I'm sure
many would like to, but simply cannot afford it.
A user may not want to run a node at home, but rather on a digital ocean
or AWS server, which they cannot afford to do either considering the
bandwidth and storage costs associated with it. However, I don't think they
should be excluded from participating in the network (supporting proposals,
voicing their opinions, running their own wallets, writing their own
applications on top of Bitcoin [which I think is extremely important]).
So I would definitely be in favour of a small node of sorts. It will
present us with some interesting technical challenges along the way but
it's definitely worth while looking into.
Financially incentivising nodes is a really weird area because it would
allow someone to essentially automate the deployment of nodes. i.e. if a
node can pay for itself 100% (even at a lesser value, it just becomes
cheaper overall), you could write an application that uses an AWS API or a
digital ocean API to automatically deploy 100's of nodes. Which sounds
great but not if that person is malicious and wants to prevent the
community adopting proposals.
Just my 2 cents worth.
Sent with ProtonMail <https://protonmail.com> Secure Email.
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Gregory Maxwell via bitcoin-dev
2017-04-21 20:38:36 UTC
Reply
Permalink
Raw Message
On Mon, Apr 17, 2017 at 6:54 AM, David Vorick via bitcoin-dev
Post by David Vorick via bitcoin-dev
A node that stores the full blockchain (I will use the term archival node)
requires over 100GB of disk space, which I believe is one of the most
significant barriers to more people running full nodes. And I believe the
ecosystem would benefit substantially if more users were running full nodes.
Hi David,

I've been thinking about this subject for a long time (Tier Nolan
linked some of the threads; see also the coloration part of
https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding) and
about using FEC with the history for the last year or so. (we use FEC
in practice in fibre for relay at the tip now, and that has a design
history going back to 2013).

As Jonas points out, we're all set to also offer the 'recent blocks'
separately, so that is obviously going to happen and will help. The
free design parameter is the definition of recent, but we have good
measurements for setting that now. But about history...

I think I might have designed myself into a corner and perhaps you've
shown a way out-- I think there are some serious limits in your
proposal but perhaps we can fix them. Let me give you what I had been
thinking about, then hand you at least a couple improvements to your
thinking:

As you hopefully now know (per Tier Nolan's) post: I'd previously been
thinking about nodes keeping a deterministic random, independently
selected subset which is self-leveling based on a small seed. The
connection overhead can can be mitigated by working with chunks of
blocks rather than single blocks.

But as you've observed, the failure probabilities are rather high,
especially if an active attacker targets nodes carrying less commonly
available blocks. So I thought, okay we can use FEC to help improve
the recovery odds.

So I considered using a sparse random code (E.g. an LDPC erasure code
like in RFC 5170) the advantage of these codes is that they are very
fast to decode, and they support having enormous codewords, so you can
a high probability of every peer having a unique ID, so there would
effectively never need to be any duplication.

But then I ran into the problem that I had no reasonable way to
recover from bad data, short of pulling a single group from an archive
then banning whatever peers gave you bad chunks.

So that is kind of where I got stuck.

I didn't even consider the advantage of only being able to use a few
peers total, as I still assumed that it would be doing the random
groups thing as well. That's a great point.

So on your design:

Being able to have a lower bound of 20% seems like a serious
limitation to me: it would be fine today, but the chain will quite
possibly be twice the current size in a year.... and in four years
we're back to where we are now. What I'd been thinking would be able
to scale arbitrarily low.

20% is small, but is it small enough? -- today that would get us back
into common SSDs being able to hold the whole chain, but that property
will probably be lost again in a couple years. I think with any fixed
fraction we'll constantly be fighting against the opportunity cost of
upgrading storage, if not the cost of the storage itself. (and I agree
this is the vast majority of the response from actual users, sync
time and ongoing bandwidth usage seeming to tie for second; the latter
of which can be mitigated in other ways but for the former see my
remarks at the end)

The fact that having only a few required blocks needed lets you brute
force out the decode is a great point. I hadn't considered that, and
the schemed I'd been considering are not communications efficient with
only a few blocks required e.g. they sometimes require a couple extra
blocks to successfully decode, which is a lot of overhead if you're
only splitting 5 ways.
Post by David Vorick via bitcoin-dev
I also believe that alternative erasure coding schemes exist which actually
are able to identify the bad pieces given sufficient good pieces, however I
don't know if they have the same computational performance as the best
Reed-Solomon coding implementations.
Actually RS codes are _the_ codes you want to use for with decoding
with errors but the 'computationally efficient' error correcting
decoding (which is itself heinously slow) requires 2*errors + data
number of blocks to decode. Not so useful if you're only looking at 5
parts.

RS decoding is pretty slow generally, esp compared to binary sparse
codes. We were unable to make RS coding make sense for use in fast
block relay for this reason-- decoding time bottlenecked
reconstruction. The most highly optimized RS codes make a special
optimization which is not useful for your proposal: They're much
faster to decode if you're decoding from the first couple correction
blocks. Without these optimizations speeds from from 1GB/s to more
like 100MB/s or worse. Though I suppose with assumevalid initial sync
now is pretty cpu-idle on most hardware. (If 100MB/s sounds fast,
keep in mind that time spent decoding is time that can't be spent
hashing/verifying/etc.. and on a lot hardware with fast broadband with
signature validation enabled we're cpu bound already)
Post by David Vorick via bitcoin-dev
Another option would be to have the small nodes store a cryptographic
checksum of each piece. Obtaining the cryptographic checksum for all 256
pieces would incur a nontrivial amount of hashing (post segwit, as much as
100MB of extra hashing per block), and would require an additional ~4kb of
storage per block. The hashing overhead here may be prohibitive.
This was a complete non-starter when thinking about using these sparse
codes where the number of possible blocks is effectively unlimited.

Your 4kb assumption isn't correct though: How you do it is that after
downloading a block you compute the 255 fragments (as an aside, you're
saying 256 but the most you can get is 255 for an 8-bit RS code).
then you compute a hashtree over them. Every node remembers the root,
and the membership proofs for their chunk(s)... this is 256 bytes of
extra storage.

When you decode you use a majority to decide what root you are trying
to decode. If it fails to result in valid blocks, you blacklist that
root, ban all those peers, and try the next. Worst case cost is the
number of invalid roots rather than peers choose 5.

I'm not sure if combinitoral decode or the above would be easier to implement.
Post by David Vorick via bitcoin-dev
This represents a non-trivial amount of code, but I believe that the result
would be a non-trivial increase in the percentage of users running full
nodes, and a healthier overall network.
The RS coder stuff is small, even doing the fancy w/ error decodes it
isn't huge. I expect complexity mostly will show up in the garbage
input handling.

A couple other thoughts:

The coded blocks are not useful for things like bloom scanning or
other lookup. With the committed filter proposals you could still
keep the committed filters (even 5 way shared themselves...) so
perhaps not that much of a concern.

Coding the blocks will make their encoding normative. The current P2P
one we use is by no means more efficient, Pieter and I have an
encoding that works on a transaction by transaction basis and
decreases the size of the whole chain by ~28%, block-wide entropy
encoding could reduce the size even further. We'd hoped to at least
start using this per transaction coding locally, converting on the fly
to the original serialization where needed (the idea would be to
eventually support the compacted serialization on the wire too).
Maybe the answer there is to just get in whatever improvements we
think are reasonable before doing the coded block.

I think the 20% floor is still too high, and too many nodes will be
forced to just do the recent blocks things. perhaps not today, but
eventually. I suppose we could just assume that the block is split
10 ways, and the default is two indexes, but there is flexibility to
go down to one in the future, at the cost of needing more peers...
could run numbers on the decode failure odds for the 10 of 255 case...
But that would only improve it to 10%. I suppose the proposal could
be combined with sparse chain storage like I was thinking years ago,
but much less sparsity would be needed so the downsides would be less
severe.

E.g. if you want to store <10% of the chain you'd keep some 10% blocks
for random groups. such a feature could be introduced later when it
was more important to keep less than 10 or 20 percent.

Recovery odds could be improved with a second level of coding. E.g. if
your ID was not a constant but a seed into PRNG(height)%250+5 then
you also have a fraction of nodes store the xor between adjacent pairs
then you can fill in unrecoverable groups. the limit of this thinking
is exactly the sparse random code ECC schemes) Probably not worth the
complexity, but just a thing to keep in mind...

Probably the next step is to do some more focused benchmarks. I have
some code I can recommend offline.

I can't help but feel that this might be a little bit of a waste of
time. I think the long term survival of the system is likely going to
ultimately depend on doing an assumevalid like gated sync of a UTXO
set. Ethereum already does something far more reckless (they let
miners pick the state and blindly trust it with almost no depth
required) and no one cares. If that is done then all these concerns
are greatly reduced, along with the 100(+++?)GB/history-year transfer
costs. You'd still want to have archives, but do you care about 20%
archives?
Post by David Vorick via bitcoin-dev
A user may not want to run a node at home, but rather on a digital ocean or AWS server
This is almost if not quite entirely pointless. There are already many
nodes on AWS and DO, adding more does not improve your security (you
must trust DO/AWS), does not improve your reliability (DO/AWS), does
not improve the network's capacity, etc. About the only arguable
benefit I can see there beyond making more money for these hosts is
feel good points for (incorrectly) thinking you're helping the
network, and negligibly reducing the density of spy nodes (but wait:
AWS/DO may well just be spying on your connections too..). and it it
isn't like fast storage on these services is cheap.
Post by David Vorick via bitcoin-dev
Perhaps it is not, but I would think that it would be pretty straightforward to configure a global bandwidth limit within Bitcoin.
We have outbound transmission limits though not by default. Setting
them sensibly is something of a black art. Obviously these will
improve over time... and are more reasons that I agree with your
relative cost assumptions except for the sync-time part.
Post by David Vorick via bitcoin-dev
Fingerprinting?
Unavoidable, but should be made inconsequential by making transaction
announcement more private independent of any of this. There are
already _MANY_ ways to fingerprint nodes, please don't think that
existing software has any real immunity here. We avoid adding new
fingerprinting where we can... but they're very fingerprintable
already. Transaction announcement privacy MUST be fixed. I assume if
any of this is worth doing, it will also be worth the increase in
fingerprinting. And at least this proposal would 'only' create 255
node classes.
Post by David Vorick via bitcoin-dev
SPV?
As I pointed out above, Vorick's idea is totally compatible with
committed filters, and the filters can even be shared like the blocks
are.
Post by David Vorick via bitcoin-dev
[People who fail at math]
The SNR on this list really sucks. If you aren't spending a couple
hours thinking about your responses or at least citing research that
took days of effort or more then you are probably wasting people's
time. Please respect the other users of the list.
Post by David Vorick via bitcoin-dev
Could there be a slider?
Absolutely, I assume if Vorick's proposal were implemented that nodes
would have the follow options: Pruned [UTXO + recent two weeks of
blocks], 20%, 40%, 60%, 80%, 100% (archive).
Aymeric Vitte via bitcoin-dev
2017-04-23 16:27:15 UTC
Reply
Permalink
Raw Message
"Absolutely, I assume if Vorick's proposal were implemented that nodes
would have the follow options: Pruned [UTXO + recent two weeks of
blocks], 20%, 40%, 60%, 80%, 100% (archive)."

Yes, and I think that they can have this in "disorder" (only 20% of
blocks in the middle of the blockchain for example)

There are many reasons why I dislike David's proposal as it is, you
mention some below, why 20%? too much? too small? what will happen
tomorrow? etc, I gave others, and this still does not explain why people
should go for more than two weeks of blocks

Maybe what I suggested here again
https://gist.github.com/Ayms/aab6f8e08fef0792ab3448f542a826bf#proposal
could be considered

This is just a suggestion based on incremental "torrents", fortunately
it comes from much more than days of work as you require below and is
the concatenation of thoughts from different projects/studies

It does follow your 8 rules (although I am not sure what you mean in
rule 2 "The decision to contact a node should need O(1) communications",
1 M limit works?) and proposes others, and it solves the issues
mentioned below, or please someone tell me why it does not (I know
people here dislike DHTs, not sure why)

Except fingerprinting (and I don't know what is the SPV issue exactly)
but still is better, if the nodes behave like the groups with most
numerous peers (ie the group seeding, 20%, or the one seeding 40%, or
the one seeding about nothing (sic... subliminal message here...), etc)
then they just can't be fingerprinted (at least based on this feature)

I think the main concepts are detailed enough but maybe that's not the
case, it's a really draft, if you look well the pruning case is
considered, but not explained, like some other points, because
continuing this work with no incentive solution for the seeders looks to
me useless
Post by Gregory Maxwell via bitcoin-dev
On Mon, Apr 17, 2017 at 6:54 AM, David Vorick via bitcoin-dev
Post by David Vorick via bitcoin-dev
A node that stores the full blockchain (I will use the term archival node)
requires over 100GB of disk space, which I believe is one of the most
significant barriers to more people running full nodes. And I believe the
ecosystem would benefit substantially if more users were running full nodes.
Hi David,
I've been thinking about this subject for a long time (Tier Nolan
linked some of the threads; see also the coloration part of
https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding) and
about using FEC with the history for the last year or so. (we use FEC
in practice in fibre for relay at the tip now, and that has a design
history going back to 2013).
As Jonas points out, we're all set to also offer the 'recent blocks'
separately, so that is obviously going to happen and will help. The
free design parameter is the definition of recent, but we have good
measurements for setting that now. But about history...
I think I might have designed myself into a corner and perhaps you've
shown a way out-- I think there are some serious limits in your
proposal but perhaps we can fix them. Let me give you what I had been
thinking about, then hand you at least a couple improvements to your
As you hopefully now know (per Tier Nolan's) post: I'd previously been
thinking about nodes keeping a deterministic random, independently
selected subset which is self-leveling based on a small seed. The
connection overhead can can be mitigated by working with chunks of
blocks rather than single blocks.
But as you've observed, the failure probabilities are rather high,
especially if an active attacker targets nodes carrying less commonly
available blocks. So I thought, okay we can use FEC to help improve
the recovery odds.
So I considered using a sparse random code (E.g. an LDPC erasure code
like in RFC 5170) the advantage of these codes is that they are very
fast to decode, and they support having enormous codewords, so you can
a high probability of every peer having a unique ID, so there would
effectively never need to be any duplication.
But then I ran into the problem that I had no reasonable way to
recover from bad data, short of pulling a single group from an archive
then banning whatever peers gave you bad chunks.
So that is kind of where I got stuck.
I didn't even consider the advantage of only being able to use a few
peers total, as I still assumed that it would be doing the random
groups thing as well. That's a great point.
Being able to have a lower bound of 20% seems like a serious
limitation to me: it would be fine today, but the chain will quite
possibly be twice the current size in a year.... and in four years
we're back to where we are now. What I'd been thinking would be able
to scale arbitrarily low.
20% is small, but is it small enough? -- today that would get us back
into common SSDs being able to hold the whole chain, but that property
will probably be lost again in a couple years. I think with any fixed
fraction we'll constantly be fighting against the opportunity cost of
upgrading storage, if not the cost of the storage itself. (and I agree
this is the vast majority of the response from actual users, sync
time and ongoing bandwidth usage seeming to tie for second; the latter
of which can be mitigated in other ways but for the former see my
remarks at the end)
The fact that having only a few required blocks needed lets you brute
force out the decode is a great point. I hadn't considered that, and
the schemed I'd been considering are not communications efficient with
only a few blocks required e.g. they sometimes require a couple extra
blocks to successfully decode, which is a lot of overhead if you're
only splitting 5 ways.
Post by David Vorick via bitcoin-dev
I also believe that alternative erasure coding schemes exist which actually
are able to identify the bad pieces given sufficient good pieces, however I
don't know if they have the same computational performance as the best
Reed-Solomon coding implementations.
Actually RS codes are _the_ codes you want to use for with decoding
with errors but the 'computationally efficient' error correcting
decoding (which is itself heinously slow) requires 2*errors + data
number of blocks to decode. Not so useful if you're only looking at 5
parts.
RS decoding is pretty slow generally, esp compared to binary sparse
codes. We were unable to make RS coding make sense for use in fast
block relay for this reason-- decoding time bottlenecked
reconstruction. The most highly optimized RS codes make a special
optimization which is not useful for your proposal: They're much
faster to decode if you're decoding from the first couple correction
blocks. Without these optimizations speeds from from 1GB/s to more
like 100MB/s or worse. Though I suppose with assumevalid initial sync
now is pretty cpu-idle on most hardware. (If 100MB/s sounds fast,
keep in mind that time spent decoding is time that can't be spent
hashing/verifying/etc.. and on a lot hardware with fast broadband with
signature validation enabled we're cpu bound already)
Post by David Vorick via bitcoin-dev
Another option would be to have the small nodes store a cryptographic
checksum of each piece. Obtaining the cryptographic checksum for all 256
pieces would incur a nontrivial amount of hashing (post segwit, as much as
100MB of extra hashing per block), and would require an additional ~4kb of
storage per block. The hashing overhead here may be prohibitive.
This was a complete non-starter when thinking about using these sparse
codes where the number of possible blocks is effectively unlimited.
Your 4kb assumption isn't correct though: How you do it is that after
downloading a block you compute the 255 fragments (as an aside, you're
saying 256 but the most you can get is 255 for an 8-bit RS code).
then you compute a hashtree over them. Every node remembers the root,
and the membership proofs for their chunk(s)... this is 256 bytes of
extra storage.
When you decode you use a majority to decide what root you are trying
to decode. If it fails to result in valid blocks, you blacklist that
root, ban all those peers, and try the next. Worst case cost is the
number of invalid roots rather than peers choose 5.
I'm not sure if combinitoral decode or the above would be easier to implement.
Post by David Vorick via bitcoin-dev
This represents a non-trivial amount of code, but I believe that the result
would be a non-trivial increase in the percentage of users running full
nodes, and a healthier overall network.
The RS coder stuff is small, even doing the fancy w/ error decodes it
isn't huge. I expect complexity mostly will show up in the garbage
input handling.
The coded blocks are not useful for things like bloom scanning or
other lookup. With the committed filter proposals you could still
keep the committed filters (even 5 way shared themselves...) so
perhaps not that much of a concern.
Coding the blocks will make their encoding normative. The current P2P
one we use is by no means more efficient, Pieter and I have an
encoding that works on a transaction by transaction basis and
decreases the size of the whole chain by ~28%, block-wide entropy
encoding could reduce the size even further. We'd hoped to at least
start using this per transaction coding locally, converting on the fly
to the original serialization where needed (the idea would be to
eventually support the compacted serialization on the wire too).
Maybe the answer there is to just get in whatever improvements we
think are reasonable before doing the coded block.
I think the 20% floor is still too high, and too many nodes will be
forced to just do the recent blocks things. perhaps not today, but
eventually. I suppose we could just assume that the block is split
10 ways, and the default is two indexes, but there is flexibility to
go down to one in the future, at the cost of needing more peers...
could run numbers on the decode failure odds for the 10 of 255 case...
But that would only improve it to 10%. I suppose the proposal could
be combined with sparse chain storage like I was thinking years ago,
but much less sparsity would be needed so the downsides would be less
severe.
E.g. if you want to store <10% of the chain you'd keep some 10% blocks
for random groups. such a feature could be introduced later when it
was more important to keep less than 10 or 20 percent.
Recovery odds could be improved with a second level of coding. E.g. if
your ID was not a constant but a seed into PRNG(height)%250+5 then
you also have a fraction of nodes store the xor between adjacent pairs
then you can fill in unrecoverable groups. the limit of this thinking
is exactly the sparse random code ECC schemes) Probably not worth the
complexity, but just a thing to keep in mind...
Probably the next step is to do some more focused benchmarks. I have
some code I can recommend offline.
I can't help but feel that this might be a little bit of a waste of
time. I think the long term survival of the system is likely going to
ultimately depend on doing an assumevalid like gated sync of a UTXO
set. Ethereum already does something far more reckless (they let
miners pick the state and blindly trust it with almost no depth
required) and no one cares. If that is done then all these concerns
are greatly reduced, along with the 100(+++?)GB/history-year transfer
costs. You'd still want to have archives, but do you care about 20%
archives?
Post by David Vorick via bitcoin-dev
A user may not want to run a node at home, but rather on a digital ocean or AWS server
This is almost if not quite entirely pointless. There are already many
nodes on AWS and DO, adding more does not improve your security (you
must trust DO/AWS), does not improve your reliability (DO/AWS), does
not improve the network's capacity, etc. About the only arguable
benefit I can see there beyond making more money for these hosts is
feel good points for (incorrectly) thinking you're helping the
AWS/DO may well just be spying on your connections too..). and it it
isn't like fast storage on these services is cheap.
Post by David Vorick via bitcoin-dev
Perhaps it is not, but I would think that it would be pretty straightforward to configure a global bandwidth limit within Bitcoin.
We have outbound transmission limits though not by default. Setting
them sensibly is something of a black art. Obviously these will
improve over time... and are more reasons that I agree with your
relative cost assumptions except for the sync-time part.
Post by David Vorick via bitcoin-dev
Fingerprinting?
Unavoidable, but should be made inconsequential by making transaction
announcement more private independent of any of this. There are
already _MANY_ ways to fingerprint nodes, please don't think that
existing software has any real immunity here. We avoid adding new
fingerprinting where we can... but they're very fingerprintable
already. Transaction announcement privacy MUST be fixed. I assume if
any of this is worth doing, it will also be worth the increase in
fingerprinting. And at least this proposal would 'only' create 255
node classes.
Post by David Vorick via bitcoin-dev
SPV?
As I pointed out above, Vorick's idea is totally compatible with
committed filters, and the filters can even be shared like the blocks
are.
Post by David Vorick via bitcoin-dev
[People who fail at math]
The SNR on this list really sucks. If you aren't spending a couple
hours thinking about your responses or at least citing research that
took days of effort or more then you are probably wasting people's
time. Please respect the other users of the list.
Post by David Vorick via bitcoin-dev
Could there be a slider?
Absolutely, I assume if Vorick's proposal were implemented that nodes
would have the follow options: Pruned [UTXO + recent two weeks of
blocks], 20%, 40%, 60%, 80%, 100% (archive).
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
--
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
Loading...