Discussion:
[Bitcoin-development] Proposed additional options for pruned nodes
gabe appleton
2015-05-12 15:26:55 UTC
Hi,

There's been a lot of talk in the rest of the community about how the 20MB
step would increase storage needs, and that switching to pruned nodes
(partially) would reduce network security. I think I may have a solution.

There could be a hybrid option in nodes. Selecting this would do the
following:
1. Flip the --no-wallet toggle
2. Select a section of the blockchain to store fully (percentage based,
possibly on hash % sections?)
3. Begin pruning all sections not included in 2
The idea is that you can implement it similar to how a Koorde is done, in
that the network will decide which sections it retrieves. So if the user
prompts it to store 50% of the blockchain, it would look at its peers, and
at their peers (if secure), and choose the least-occurring options from
them.
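
A minimal sketch of the "least-occurring" selection described above, assuming the chain is split into numbered sections and each peer reports the set of sections it holds. The function name and the section numbering are hypothetical, not part of any existing client. Note that it trusts whatever peers report, which is where the Sybil concern mentioned in this post comes in.

```python
from collections import Counter


def pick_sections(peer_sections, total_sections, wanted):
    """Return `wanted` section ids, preferring those held by the fewest peers.

    peer_sections: iterable of sets, one per peer (hypothetical report format).
    """
    counts = Counter()
    for sections in peer_sections:
        counts.update(set(sections))
    # Rank every section by how many peers hold it; break ties by section id.
    ranked = sorted(range(total_sections), key=lambda s: (counts[s], s))
    return sorted(ranked[:wanted])


if __name__ == "__main__":
    peers = [{0, 1}, {1, 2}, {1, 3}]
    # Section 1 is held by every peer, so it is the last one to be chosen.
    print(pick_sections(peers, total_sections=4, wanted=2))
```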

This would allow them to continue validating all transactions, and still
store a full copy, just distributed among many nodes. It should overall
have little impact on security (unless I'm mistaken), and it would
significantly reduce storage needs on a node.

It would also allow for a retroactive --max-size flag, where it will prune
until it is at the specified size, and continue to prune over time, while
keeping to the sections defined by the network.

What sort of side effects or network vulnerabilities would this introduce?
I know some said it wouldn't be Sybil resistant, but how would this be less
so than a fully pruned node?
Jeff Garzik
2015-05-12 16:05:44 UTC
A general assumption is that you will have a few archive nodes with the
full blockchain, and a majority of nodes are pruned, able to serve only the
tail of the chains.
_______________________________________________
Bitcoin-development mailing list
https://lists.sourceforge.net/lists/listinfo/bitcoin-development
--
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc. https://bitpay.com/
gabe appleton
2015-05-12 16:56:37 UTC
Yes, but that just increases the incentive for partially-full nodes. It
would add to the assumed-small number of full nodes.

Or am I misunderstanding?
Peter Todd
2015-05-12 17:16:40 UTC
Post by Jeff Garzik
A general assumption is that you will have a few archive nodes with the
full blockchain, and a majority of nodes are pruned, able to serve only the
tail of the chains.
Hmm?

Lots of people are tossing around ideas for partial archival nodes that
would store a subset of blocks, such that collectively the whole
blockchain would be available even if no one node had the entire chain.
--
'peter'[:-1]@petertodd.org
0000000000000000156d2069eeebb3309455f526cfe50efbf8a85ec630df7f7c
Tier Nolan
2015-05-12 18:23:48 UTC
Post by Peter Todd
Lots of people are tossing around ideas for partial archival nodes that
would store a subset of blocks, such that collectively the whole
blockchain would be available even if no one node had the entire chain.
A compact way to describe which blocks are stored helps to mitigate
fingerprinting attacks.

It also means that a node could compactly indicate which blocks it stores
with service bits.

The node could pick two numbers

W = window = a power of 2
P = position = random value less than W

The node would store all blocks whose height h satisfies h mod W = P. The
block hash could be used too.

This has the nice feature that the node can throw away half of its data and
still represent what is stored.

W_new = W * 2
P_new = (random_bool()) ? P + W : P;

Half of the stored blocks would match P_new mod W_new and the other half
could be deleted. This means that the store would use up between 50% and
100% of the allocated size.
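
The window/position rule and the halving step can be sketched as follows. This is only an illustration under the assumption that selection is by block height; the function names are made up. Blocks already held satisfy h mod W = P, so modulo 2W they fall on either P or P + W, and the sketch picks one of those two residues.

```python
import random


def stores(height, w, p):
    """True if a node with window w and position p keeps the block at `height`."""
    return height % w == p


def halve(w, p, rng=random):
    """Double the window and keep one of the two residues the node already holds."""
    return w * 2, (p + w if rng.random() < 0.5 else p)


if __name__ == "__main__":
    w, p = 8, 3
    held = [h for h in range(64) if stores(h, w, p)]
    w2, p2 = halve(w, p)
    kept = [h for h in held if stores(h, w2, p2)]
    # Exactly half of the previously held blocks survive the halving.
    print(len(held), len(kept))  # 8 4
```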

Another benefit is that it increases the probability that at least someone
has every block.

If N nodes each store a random 1% of the blocks, then the odds of a given
block being stored by none of them is pow(0.99, N). For 1000 nodes, that
gives odds of 1 in 23,164 that a block will be missing. That means that
around 13 out of 300,000 blocks would be missing. There would likely be
more nodes than that, and also storage nodes, so it is not a major risk.

If everyone is storing 1% of blocks, then they would set W to 128. As long
as all of the 128 buckets are covered by some nodes, then all blocks are
stored. With 1000 nodes, that gives odds of 0.6% that at least one bucket
will be missed (using the 0.99 per-node figure above; with exactly 1/128
coverage per node it is closer to 5%). That is better than around 13
blocks being missing.
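
The arithmetic above can be checked with a few lines. This is a sketch assuming each node picks its blocks (or its bucket) independently and uniformly at random; the bucket estimate uses a simple union bound. The 0.6% figure corresponds to reusing 0.99 as the per-node miss probability, whereas a per-node coverage of exactly 1/128 gives roughly 5%.

```python
def p_block_missing(n_nodes, fraction):
    """Probability that none of n_nodes holds a given block."""
    return (1.0 - fraction) ** n_nodes


def p_any_bucket_missing(n_nodes, buckets):
    """Union-bound estimate that at least one of `buckets` buckets is uncovered."""
    return min(1.0, buckets * (1.0 - 1.0 / buckets) ** n_nodes)


if __name__ == "__main__":
    miss = p_block_missing(1000, 0.01)
    print(f"1 in {1 / miss:,.0f} blocks missing")            # roughly 1 in 23,000
    print(f"{300_000 * miss:.1f} of 300,000 blocks missing")  # roughly 13
    print(f"{128 * miss:.2%} using the 0.99 per-node figure")
    print(f"{p_any_bucket_missing(1000, 128):.2%} using 1/128 coverage per node")
```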

Nodes could inform peers of their W and P parameters on connection. The
version message could be amended or a "getparams" message of some kind
could be added.

W could be encoded with 4 bits and P could be encoded with 16 bits, for 20
in total. W = 1 << bits[19:16] and P = bits[15:0]. That gives a maximum W
of 32768, which is likely too many bits for P.
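
As an illustration of the 20-bit encoding, here is a sketch that places the exponent of W in bits 19..16 and P in the low 16 bits. The function names are hypothetical, and how such a field would actually be carried (service bits, version message, or a new message) is left open, as in the text.

```python
def encode(w, p):
    """Pack window w (a power of two up to 32768) and position p into 20 bits."""
    exp = w.bit_length() - 1
    if w < 1 or w != 1 << exp or exp > 15:
        raise ValueError("w must be a power of two between 1 and 32768")
    if not 0 <= p < w:
        raise ValueError("p must satisfy 0 <= p < w")
    return (exp << 16) | p


def decode(bits):
    """Inverse of encode: return (w, p)."""
    exp = (bits >> 16) & 0xF
    p = bits & 0xFFFF
    return 1 << exp, p


if __name__ == "__main__":
    packed = encode(128, 5)
    print(hex(packed), decode(packed))
```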

Initial download would be harder, since new nodes would have to connect to
at least 100 different nodes. They could download from random nodes, and
just download the ones they are missing from storage nodes. Even storage
nodes could have a range of W values.
Gregory Maxwell
2015-05-12 19:03:55 UTC
It's a little frustrating to see this just repeated without even
paying attention to the desirable characteristics from the prior
discussions.

Summarizing from memory:

(0) Block coverage should have locality; historical blocks are
(almost) always needed in contiguous ranges. Having random peers
with totally random blocks would be horrific for performance; as you'd
have to hunt down a working peer and make a connection for each block
with high probability.

(1) Block storage on nodes with a fraction of the history should not
depend on believing random peers; because listening to peers can
easily create attacks (e.g. someone could break the network by
convincing nodes to become unbalanced) and is not useful -- it's not like
the blockchain is substantially different for anyone; if you're at the
point of needing to know coverage to fill, then something is wrong.
Gaps would be handled by archive nodes, so there is no reason to
increase vulnerability by doing anything but behaving uniformly.

(2) The decision to contact a node should need O(1) communications,
not just because of the delay of chasing around just to find who has
something; but because that chasing process usually makes the process
_highly_ sybil vulnerable.

(3) The expression of what blocks a node has should be compact (e.g.
not a dense list of blocks) so it can be rumored efficiently.

(4) Figuring out what block (ranges) a peer has, given what it
advertised, should be computationally efficient.

(5) The communication about what blocks a node has should be compact.

(6) The coverage created by the network should be uniform, and should
remain uniform as the blockchain grows; ideally you shouldn't need
to update your state to know what blocks a peer will store in the
future, assuming that it doesn't change the amount of data it's
planning to use. (What Tier Nolan proposes sounds like it fails this
point)

(7) Growth of the blockchain shouldn't cause much (or any) need to
refetch old blocks.

I've previously proposed schemes which come close but fail one of the above.

(e.g. a scheme based on reservoir sampling that gives uniform
selection of contiguous ranges, communicating only 64 bits of data to
know what blocks a node claims to have, remaining totally uniform as
the chain grows, without any need to refetch -- but needs O(height)
work to figure out what blocks a peer has from the data it
communicated; or another scheme based on consistent hashes that has
log(height) computation, but sometimes may result in a node needing to
go refetch an old block range it previously didn't store -- creating
re-balancing traffic.)

So far something that meets all those criteria (and/or whatever ones
I'm not remembering) has not been discovered; but I don't really think
much time has been spent on it. I think it's very likely possible.
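
To make the criteria concrete, here is a rough sketch of a hash-threshold selection over fixed-size contiguous ranges. It is not one of the schemes described above, and the constants, names and the 32-bit seed plus 32-bit threshold layout are assumptions chosen for illustration. A node advertises 64 bits, and anyone can decide in O(1) whether it claims a given height. The catch is that it keeps a fixed fraction of the chain rather than a fixed amount of data, so storage grows with the chain and it does not satisfy the fixed-budget assumption in (6).

```python
import hashlib

RANGE_SIZE = 1024  # blocks per contiguous range (hypothetical value)


def keeps_range(seed, threshold, range_index):
    """True if the node identified by `seed` keeps the given range.

    threshold is the kept fraction scaled to 2**32 (e.g. 0.25 * 2**32).
    """
    data = seed.to_bytes(4, "big") + range_index.to_bytes(4, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") < threshold


def has_block(seed, threshold, height):
    """O(1) check of whether a peer claims to store the block at `height`."""
    return keeps_range(seed, threshold, height // RANGE_SIZE)


if __name__ == "__main__":
    seed, threshold = 0xC0FFEE, 2**30  # this node keeps roughly 1/4 of the ranges
    kept = sum(keeps_range(seed, threshold, i) for i in range(400))
    print(f"{kept} of 400 ranges kept")
```

Because the decision for a range depends only on the seed and the range index, ranges added later never change which old ranges a node keeps, so chain growth causes no refetching.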
gabe appleton
2015-05-12 19:24:20 UTC
0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. I.e.,
give the signed (by sender) hash of the first and last block in your range.
This is less data dense than the idea above, but it might work better.

That said, this is likely a less secure way to do it. To improve upon that,
a node could request a block of random height within that range and verify
it, but that violates point 2. And the scheme in itself definitely violates
point 7.
Jeff Garzik
2015-05-12 19:38:20 UTC
One general problem is that security is weakened when an attacker can DoS a
small part of the chain by DoS'ing a small number of nodes - yet the impact
is a network-wide DoS because nobody can complete a sync.
gabe appleton
2015-05-12 19:43:45 UTC
Yet this holds true in our current assumptions of the network as well: that
it will become a collection of pruned nodes with a few storage nodes.

A hybrid option makes this better, because it spreads the risk, rather than
concentrating it in full nodes.
gb
2015-05-12 21:30:03 UTC
This seems like a good place to add in an idea I had about
partially-connected nodes that are able to throttle bandwidth demands.
While we will be having partial-blockchain nodes with a spectrum of
storage options, the requirement to be connected is somewhat binary. I
think many users already throttle manually by turning nodes on/off, with
a minimum to just keep the chain up to date. A throttling option would
leverage bitcoin's asynchronous design to reduce bandwidth demands for
weaker nodes.

So throttling to allow for a spectrum of bandwidth connectivity:

1) an option for the user -throttle=XXX that would allow the user to
specify a desirable total bandwidth XXX in Gbytes/day the bitcoin client
can use.

2) the client reduces the number of continuous connections, transaction
or block relaying to achieve the desired throttling rate

3) it could do this by being partially connected throughout the duty
cycle or cycling the node on/off for a percentage of a 24(?) hr period

4) have an auto setting where some smart traffic management 'just takes
care of it' and manual settings that can be user configured

5) reduces the minimum requirement to this: in any 24(?) hr period the node
has received a full copy of all blocks, so it remains fully-validating

Not sure if anyone has brought such an idea forward or if there are
obvious holes, so pre-emptive apologies for time-wasting if so.
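
One way to picture points 1) and 2) is a simple per-day byte budget that the networking code consults before sending or requesting data. This is only a sketch: `-throttle` is the option proposed in this message, not an existing flag, and the class and method names are invented for illustration. A real client would presumably need to prioritize block download over relay within the budget to satisfy point 5).

```python
class DailyBudget:
    """Track bytes used against a per-day allowance given in Gbytes/day."""

    def __init__(self, gbytes_per_day):
        self.limit = int(gbytes_per_day * 1e9)
        self.used = 0
        self.day = None

    def allow(self, nbytes, now):
        """Return True and account for nbytes if the transfer fits today's budget.

        now: time in seconds (e.g. from time.time()); passed in for testability.
        """
        day = int(now // 86400)
        if day != self.day:
            # A new 24 hr period starts: reset the counter.
            self.day, self.used = day, 0
        if self.used + nbytes > self.limit:
            return False
        self.used += nbytes
        return True


if __name__ == "__main__":
    budget = DailyBudget(1)                 # hypothetical -throttle=1
    print(budget.allow(600_000_000, 0))     # True
    print(budget.allow(600_000_000, 10))    # False, would exceed 1 GB today
    print(budget.allow(600_000_000, 86400)) # True, next day
```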
Gregory Maxwell
2015-05-12 20:02:36 UTC
Permalink
Post by Jeff Garzik
One general problem is that security is weakened when an attacker can DoS a
small part of the chain by DoS'ing a small number of nodes - yet the impact
is a network-wide DoS because nobody can complete a sync.
It might be more interesting to think of that attack as a bandwidth
exhaustion DOS attack on the archive nodes... if you can't get a copy
without them, that's where you'll go.

So the question arises: does the option make some nodes that would
have been archive not be? Probably some-- but would it do so much that
it would offset the gain of additional copies of the data when those
attacks are not going on? I suspect not.

It's also useful to give people incremental ways to participate even
when they can't swallow the whole pill; or choose to provide the
resource that's cheap for them to provide. In particular, if there are
only two kinds of full nodes-- archive and pruned; then the archive
nodes take both a huge disk and bandwidth cost; whereas if there are
fractional nodes then archives take low(er) bandwidth unless the
fractionals get DOS attacked.
Jeff Garzik
2015-05-12 20:10:56 UTC
Permalink
True. Part of the issue rests on the block sync horizon/cliff. There is a
value X which is the average number of blocks the 90th percentile of nodes
need in order to sync. It is sufficient for the [semi-]pruned nodes to
keep X blocks, after which nodes must fall back to archive nodes for older
data.

There is simply far, far more demand for recent blocks, and the demand for
old blocks very rapidly falls off.

There was even a more radical suggestion years ago - refuse to sync if too
old (>2 weeks?), and force the user to download ancient data via torrent.
--
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc. https://bitpay.com/
gabe appleton
2015-05-12 20:41:10 UTC
Permalink
I suppose this raises two questions:

1) why not have a partial archive store the most recent X% of the
blockchain by default?

2) why not include some sort of torrent in QT, to mitigate this risk? I
don't think this is necessarily a good idea, but I'd like to hear the
reasoning.
Gregory Maxwell
2015-05-12 20:47:41 UTC
Permalink
Post by Jeff Garzik
True. Part of the issue rests on the block sync horizon/cliff. There is a
value X which is the average number of blocks the 90th percentile of nodes
need in order to sync. It is sufficient for the [semi-]pruned nodes to keep
X blocks, after which nodes must fall back to archive nodes for older data.
Prior discussion had things like "the definition of pruned means you
have and will serve at least the last 288 from your tip" (which is
what I put in the pruned service BIP text); and another flag for "I
have at least the last 2016". (2016 should be reevaluated-- it was
just a round number near where sipa's old data showed the fetch
probability flatlined.)

That data was old, but what it showed was that the probability of a
block being fetched vs depth looked like an exponential drop-off (I
think with a 50% at 3-ish days); plus a constant low probability.
Which is probably what we should have expected.
Post by Jeff Garzik
There was even a more radical suggestion years ago - refuse to sync if too
old (>2 weeks?), and force the user to download ancient data via torrent.
I'm not fond of this; it makes the system dependent on centralized
services (e.g. trackers and sources of torrents). A torrent also
cannot very efficiently handle fractional copies; cannot efficiently
grow over time. Bitcoin should be complete-- plus, many nodes already
have the data.
Adam Weiss
2015-05-12 21:17:14 UTC
Permalink
FYI on behalf of jgarzik...

---------- Forwarded message ----------
From: Jeff Garzik <***@bitpay.com>
Date: Tue, May 12, 2015 at 4:48 PM
Subject: Re: [Bitcoin-development] Proposed additional options for pruned
nodes
To: Adam Weiss <***@signal11.com>


Maybe you could forward my response to the list as an FYI?
You are the 12th person to report this. It is SF, not bitpay, rewriting
email headers and breaking authentication.
fyi, your email to bitcoin-dev is still generating google spam warnings...
--adam
Tier Nolan
2015-05-12 22:00:33 UTC
Permalink
Post by Gregory Maxwell
(0) Block coverage should have locality; historical blocks are
(almost) always needed in contiguous ranges. Having random peers
with totally random blocks would be horrific for performance; as you'd
have to hunt down a working peer and make a connection for each block
with high probability.
(1) Block storage on nodes with a fraction of the history should not
depend on believing random peers; because listening to peers can
easily create attacks (e.g. someone could break the network; by
convincing nodes to become unbalanced) and not useful-- it's not like
the blockchain is substantially different for anyone; if you're to the
point of needing to know coverage to fill then something is wrong.
Gaps would be handled by archive nodes, so there is no reason to
increase vulnerability by doing anything but behaving uniformly.
(2) The decision to contact a node should need O(1) communications,
not just because of the delay of chasing around just to find who has
someone; but because that chasing process usually makes the process
_highly_ sybil vulnerable.
(3) The expression of what blocks a node has should be compact (e.g.
not a dense list of blocks) so it can be rumored efficiently.
(4) Figuring out what block (ranges) a peer has given should be
computationally efficient.
(5) The communication about what blocks a node has should be compact.
(6) The coverage created by the network should be uniform, and should
remain uniform as the blockchain grows; ideally you shouldn't need
to update your state to know what blocks a peer will store in the
future, assuming that it doesn't change the amount of data it's
planning to use. (What Tier Nolan proposes sounds like it fails this
point)
(7) Growth of the blockchain shouldn't cause much (or any) need to
refetch old blocks.
M = 1,000,000
N = number of "starts"

S(0) = hash(seed) mod M
...
S(n) = hash(S(n-1)) mod M

This generates a sequence of start points. If the start point is less than
the block height, then it counts as a hit.

The node stores the 50MB of data starting at the block at height S(n).

As the blockchain increases in size, new starts will be less than the block
height. This means some other runs would be deleted.
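
The start-point chain above can be sketched roughly as follows. This is only an illustration under assumptions: the post does not name a hash, so SHA-256 over the decimal string is a stand-in, and the helper names h, starts and hits are hypothetical.

```python
import hashlib

M = 1_000_000  # modulus from the post


def h(value: int) -> int:
    """Stand-in hash (assumption): SHA-256 of the decimal string, as an integer."""
    return int.from_bytes(hashlib.sha256(str(value).encode()).digest(), "big")


def starts(seed: int, n: int) -> list[int]:
    """S(0) = hash(seed) mod M, then S(k) = hash(S(k-1)) mod M, for k < n."""
    out = []
    s = seed
    for _ in range(n):
        s = h(s) % M
        out.append(s)
    return out


def hits(seed: int, n: int, height: int) -> list[int]:
    """Start points below the current height; a run is stored at each of them."""
    return [s for s in starts(seed, n) if s < height]


if __name__ == "__main__":
    # More start points become hits as the chain grows.
    print(len(hits(42, 20, 300_000)), len(hits(42, 20, 600_000)))
```

A peer that knows only the seed and N can recompute the same list, which is the point of criteria 3 and 5.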

A weakness is that it is random with regards to block heights. Tiny blocks
have the same priority as larger blocks.

0) Blocks are local, in 50MB runs
1) Agreed, nodes should download headers-first (or some other compact way
of finding the highest POW chain)
2) M could be fixed, N and the seed are all that is required. The seed
doesn't have to be that large. If 1% of the blockchain is stored, then 16
bits should be sufficient so that every block is covered by seeds.
3) N is likely to be less than 2 bytes and the seed can be 2 bytes
4) A 1% cover of 50GB of blockchain would have 10 starts @ 50MB per run.
That is 10 hashes. They don't even necessarily need to be cryptographic hashes.
5) Isn't this the same as 3?
6) Every block has the same odds of being included. There inherently needs
to be an update when a node deletes some info due to exceeding its cap. N
can be dropped one run at a time.
7) When new starts drop below the tip height, N can be decremented and that
one run is deleted.

There would need to be a special rule to ensure the low height blocks are
covered. Nodes should keep the first 50MB of blocks with some probability
(10%?)
gabe appleton
2015-05-12 22:09:44 UTC
Permalink
This is exactly the sort of solution I was hoping for. It seems this is the
minimal modification to make it work, and, if someone was willing to work
with me, I would love to help implement this.

My only concern is that if the --max-size flag is not included, then this
delivers significantly less benefit to the end user. Still a good chunk,
but possibly not enough.
Daniel Kraft
2015-05-13 05:19:54 UTC
Permalink
Hi all!
In the context of this discussion, let me also restate an idea I've
proposed in Bitcointalk for this. It is probably not perfect and could
surely be adapted (I'm interested in that), but I think it meets
most/all of the criteria stated below. It is similar to the idea with
"start points", but gives O(log height) instead of O(height) for
determining which blocks a node has.

Let me for simplicity assume that the node wants to store 50% of all
blocks. It is straight-forward to extend the scheme so that this is
configurable:

1) Create some kind of "seed" that can be compact and will be sent to
other peers to define which blocks the node has. Use it to initialise a
PRNG of some sort.

2) Divide the range of all blocks into intervals with exponentially
growing size. I. e., something like this:

1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ...

With this, only O(log height) intervals are necessary to cover height
blocks.

3) Using the PRNG, *one* of the two intervals of each length is
selected. The node stores these blocks and discards the others.
(Possibly keeping the last 200 or 2,016 or whatever blocks additionally.)
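
Steps 1 to 3 could look roughly like this in Python. It is a sketch under assumptions, not a specification: Python's random.Random stands in for the unspecified PRNG, blocks are numbered from 0, and the names stored_ranges and has_block are hypothetical.

```python
import random


def stored_ranges(seed: int, height: int) -> list[tuple[int, int]]:
    """Half-open ranges [lo, hi) of blocks kept, for a chain of `height` blocks."""
    rng = random.Random(seed)  # stand-in PRNG (assumption)
    ranges = []
    lo, size = 0, 1
    while lo < height:  # O(log height) iterations: sizes 1, 2, 4, ...
        # One draw per pair of equal-sized intervals, always in the same
        # order, so earlier choices never change as the chain grows.
        keep_first = rng.random() < 0.5
        a = lo if keep_first else lo + size
        b = min(a + size, height)  # clip to blocks that exist so far
        if a < b:
            ranges.append((a, b))
        lo += 2 * size
        size *= 2
    return ranges


def has_block(seed: int, height: int, block: int) -> bool:
    """Check locally, from the seed alone, whether a peer keeps `block`."""
    return any(a <= block < b for a, b in stored_ranges(seed, height))


if __name__ == "__main__":
    print(stored_ranges(7, 1000))
```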
Post by Gregory Maxwell
(0) Block coverage should have locality; historical blocks are
(almost) always needed in contiguous ranges. Having random peers
with totally random blocks would be horrific for performance; as you'd
have to hunt down a working peer and make a connection for each block
with high probability.
You get contiguous block ranges (with at most O(log height) "breaks").
Also ranges of newer blocks are longer, which may be an advantage if
those blocks are needed more often.
Post by Gregory Maxwell
(1) Block storage on nodes with a fraction of the history should not
depend on believing random peers; because listening to peers can
easily create attacks (e.g. someone could break the network; by
convincing nodes to become unbalanced) and not useful-- it's not like
the blockchain is substantially different for anyone; if you're to the
point of needing to know coverage to fill then something is wrong.
Gaps would be handled by archive nodes, so there is no reason to
increase vulnerability by doing anything but behaving uniformly.
With my proposal, each node determines randomly and on its own which
blocks to store. No believing anyone.
Post by Gregory Maxwell
(2) The decision to contact a node should need O(1) communications,
not just because of the delay of chasing around just to find who has
someone; but because that chasing process usually makes the process
_highly_ sybil vulnerable.
Not exactly sure what you mean by that, but I think that's fulfilled.
You can (locally) compute in O(log height) from a node's seed whether or
not it has the blocks you need. This needs only communication about the
node's seed.
Post by Gregory Maxwell
(3) The expression of what blocks a node has should be compact (e.g.
not a dense list of blocks) so it can be rumored efficiently.
See above.
Post by Gregory Maxwell
(4) Figuring out what block (ranges) a peer has given should be
computationally efficient.
O(log height). Not O(1), but that's probably not a big issue.
Post by Gregory Maxwell
(5) The communication about what blocks a node has should be compact.
See above.
Post by Gregory Maxwell
(6) The coverage created by the network should be uniform, and should
remain uniform as the blockchain grows; ideally you shouldn't need
to update your state to know what blocks a peer will store in the
future, assuming that it doesn't change the amount of data it's
planning to use. (What Tier Nolan proposes sounds like it fails this
point)
Coverage will be uniform if the seed is created randomly and the PRNG
has good properties. No need to update the seed if the other node's
fraction is unchanged. (Not sure if you suggest for nodes to define a
"fraction" or rather an "absolute size".)
Post by Gregory Maxwell
(7) Growth of the blockchain shouldn't cause much (or any) need to
refetch old blocks.
No need to do that with the scheme.

What do you think about this idea? Some random thoughts from myself:

*) I need to formulate it in a more general way so that the fraction can
be arbitrary and not just 50%. This should be easy to do, and I can do
it if there's interest.

*) It is O(log height) and not O(1), but that should not be too
different for the heights that are relevant.

*) Maybe it would be better / easier to not use the PRNG at all; just
decide to *always* use the first or the second interval with a given
size. Not sure about that.

*) With the proposed scheme, the node's actual fraction of stored blocks
will vary between 1/2 and 2/3 (if I got the mathematics right, it is
still early) as the blocks come in. Not sure if that's a problem. I
can do a precise analysis of this property for an extended scheme if you
are interested in it.
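
The stored fraction can be checked empirically with a small sketch like the one below. It assumes blocks numbered from 0 and the 50% scheme as described; stored_fraction and pick_first are hypothetical names. Under an always-first rule the fraction appears to stay between 1/2 and roughly 2/3; when the second interval of a pair is the one kept, it can dip towards 1/3 while the unkept first interval fills up.

```python
def stored_fraction(height: int, pick_first) -> float:
    """Fraction of blocks [0, height) kept; pick_first(k) decides pair k."""
    stored, lo, size, k = 0, 0, 1, 0
    while lo < height:
        a = lo if pick_first(k) else lo + size
        # Count only the part of the chosen interval that already exists.
        stored += max(0, min(a + size, height) - a)
        lo += 2 * size
        size *= 2
        k += 1
    return stored / height


if __name__ == "__main__":
    heights = range(1000, 20_000)
    first = [stored_fraction(x, lambda k: True) for x in heights]
    second = [stored_fraction(x, lambda k: False) for x in heights]
    print("always first: ", min(first), max(first))
    print("always second:", min(second), max(second))
```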

Yours,
Daniel
--
http://www.domob.eu/
OpenPGP: 1142 850E 6DFF 65BA 63D6 88A8 B249 2AC4 A733 0737
Namecoin: id/domob -> https://nameid.org/?name=domob
--
Done: Arc-Bar-Cav-Hea-Kni-Ran-Rog-Sam-Tou-Val-Wiz
To go: Mon-Pri
Tier Nolan
2015-05-13 09:34:03 UTC
Permalink
Post by Daniel Kraft
2) Divide the range of all blocks into intervals with exponentially
1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ...
Interesting. This can be combined with the system I suggested.

A node broadcasts 3 pieces of information

Seed (16 bits): This is the seed
M_bits_lsb (1 bit): Used to indicate M during a transition
N (7 bits): This is the count of the last range held (or partially held)

M = 1 << M_bits

M should be set to the lowest power of 2 greater than double the block
chain height

That gives M = 1 million at the moment. During changing M, some nodes will
be using the higher M and others will use the lower M.

The M_bits_lsb field allows those to be distinguished.

As the block height approaches 512k, nodes can begin to upgrade. For a
period around block 512k, some nodes could use M = 1 million and others
could use M = 2 million.
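
The rule for M can be written down in a few lines. This is a sketch only; the function names are hypothetical and 356,000 is just an example height.

```python
def m_bits_for_height(height: int) -> int:
    """Exponent of the lowest power of 2 strictly greater than 2 * height."""
    return (2 * height).bit_length()


def modulus_for_height(height: int) -> int:
    """M = 1 << M_bits."""
    return 1 << m_bits_for_height(height)


def m_bits_lsb(height: int) -> int:
    """The single bit a node would broadcast to tell adjacent M values apart."""
    return m_bits_for_height(height) & 1


if __name__ == "__main__":
    for height in (356_000, 524_287, 524_288):
        print(height, modulus_for_height(height), m_bits_lsb(height))
```

Because adjacent values of M_bits differ in their lowest bit, one bit is enough to distinguish the two values in use around a transition.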

Assuming M is around 3 times higher than the block height, then the odds of
a start being less than the block height are around 35%. If the runs grow by
25% each step, then that is approximately a doubling for each hit.

Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5MB

This gives an exponential increase, but groups of 4 are linearly
interpolated.


*Size(0) = 10 MB*
Size(1) = 12.5MB
Size(2) = 15 MB
Size(3) = 17.5MB
Size(4) = 20MB

*Size(5) = 25MB*
Size(6) = 30MB
Size(7) = 35MB

*Size(8) = 40MB*
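
The size formula can be checked against the table with a short sketch (run_size_mb is a hypothetical name; sizes are in MB):

```python
def run_size_mb(n: int) -> float:
    """Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5MB."""
    return ((4 + (n & 0x3)) << (n >> 2)) * 2.5


if __name__ == "__main__":
    for n in range(9):
        print(f"Size({n}) = {run_size_mb(n)} MB")
```

The low two bits of n step linearly from 4 to 7 units of 2.5MB, and every fourth step doubles the unit, which gives the interpolated exponential growth.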

Start(n) = Hash(seed + n) mod M

A node should store as much of its last start as possible. Assuming start
0, 5, and 8 were "hits" but the node had a max size of 60MB. It can store
0 and 5 and have 25MB left. That isn't enough to store all of run 8, but
it should store 25MB of the blocks in run 8 anyway.
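
The storage example above can be sketched as follows. Assumptions are labelled in the code: SHA-256 stands in for the unspecified hash, "seed + n" is read as integer addition, and the names start, hit_indices and allocate are hypothetical.

```python
import hashlib


def run_size_mb(n: int) -> float:
    """Size(n) from the post, in MB."""
    return ((4 + (n & 0x3)) << (n >> 2)) * 2.5


def start(seed: int, n: int, m: int) -> int:
    """Start(n) = Hash(seed + n) mod M, with SHA-256 as an assumed hash."""
    digest = hashlib.sha256((seed + n).to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % m


def hit_indices(seed: int, count: int, m: int, height: int) -> list[int]:
    """Indices n < count whose start point lies below the block height."""
    return [n for n in range(count) if start(seed, n, m) < height]


def allocate(hits: list[int], max_mb: float) -> list[tuple[int, float]]:
    """Fill runs in index order; the last run may be stored only partially."""
    plan, left = [], max_mb
    for n in hits:
        if left <= 0:
            break
        take = min(run_size_mb(n), left)
        plan.append((n, take))
        left -= take
    return plan


if __name__ == "__main__":
    # Hits 0, 5 and 8 with a 60MB cap, as in the example above.
    print(allocate([0, 5, 8], 60))  # [(0, 10.0), (5, 25.0), (8, 25.0)]
```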

Size(127) = pow(2, 31) * 17.5MB = 35,840 TB

Decreasing N only causes previously accepted runs to be invalidated.

When a node approaches a transition point for N, it would select a block
height within 25,000 of the transition point. Once it reaches that block,
it will begin downloading the new runs that it needs. When updating, it
can set N to zero. This spreads out the upgrade (over around a year), with
only a small number of nodes upgrading at any time.

New nodes should use the higher M, if near a transition point (say within
100,000).
