Discussion:
[Bitcoin-development] Service bits for pruned nodes
Pieter Wuille
2013-04-28 15:51:55 UTC
Permalink
Hello all,

I think it is time to move forward with pruning nodes, i.e. nodes that
fully validate and relay blocks and transactions, but which do not keep
(all) historic blocks around, and thus cannot be queried for these.

The biggest roadblock is making sure new and old nodes that start up are
able to find nodes to synchronize from. To help them find peers, I would
like to propose adding two extra service bits to the P2P protocol:
* NODE_VALIDATE: relay and validate blocks and transactions, but is only
guaranteed to answer getdata requests for (recently) relayed blocks and
transactions, and mempool transactions.
* NODE_BLOCKS_2016: can be queried for the last 2016 blocks, but without
guarantee for relaying/validating new blocks and transactions.
* NODE_NETWORK (which existed before) will imply NODE_VALIDATE and
guarantee availability of all historic blocks.
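For concreteness, the proposed split could be sketched as a bitmask. Only NODE_NETWORK's value (bit 0) is the real, existing bit; the other bit positions and the helper function are purely illustrative, since the proposal does not assign values:

```python
# Hypothetical service-bit layout. NODE_NETWORK = 1 matches the existing
# protocol; the other two positions are illustrative, not assigned values.
NODE_NETWORK     = 1 << 0  # serves the full historic chain (existing bit)
NODE_VALIDATE    = 1 << 1  # relays/validates, recent data only (proposed)
NODE_BLOCKS_2016 = 1 << 2  # serves the last 2016 blocks (proposed)

def describe_services(services: int) -> list[str]:
    """Decode a peer's advertised service bits into capability names."""
    caps = []
    if services & NODE_NETWORK:
        # Per the proposal, NODE_NETWORK implies NODE_VALIDATE.
        caps.append("full-history")
        caps.append("validate-relay")
    elif services & NODE_VALIDATE:
        caps.append("validate-relay")
    if services & NODE_BLOCKS_2016:
        caps.append("last-2016-blocks")
    return caps
```

A node advertising NODE_VALIDATE | NODE_BLOCKS_2016 would thus relay and validate while also serving the last two weeks of blocks.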

The idea is to separate the different responsibilities of network nodes
into separate bits, so they can - at some point - be
implemented independently. Perhaps we want more than just one degree (2016
blocks), maybe also 144 or 210000, but those can be added later if
necessary. I monitored the frequency of block depths requested from my
public node, and got this frequency distribution:
http://bitcoin.sipa.be/depth-small.png so it seems 2016 nicely matches the
set of frequently-requested blocks (indicating that few nodes are offline
for more than 2 weeks consecutively).

I'll write a BIP to formalize this, but wanted to get an idea of how much
support there is for a change like this.

Cheers,
--
Pieter
Mike Hearn
2013-04-28 16:29:11 UTC
Permalink
I'd imagined that nodes would be able to pick their own ranges to keep
rather than have fixed chosen intervals. "Everything or two weeks" is
rather restrictive - presumably node operators are constrained by physical
disk space, which means the quantity of blocks they would want to keep can
vary with sizes of blocks, cost of storage, etc.

Adding new fields to the addr message and relaying those fields to newer
nodes means every node could advertise the height at which it pruned. I
know it means a longer time before the data is available everywhere vs
service bits, but it seems like most nodes won't be pruning right away
anyway. There's plenty of time for upgrades. If an old node connected to a
new node and getdata-d blocks that had been pruned, immediate disconnection
should make the old node go find a different one. It means the combination
of old node+not run for a long time might take a while before it can find a
node that has what it wants, but that doesn't seem like a big deal.

What is the use case for NODE_VALIDATE? Nodes that throw away blocks almost
immediately? Why would a node do that?
_______________________________________________
Bitcoin-development mailing list
https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Pieter Wuille
2013-04-28 16:44:52 UTC
Permalink
Post by Mike Hearn
I'd imagined that nodes would be able to pick their own ranges to keep
rather than have fixed chosen intervals. "Everything or two weeks" is
rather restrictive - presumably node operators are constrained by physical
disk space, which means the quantity of blocks they would want to keep can
vary with sizes of blocks, cost of storage, etc.
Sure, that's why eventually several levels may be useful.

Post by Mike Hearn
Adding new fields to the addr message and relaying those fields to newer
nodes means every node could advertise the height at which it pruned. I
know it means a longer time before the data is available everywhere vs
service bits, but it seems like most nodes won't be pruning right away
anyway. There's plenty of time for upgrades.
That's a more flexible model, indeed. I'm not sure how important speed of
propagation will be though - it may be very slow, given that there are
100000s of IPs circulating, and only a few are relayed in one go between
nodes. Even then, I'd like to see the "relay/validation" responsibility
split off from the "serve historic data" one, and have separate service
bits for those.
Post by Mike Hearn
If an old node connected to a new node and getdata-d blocks that had been
pruned, immediate disconnection should make the old node go find a
different one. It means the combination of old node+not run for a long time
might take a while before it can find a node that has what it wants, but
that doesn't seem like a big deal.
Disconnecting in case something is requested that isn't served seems like
an acceptable behaviour, yes. A specific message indicating data is pruned
may be more flexible, but more complex to handle too.

Post by Mike Hearn
What is the use case for NODE_VALIDATE? Nodes that throw away blocks almost
immediately? Why would a node do that?
NODE_VALIDATE doesn't say anything about which blocks are available, it
just means it relays and validates (and thus is not an SPV node). It can be
combined with NODE_BLOCKS_2016 if those blocks are also served.

The reason for splitting them is that I think over time these may be
handled by different implementations. You could have stupid
storage/bandwidth nodes that just keep the blockchain around, and others
that validate it. Even if that doesn't happen implementation-wise, I think
these are sufficiently independent functions to start thinking about them
as such.
--
Pieter
Mike Hearn
2013-04-28 16:57:53 UTC
Permalink
That's true. It can perhaps be represented as "I keep the last N blocks"
and then most likely for any given node the policy doesn't change all that
fast, so if you know the best chain height you can calculate which nodes
have what.
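That calculation is simple enough to sketch (a hypothetical helper; the 0-based height convention is an assumption):

```python
def served_range(best_height: int, keep_last_n: int) -> range:
    """Block heights a 'keep the last N blocks' peer should still serve,
    given the current best chain height (0-based, genesis = height 0)."""
    if keep_last_n <= 0:
        return range(0)  # advertises no history at all
    start = max(0, best_height - keep_last_n + 1)
    return range(start, best_height + 1)
```

Knowing only a peer's advertised N and the current tip height, a client can decide locally whether that peer still has the blocks it needs.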
Post by Pieter Wuille
Disconnecting in case something is requested that isn't served seems like
an acceptable behaviour, yes. A specific message indicating data is pruned
may be more flexible, but more complex to handle too.
Well, old nodes would ignore it and new nodes wouldn't need it?
Post by Pieter Wuille
The reason for splitting them is that I think over time these may be
handled by different implementations. You could have stupid
storage/bandwidth nodes that just keep the blockchain around, and others
that validate it. Even if that doesn't happen implementation-wise, I think
these are sufficiently independent functions to start thinking about them
as such.
Maybe so; with a "last N blocks" field in addr messages, though, such nodes
could just set their advertised history to zero and not have to deal with
serving blocks to other nodes.

If you have a node that serves the chain but doesn't validate it, how does
it know what the best chain is? Just whatever the hardest is?
Pieter Wuille
2013-05-03 12:30:19 UTC
Permalink
(generic comment on the discussion that spawned off: ideas about how to
allow additional protocols for block exchange are certainly interesting,
and in the long term we should certainly consider that. For now I'd like to
keep this about the more immediate way forward with making the P2P protocol
not break in the presence of pruning nodes)
Post by Mike Hearn
That's true. It can be perhaps be represented as "I keep the last N
blocks" and then most likely for any given node the policy doesn't change
all that fast, so if you know the best chain height you can calculate which
nodes have what.
Yes, I like that better than broadcasting the exact height starting at
which you serve (though I would put that information immediately in the
version announcement). I don't think we can rely on the addr broadcasting
mechanism for fast information exchange anyway. One more problem with this:
DNS seeds cannot convey this information (neither do they currently convey
service bits, but at least those can be indexed separately, and served
explicitly through asking for a specific subdomain or so).

So to summarize:
* Add a field to addr messages (after a protocol version bump) that
indicates the number of top blocks served?
* Add a field to version message to announce the actual first block served?
* Add service bits to separately enable "relaying/verifying node" and
"serves (part of) the historic chain"? My original reason for suggesting
this was different, I think better compatibility with DNS seeds may be a
good reason for this. You could ask the seed first for a subset that at
least serves some part of the historic chain, until you hit a node that has
enough, and once caught up, ask for nodes that relay.
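As an illustration of the first two points, the extra field could be as small as one unsigned integer per addr entry (this little-endian encoding and the sentinel value are purely illustrative, not a proposed wire format):

```python
import struct

# Illustrative only: a little-endian uint32 appended to an addr entry,
# giving the number of most-recent blocks the peer serves. A sentinel
# value could mean "full history". Not an actual protocol encoding.
FULL_HISTORY = 0xFFFFFFFF

def encode_top_blocks(n: int) -> bytes:
    """Serialize the 'top blocks served' count as a 4-byte field."""
    return struct.pack("<I", n)

def decode_top_blocks(data: bytes) -> int:
    """Read the count back from the first 4 bytes of the field."""
    (n,) = struct.unpack("<I", data[:4])
    return n
```

The version-message variant would carry the first served height instead, but the encoding question is the same either way.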

Post by Mike Hearn
Post by Pieter Wuille
Disconnecting in case something is requested that isn't served seems like
an acceptable behaviour, yes. A specific message indicating data is pruned
may be more flexible, but more complex to handle too.
Well, old nodes would ignore it and new nodes wouldn't need it?
I'm sure there will be cases where a new node connects based on outdated
information. I'm just stating that I agree with the generic policy of "if a
node requests something it should have known the peer doesn't serve, it is
fair to be disconnected."
Post by Mike Hearn
Post by Pieter Wuille
The reason for splitting them is that I think over time these may be
handled by different implementations. You could have stupid
storage/bandwidth nodes that just keep the blockchain around, and others
that validate it. Even if that doesn't happen implementation-wise, I think
these are sufficiently independent functions to start thinking about them
as such.
Maybe so; with a "last N blocks" field in addr messages, though, such nodes
could just set their advertised history to zero and not have to deal with
serving blocks to other nodes.
If you have a node that serves the chain but doesn't validate it, how does
it know what the best chain is? Just whatever the hardest is?
Maybe it validates, maybe it doesn't. What matters is that it doesn't
guarantee relaying fresh blocks and transactions. Maybe it does validate,
maybe it just stores any blocks, and uses a validating node to know what to
announce as best chain, or it uses an SPV mechanism to determine that. Or
it only validates and relays blocks, but not transactions. My point is that
"serving historic data" and "relaying fresh data" are separate
responsibilities, and there's no need to require them to be combined.
--
Pieter
Mike Hearn
2013-05-03 14:06:29 UTC
Permalink
Post by Pieter Wuille
Yes, I like that better than broadcasting the exact height starting at
which you serve (though I would put that information immediately in the
version announcement). I don't think we can rely on the addr broadcasting
DNS seeds cannot convey this information (neither do they currently convey
service bits, but at least those can be indexed separately, and served
explicitly through asking for a specific subdomain or so).
That's true, but we can extend the DNS seeding protocol a little bit - you
could query <current-chain-height>.dnsseed.whatever.com and the DNS server
then only returns nodes it knows matches your requirement.

This might complicate existing seeds a bit, and it's a bit of a hack, but
protocol-wise it's still possible. Of course if you want to add more
dimensions it gets uglier fast.
Peter Todd
2013-05-03 14:18:01 UTC
Permalink
Post by Mike Hearn
That's true, but we can extend the DNS seeding protocol a little bit - you
could query <current-chain-height>.dnsseed.whatever.com and the DNS server
then only returns nodes it knows matches your requirement.
If you're going to take a step like that, the <current-chain-height>
should be rounded off, perhaps to some number of bits, or you'll allow
DNS caching to be defeated.

Make clients check for the largest "rounded off" value first, and then
drill down if required. Some complexity involved...
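The rounding-plus-drill-down idea could be sketched like this (the number of dropped bits and the level count are arbitrary illustrative choices, not anything proposed on the list):

```python
def rounded_heights(height: int, drop_bits: int = 11, levels: int = 3):
    """Candidate query heights, coarsest first: each level restores one
    bit of precision. Coarse values change rarely, so DNS responses for
    them stay cacheable; a client drills down only if it needs to."""
    out = []
    for b in range(drop_bits, drop_bits - levels, -1):
        out.append((height >> b) << b)
    return out
```

A client would then query something like f"{h}.dnsseed.example.com" for each candidate h in turn, stopping once it gets usable peers.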
Post by Mike Hearn
This might complicate existing seeds a bit, and it's a bit of a hack, but
protocol-wise it's still possible. Of course if you want to add more
dimensions it gets uglier fast.
Maybe I should make my blockheaders-over-dns thing production-worthy
first so we can see how many ISPs come at us with pitchforks? :P
--
'peter'[:-1]@petertodd.org
00000000000000142de0244ee8fac516e7c0a29da1eafc0d43f2da8b6388b387
Mike Hearn
2013-05-03 15:02:26 UTC
Permalink
Post by Peter Todd
If you're going to take a step like that, the <current-chain-height>
should be rounded off, perhaps to some number of bits, or you'll allow
DNS caching to be defeated.
Don't the seeds already set small times? I'm not sure we want these
responses to be cacheable, otherwise there's a risk of a wall of traffic
suddenly showing up at one set of nodes if a large ISP caches a response.
(yes yes, I know, SPV node should be remembering addr broadcasts and such).
Peter Todd
2013-05-03 15:11:57 UTC
Permalink
Post by Mike Hearn
Post by Peter Todd
If you're going to take a step like that, the <current-chain-height>
should be rounded off, perhaps to some number of bits, or you'll allow
DNS caching to be defeated.
Don't the seeds already set small times? I'm not sure we want these
responses to be cacheable, otherwise there's a risk of a wall of traffic
suddenly showing up at one set of nodes if a large ISP caches a response.
(yes yes, I know, SPV node should be remembering addr broadcasts and such).
Hmm, on second thought you're probably right for the standard case where
it's really P2P. On the other hand it kinda limits us in the future if
seeds have high-bandwidth nodes they can just point clients to, but
maybe just assuming the DNS seed might need high bandwidth as well is
acceptable.

I dunno, given how badly behaved a lot of ISP DNS servers are re:
caching, maybe we're better off keeping it simple.
--
'peter'[:-1]@petertodd.org
000000000000013bfdf35da40a40c35ccd75e09652ae541d94d26130a695f757
John Dillon
2013-05-04 18:07:42 UTC
Permalink
I think you too should ask yourself why you are putting so much effort into
optimizing a centralized service, the DNS seeds, rather than putting effort
into optimizing the P2P peer discovery instead. DNS seeds are a necessary evil,
one that shouldn't be promoted with additional features beyond simply obtaining
your initial set of peers.

After all Peter, just like you have implemented alternate block header
distribution over twitter, in the future we should have many different means of
peer discovery. Right now we have DNS seeds, a fixed list, and IRC discovery
that does not work because the servers it was pointed to no longer exist. Not
a good place to be.

Some random ideas:

search engines - search for "bitcoin seed address" or something and try IPs
found (twitter is similar)

ipv4 scanning - not exactly friendly, but the density of bitcoin nodes is
probably getting to the point where a brute force search is feasible

anycast peers - would work best with UDP probably, who has the resources to set
this up?


It is probably not worth the effort implementing the above immediately, but it
is worth the effort to ensure that we don't make the DNS seed system so complex
and sophisticated that we depend on it.
Jeff Garzik
2013-05-04 18:55:54 UTC
Permalink
On Sat, May 4, 2013 at 2:07 PM, John Dillon
Post by John Dillon
After all Peter, just like you have implemented alternate block header
distribution over twitter, in the future we should have many different means of
peer discovery. Right now we have DNS seeds, a fixed list, and IRC discovery
that does not work because the servers it was pointed to no longer exist. Not
a good place to be.
Let's not confuse bootstrapping with overall peer discovery.

Peer exchange between P2P nodes is the primary and best method of
obtaining free peers.

Obviously you need to bootstrap into that, though. DNS seed and fixed
list are those bootstrap methods (IRC code was deleted), but are only
used to limp along until you can contact a real P2P node, at which
point peer discovery truly begins.
--
Jeff Garzik
exMULTI, Inc.
***@exmulti.com
John Dillon
2013-05-05 13:12:15 UTC
Permalink
Sorry I should have used the word bootstrapping there rather than discovery.
But again I think that shows my point clearly. Centralized methods like DNS
should be used for as little as possible, just simple initial bootstrapping,
and focus the development efforts towards the non-centralized peer discovery
mechanisms.
Mike Hearn
2013-05-06 08:19:35 UTC
Permalink
You are welcome to optimise P2P addr broadcasts or develop better bootstrap
mechanisms.


On Sun, May 5, 2013 at 3:12 PM, John Dillon
Sorry I should have used the word bootstrapping there rather than discovery.
But again I think that shows my point clearly. Centralized methods like DNS
should be used for as little as possible, just simple initial
bootstrapping,
and focus the development efforts towards the non-centralized peer discovery
mechanisms.
Pieter Wuille
2013-05-06 13:13:55 UTC
Permalink
Post by Mike Hearn
You are welcome to optimise P2P addr broadcasts or develop better bootstrap
mechanisms.
I think John actually has a point here. If we're judging the quality of a
protocol change by how compatible it is with DNS seeding, then we're clearly not
using DNS seeding as seeding anymore (=getting an entry point into the P2P
network), but as a mechanism for choosing (all) peers.

Eventually, I think it makes sense to move to a system where you get seeds from
a DNS (or other mechanism), connect to one or a few of the results, do a getaddr,
fill your peer IP database with it, and disconnect from the DNS seeded peer.

This probably means we need to look at ways to optimize current peer exchange,
but that's certainly welcome in any case.
--
Pieter
Gregory Maxwell
2013-04-28 19:50:22 UTC
Permalink
Post by Mike Hearn
I'd imagined that nodes would be able to pick their own ranges to keep
rather than have fixed chosen intervals. "Everything or two weeks" is rather restrictive.
X most recent is special for two reasons: It meshes well with actual demand,
and the data is required for reorganization.

So whatever we do for historic data, N most recent should be treated
specially.

But I also agree that it's important that <everything> be splittable into
ranges, because otherwise, when having to choose between serving historic
data and— say— 40 GB of storage, a great many are going to choose not to
serve historic data... and so while nodes may be willing to contribute 4-39
GB of storage to the network, there will be no good way for them to do so,
and we may end up with too few copies of the historic data available.

As can be seen in the graph, once you get past the most recent 4000
blocks the probability is fairly uniform... so "N most recent" is not a
good way to divide load for the older blocks. But simple ranges— perhaps
quantized to groups of 100 or 1000 blocks or something— would work fine.
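A sketch of such quantization (the chunk size and helper names are illustrative; the point is only that snapping ranges to round boundaries keeps the set of advertised ranges small):

```python
CHUNK = 1000  # quantization granularity; an illustrative choice

def quantize_range(first: int, last: int) -> tuple[int, int]:
    """Snap an advertised block range outward to CHUNK boundaries so
    peers advertise a small set of canonical, comparable ranges."""
    lo = (first // CHUNK) * CHUNK
    hi = ((last // CHUNK) + 1) * CHUNK - 1
    return lo, hi

def serves_block(adv: tuple[int, int], height: int) -> bool:
    """Does an advertised (lo, hi) range cover the requested height?"""
    lo, hi = adv
    return lo <= height <= hi
```

Snapping outward means a peer may advertise slightly more than it strictly holds at the edges, which it can cover by keeping whole chunks.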

This doesn't have to come in the first cut, however— and it needs new
addr messages in any case.
John Dillon
2013-04-29 02:57:53 UTC
Permalink
Post by Gregory Maxwell
But I also agree that it's important that <everything> be splittable into
ranges, because otherwise, when having to choose between serving historic
data and— say— 40 GB of storage, a great many are going to choose not to
serve historic data... and so while nodes may be willing to contribute 4-39
GB of storage to the network, there will be no good way for them to do so,
and we may end up with too few copies of the historic data available.
Have we considered just leaving that problem to a different protocol such as
BitTorrent? Offering up a few GB of storage capacity is a nice idea but it
means we would soon have to add structure to the network to allow nodes to find
each other to actually get that data. BitTorrent already has that issue thought
through carefully with its DHT support.

What are the logistics of either integrating a DHT capable BitTorrent client,
or just calling out to some library? We could still use the Bitcoin network to
bootstrap the BitTorrent DHT.
Gregory Maxwell
2013-04-29 03:36:49 UTC
Permalink
On Sun, Apr 28, 2013 at 7:57 PM, John Dillon
Post by John Dillon
Have we considered just leaving that problem to a different protocol such as
BitTorrent? Offering up a few GB of storage capacity is a nice idea but it
means we would soon have to add structure to the network to allow nodes to find
each other to actually get that data. BitTorrent already has that issue thought
through carefully with its DHT support.
I think this is not a great idea on a couple levels—

Least importantly, our own experience with tracker-less torrents for
the bootstrap files is that they don't work very well in practice— and
that's without someone trying to DOS attack it.

More importantly, I think it's very important that the process of
offering up more storage not take any more steps. The software could
have user overridable defaults based on free disk space to make
contributing painless. This isn't possible if it takes extra software or
requires opening additional ports, etc. It also means that someone
would have to be constantly creating new torrents, there would be
issues with people only seeding the old ones, etc.

It's also the case that bittorrent is blocked on many networks and is
confused with illicit copying. We would have the same problems with
that that we had with IRC being confused with botnets.

We already have to worry about nodes finding each other just for basic
operation. The only addition this requires is being able to advertise
what parts of the chain they have.
Post by John Dillon
What are the logistics of either integrating a DHT capable BitTorrent client,
or just calling out to some library? We could still use the Bitcoin network to
bootstrap the BitTorrent DHT.
Using Bitcoin to bootstrap the Bittorrent DHT would probably make it
more reliable, but then again it might cause commercial services that
are in the business of poisoning the bittorrent DHT to target the
Bitcoin network.

Integration also brings up the question of network exposed attack surface.

Seems like it would be more work than just adding the ability to add
ranges to address messages. I think we already want to revise the
address message format in order to have signed flags and to support
I2P peers.
Robert Backhaus
2013-04-29 03:42:46 UTC
Permalink
While I like the idea of a client using a DHT blockchain or UTXO list, I
don't think that the reference client is the place for it. But it would
make for a very interesting experimental project!
John Dillon
2013-04-29 03:48:18 UTC
Permalink
Post by Gregory Maxwell
On Sun, Apr 28, 2013 at 7:57 PM, John Dillon
Post by John Dillon
Have we considered just leaving that problem to a different protocol such as
BitTorrent? Offering up a few GB of storage capacity is a nice idea but it
means we would soon have to add structure to the network to allow nodes to find
each other to actually get that data. BitTorrent already has that issue thought
through carefully with its DHT support.
I think this is not a great idea on a couple levels—
Least importantly, our own experience with tracker-less torrents for
the bootstrap files is that they don't work very well in practice— and
that's without someone trying to DOS attack it.
Unfortunate. What makes them not work out? DHT torrents seem pretty popular.
Post by Gregory Maxwell
More importantly, I think it's very important that the process of
offering up more storage not take any more steps. The software could
have user overridable defaults based on free disk space to make
contributing painless. This isn't possible if it takes extra software,
requires opening additional ports.. etc. Also means that someone
would have to be constantly creating new torrents, there would be
issues with people only seeding the old ones, etc.
Now don't get me wrong, I'm not proposing we do this if it requires additional
steps or other software. I only mean if it is possible in an easy way to
integrate the BitTorrent technology into Bitcoin in an automatic fashion. Yes,
part of that may have to be finding a way to re-use the existing port, for
instance.
Post by Gregory Maxwell
We already have to worry about nodes finding each other just for basic
operation. The only addition this requires is being able to advertise
what parts of the chain they have.
Sure, I guess my concern is more how you find the specific part of the chain
you need without some structure to the network? Although I guess it may be
enough to just add that structure, or depend on just walking the nodes
advertising themselves until you find what you want.

We can build this stuff incrementally, I'll agree. It won't be the case that one
in a thousand nodes serve up the part of the chain you need overnight. So maybe
I am over-engineering the solution with BitTorrent.
Post by Gregory Maxwell
Using Bitcoin to bootstrap the Bittorrent DHT would probably make it
more reliable, but then again it might cause commercial services that
are in the business of poisoning the bittorrent DHT to target the
Bitcoin network.
Good point. Sadly one that may apply to the Tor network too in the future.
Peter Todd
2013-04-29 03:55:23 UTC
Permalink
Post by John Dillon
We can build this stuff incrementally, I'll agree. It won't be the case that one
in a thousand nodes serve up the part of the chain you need overnight. So maybe
I am over-engineering the solution with BitTorrent.
I think that pretty much sums it up.

With the block-range served in the announce message you just need to find
an announcement with the right range, and at worst connect to a few more
nodes to get what you need. It will be a long time before the bandwidth
used for finding a node with the part of the chain that you need is a
significant fraction of the load required for downloading the data
itself.

Remember that BitTorrent's DHT is a system giving you access to tens of
petabytes worth of data. The Bitcoin blockchain on the other hand simply
can't grow more than 57GiB per year. It's a cute idea though.


Also, while we're talking about the initial download:

http://blockchainbymail.com

Lots of options out there.
--
'peter'[:-1]@petertodd.org
Jay F
2013-04-29 06:10:54 UTC
Permalink
Post by Peter Todd
Post by John Dillon
We can build this stuff incrementally I'll agree. It won't be the case that one
in a thousand nodes serve up the part of the chain you need overnight. So many
I am over engineering the solution with BitTorrent.
I think that pretty much sums it up.
With the block-range served in the announce message, you just need to find
an announcement with the right range, and at worst connect to a few more
nodes to get what you need.
One of the technologies that can be borrowed from BitTorrent (besides
downloading from multiple peers at once) is client-side analysis of the
piece distribution, which allows a client to download and share the
least-propagated pieces first so that the whole file stays highly
available, even when no single peer currently holds the complete file
(i.e. the seed has left the swarm).
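That "least-propagated parts first" policy is BitTorrent's rarest-first piece selection; a minimal sketch, with invented piecemaps:

```python
# Sketch of BitTorrent-style "rarest first" selection: count how many peers
# hold each piece we still lack, then fetch the least-replicated one.
from collections import Counter

def rarest_first(peer_piecemaps, have):
    """Pick the piece index we lack that the fewest peers can serve."""
    counts = Counter(piece
                     for pieces in peer_piecemaps
                     for piece in pieces
                     if piece not in have)
    if not counts:
        return None  # nothing left that any peer can give us
    return min(counts, key=lambda p: (counts[p], p))  # tie-break on index

# Piece 0 is held by only one peer, so it is downloaded first.
peer_piecemaps = [{0, 1, 2}, {1, 2, 3}, {2, 3}]
print(rarest_first(peer_piecemaps, have={2}))  # -> 0
```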

Unlike BitTorrent, a partial-blockchain swarm client needs to make
informed decisions about how much to download, based on rules such as
"until it sees at least 20 complete blockchain-equivalents in the swarm",
"until it has 10% of the blockchain itself", or "working backwards, all
blocks from the hash tree required to verify my payments"; any or all of
these minimums might be criteria.

BitTorrent only considers directly connected peers' piecemaps when
deciding what to download. Bitcoin, however, already has a protocol for
discovering peers beyond the directly connected nodes; this could be
extended to communicate which parts each peer is hosting. Careful thought
would need to go into attack vectors in the design: only a majority of
outbound-connected peers' advertisements should be able to inform
consensus about part or peer availability; messages that remove a peer or
a part from others' availability lists should be confirmed independently,
without such removal verification generating DDoS traffic amplification;
lying clients should be detectable by the majority; and so on.

None of that thought is needed if BitTorrent is implemented directly, but
BitTorrent carries the burden of centralized trackers or an expensive
DHT, and it has no logic informing it beyond "don't quit until I get the
whole file".
Rebroad (sourceforge)
2013-04-30 16:14:56 UTC
Permalink
As part of a roadmap for block downloading, I think this may be a good time
to look into providing an HTTP/HTTPS protocol for block downloading - this
would also allow web proxies to cache blocks and thus make it more
accessible, as well as cater for resumable downloads.
Jeff Garzik
2013-04-30 18:04:59 UTC
Permalink
On Tue, Apr 30, 2013 at 12:14 PM, Rebroad (sourceforge)
Post by Rebroad (sourceforge)
As part of a roadmap for block downloading, I think this may be a good time
to look into providing an HTTP/HTTPS protocol for block downloading - this
would also allow web proxies to cache blocks and thus make it more
accessible, as well as cater for resumable downloads.
Speaking generally, I've always been a supporter of finding new and
creative ways to store and transmit blocks. The more diversity, the
less likely bitcoin can be shut down worldwide.

HTTP is fine, but you run into many issues with large files. You
would need a very well defined HTTP-retrievable layout, with proper
HTTP headers along the entire path, if you want web caches to function
properly. You need HTTP byte range support, HTTP 1.1 keep-alives, and
other features for resuming large, interrupted downloads.
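A client-side sketch of the resume mechanism described above (the function name is mine; a server that ignores the Range header answers 200 with the full body, so the sketch falls back to rewriting the file):

```python
# Sketch: resume an interrupted download via an HTTP/1.1 Range request.
import os
import urllib.request

def resume_download(url, path):
    """Fetch url into path, continuing from the bytes already on disk."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url)
    if offset:
        req.add_header("Range", "bytes=%d-" % offset)  # HTTP byte range
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server honoured the range, so append.
        # Anything else means a full body: rewrite the file from scratch.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as f:
            while chunk := resp.read(65536):
                f.write(chunk)
```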

The format currently used by bitcoind would be just fine --
blocks/blkNNNN.dat for raw data, size-limited well below 1GB. Just
need to add a small metadata download, and serve the raw block files.
--
Jeff Garzik
exMULTI, Inc.
***@exmulti.com
Andy Parkins
2013-04-30 19:27:10 UTC
Permalink
Post by Jeff Garzik
The format currently used by bitcoind would be just fine --
blocks/blkNNNN.dat for raw data, size-limited well below 1GB. Just
need to add a small metadata download, and serve the raw block files.
That doesn't seem very generic. It's tied far too much to the current storage
format of bitcoind.

Wouldn't it be better to add support for more bitcoin-protocol-oriented HTTP
requests? Then any client can supply the same interface, rather than being
forced to create blkNNNN.dat on the fly?

http://bitcoind.example.com/block/BBBBBBBBBBBBBBBBBBBBBBB
http://bitcoind.example.com/tx/TTTTTTTTTTTTTTTTTTTTTTTT
http://bitcoind.example.com/block/oftx/TTTTTTTTTTTTTTTTTTT
http://bitcoind.example.com/peers
http://bitcoind.example.com/peer/nnn

Essentially: block explorer's raw mode but in every bitcoind. The hardest
operation for light clients is finding out the block that contains a
particular transaction -- something that bitcoind already knows.

I'd like to see support for HTTP POST/PUT of signed transactions and block
announcements too.
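A hypothetical, read-only sketch of the URL scheme above (the handler class and the toy lookup tables are invented; a real node would consult its own block and transaction indexes):

```python
# Toy read-only front end for /block/<hash> and /tx/<txid> lookups.
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKS = {"00aa": b"raw block bytes"}  # block hash -> wire-format block
TXS = {"11bb": b"raw tx bytes"}        # txid -> wire-format transaction

class NodeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = self.path.strip("/").split("/")
        body = None
        if len(parts) == 2 and parts[0] == "block":
            body = BLOCKS.get(parts[1])
        elif len(parts) == 2 and parts[0] == "tx":
            body = TXS.get(parts[1])
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# Usage: HTTPServer(("127.0.0.1", 8334), NodeHandler).serve_forever()
```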



Andy
--
Dr Andy Parkins
***@gmail.com
Jeff Garzik
2013-04-30 20:11:47 UTC
Permalink
Post by Andy Parkins
Post by Jeff Garzik
The format currently used by bitcoind would be just fine --
blocks/blkNNNN.dat for raw data, size-limited well below 1GB. Just
need to add a small metadata download, and serve the raw block files.
That doesn't seem very generic. It's tied far too much to the current storage
format of bitcoind.
Hardly. The storage format is bitcoin protocol wire format, plus a
tiny header. It is supported in multiple applications already, and is
the most efficient storage format for bitcoin protocol blocks.
Post by Andy Parkins
Wouldn't it be better to add support for more bitcoin-protocol-oriented HTTP
requests? Then any client can supply the same interface, rather than being
forced to create blkNNNN.dat on the fly?
You don't have to create anything on the fly, if you store blocks in
their native P2P wire protocol format.
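For context, the on-disk format being discussed is thin framing around wire-format blocks: each blkNNNN.dat entry is a 4-byte network magic, a 4-byte little-endian length, then the raw block, which is why serving it needs no re-encoding. A reader sketch (treats a run of zero bytes as the preallocated padding bitcoind leaves at the end of a file):

```python
# Sketch: iterate raw wire-format blocks out of a blkNNNN.dat file.
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet network magic

def read_raw_blocks(f):
    """Yield raw wire-format blocks from an open blkNNNN.dat file object."""
    while True:
        header = f.read(8)
        if len(header) < 8 or header[:4] == b"\x00\x00\x00\x00":
            break  # end of file, or preallocated zero padding
        magic, length = header[:4], struct.unpack("<I", header[4:])[0]
        if magic != MAINNET_MAGIC:
            raise ValueError("bad magic: corrupt file or wrong network")
        yield f.read(length)
```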
Post by Andy Parkins
http://bitcoind.example.com/block/BBBBBBBBBBBBBBBBBBBBBBB
http://bitcoind.example.com/tx/TTTTTTTTTTTTTTTTTTTTTTTT
http://bitcoind.example.com/block/oftx/TTTTTTTTTTTTTTTTTTT
http://bitcoind.example.com/peers
http://bitcoind.example.com/peer/nnn
Essentially: block explorer's raw mode but in every bitcoind. The hardest
operation for light clients is finding out the block that contains a
particular transaction -- something that bitcoind already knows.
This is a whole new client interface. It's fun to dream this up, but
it is far outside the scope of an efficient HTTP protocol that
downloads blocks.

Your proposal is closer to a full P2P rewrite over HTTP (or a proxy thereof).
--
Jeff Garzik
exMULTI, Inc.
***@exmulti.com
Andy Parkins
2013-05-01 14:05:03 UTC
Permalink
Post by Jeff Garzik
Hardly. The storage format is bitcoin protocol wire format, plus a
tiny header. It is supported in multiple applications already, and is
the most efficient storage format for bitcoin protocol blocks.
"Most efficient" for what purpose? There is more that one might do than just
duplicate bitcoind exactly. I can well imagine storing bitcoin blocks parsed
and separated out into database fields.
Post by Jeff Garzik
Post by Andy Parkins
Wouldn't it be better to add support for more bitcoin-protocol-oriented
HTTP requests? Then any client can supply the same interface, rather
than being forced to create blkNNNN.dat on the fly?
You don't have to create anything on the fly, if you store blocks in
their native P2P wire protocol format.
If. What if I'm writing a client and don't want to store them the way
bitcoind has?
Post by Jeff Garzik
This is a whole new client interface. It's fun to dream this up, but
it is far outside the scope of an efficient HTTP protocol that
downloads blocks.
Except the alternative is no schema at all -- essentially it's just give
access to a file on disk. Well, that hardly needs discussion at all, and it
hardly needs the involvement of bitcoind, apache could do it right now.
Post by Jeff Garzik
Your proposal is closer to a full P2P rewrite over HTTP (or a proxy thereof).
I don't think it's a "rewrite". The wire protocol is only a small part of
what bitcoind does. It would just be another thread listening for HTTP
requests at the same time as the standard protocol is served on 8333.

Anyway -- I've obviously misunderstood what the idea behind a HTTP protocol
was, and it's not like I was volunteering to do any of the work ;-)



Andy
--
Dr Andy Parkins
***@gmail.com
Jeff Garzik
2013-05-01 14:26:57 UTC
Permalink
Post by Andy Parkins
Post by Jeff Garzik
Hardly. The storage format is bitcoin protocol wire format, plus a
tiny header. It is supported in multiple applications already, and is
the most efficient storage format for bitcoin protocol blocks.
"Most efficient" for what purpose? There is more that one might do than just
duplicate bitcoind exactly. I can well imagine storing bitcoin blocks parsed
and separated out into database fields.
[...]
Post by Andy Parkins
Post by Jeff Garzik
You don't have to create anything on the fly, if you store blocks in
their native P2P wire protocol format.
If. What if I'm writing a client and don't want to store them the way
bitcoind has?
That posits -expanding- blocks from their native form into a larger
form, and then squashing them back down again upon request. That is a
lot of extra work from the point of view of clients downloading the
blocks.

But sure, if you want to do it, yes, it is possible to design an
interface like that.
Post by Andy Parkins
Post by Jeff Garzik
This is a whole new client interface. It's fun to dream this up, but
it is far outside the scope of an efficient HTTP protocol that
downloads blocks.
Except the alternative is no schema at all -- essentially it's just give
access to a file on disk. Well, that hardly needs discussion at all, and it
hardly needs the involvement of bitcoind, apache could do it right now.
Correct, Apache today could easily serve an HTTP-based layout of
blkNNNN.dat, plus a tiny JSON metadata file.

That's not "no schema", just a different layout.
Post by Andy Parkins
Post by Jeff Garzik
Your proposal is closer to a full P2P rewrite over HTTP (or a proxy thereof).
I don't think it's a "rewrite". The wire protocol is only a small part of
what bitcoind does. Adding another thread listening for HTTP requests at the
same time as on 8333 for the standard format.
Anyway -- I've obviously misunderstood what the idea behind a HTTP protocol
was, and it's not like I was volunteering to do any of the work ;-)
In the context of this thread: distributing and downloading blocks.
All current users require the native binary block format.

A generalized HTTP REST query protocol would be a nice addition... it
is just off-topic for this thread. On IRC yesterday, we discussed an
HTTP query interface like you suggested. It was agreed that it was a
nice interface, and might be a nice addition to bitcoind.

That is a separate topic for a separate email thread, though.

As an example, see the pull request I wrote for an HTTP REST interface
that downloads an encrypted wallet backup:
https://github.com/bitcoin/bitcoin/pull/1982
--
Jeff Garzik
exMULTI, Inc.
***@exmulti.com
Andy Parkins
2013-05-01 14:34:27 UTC
Permalink
Post by Jeff Garzik
A generalized HTTP REST query protocol would be a nice addition... it
is just off-topic for this thread. On IRC yesterday, we discussed an
HTTP query interface like you suggested. It was agreed that it was a
nice interface, and might be a nice addition to bitcoind.
That is a separate topic for a separate email thread, though.
As an example, see the pull request I wrote for an HTTP REST interface
https://github.com/bitcoin/bitcoin/pull/1982
Fair enough.

I'm usually behind the state-of-the-art when I suggest things here :-) I
should just trust you guys have already planned everything I might think of.


Andy
--
Dr Andy Parkins
***@gmail.com
Simon Barber
2013-04-30 19:31:38 UTC
Permalink
And then there is the problem of what domain name to use: ideally a
single name would be used, so caches have the maximum chance to reuse
content. To keep the network distributed, perhaps the existing DNS seed
mechanism could be used: a few names, each resolving to a random
bitcoind's address. Put :8333 after the name, enhance bitcoind to respond
to HTTP, and p2p traffic would get caching for free!

Simon
Post by Andy Parkins
Post by Jeff Garzik
The format currently used by bitcoind would be just fine --
blocks/blkNNNN.dat for raw data, size-limited well below 1GB. Just
need to add a small metadata download, and serve the raw block files.
That doesn't seem very generic. It's tied far too much to the current storage
format of bitcoind.
Wouldn't it be better to add support for more bitcoin-protocol-oriented HTTP
requests? Then any client can supply the same interface, rather than being
forced to create blkNNNN.dat on the fly?
http://bitcoind.example.com/block/BBBBBBBBBBBBBBBBBBBBBBB
http://bitcoind.example.com/tx/TTTTTTTTTTTTTTTTTTTTTTTT
http://bitcoind.example.com/block/oftx/TTTTTTTTTTTTTTTTTTT
http://bitcoind.example.com/peers
http://bitcoind.example.com/peer/nnn
Essentially: block explorer's raw mode but in every bitcoind. The hardest
operation for light clients is finding out the block that contains a
particular transaction -- something that bitcoind already knows.
I'd like to see support for HTTP POST/PUT of signed transactions and block
announcements too.
Andy
Brenton Camac
2013-04-30 20:06:12 UTC
Permalink
Sounds like this part of Bitcoin (block sharing) would definitely benefit from having a REST (HTTP) API.

REST-based web APIs are a common feature of most online services these days. They make writing other client services so much easier, and you get the benefit of the HTTP ecosystem (caches, etc.) for free.


- Brenton Camac
Post by Jeff Garzik
On Tue, Apr 30, 2013 at 12:14 PM, Rebroad (sourceforge)
Post by Rebroad (sourceforge)
As part of a roadmap for block downloading, I think this may be a good time
to look into providing an HTTP/HTTPS protocol for block downloading - this
would also allow web proxies to cache blocks and thus make it more
accessible, as well as cater for resumable downloads.
Speaking generally, I've always been a supporter of finding new and
creative ways to store and transmit blocks. The more diversity, the
less likely bitcoin can be shut down worldwide.
HTTP is fine, but you run into many issues with large files. You
would need a very well defined HTTP-retrievable layout, with proper
HTTP headers along the entire path, if you want web caches to function
properly. You need HTTP byte range support, HTTP 1.1 keep-alives, and
other features for resuming large, interrupted downloads.
The format currently used by bitcoind would be just fine --
blocks/blkNNNN.dat for raw data, size-limited well below 1GB. Just
need to add a small metadata download, and serve the raw block files.
--
Jeff Garzik
exMULTI, Inc.
_______________________________________________
Bitcoin-development mailing list
https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Jeff Garzik
2013-05-01 13:46:08 UTC
Permalink
Post by Pieter Wuille
Hello all,
I think it is time to move forward with pruning nodes, i.e. nodes that fully
validate and relay blocks and transactions, but which do not keep (all)
historic blocks around, and thus cannot be queried for these.
The biggest roadblock is making sure new and old nodes that start up are
able to find nodes to synchronize from. To help them find peers, I would
* NODE_VALIDATE: relay and validate blocks and transactions, but is only
guaranteed to answer getdata requests for (recently) relayed blocks and
transactions, and mempool transactions.
* NODE_BLOCKS_2016: can be queried for the last 2016 blocks, but without
guarantee for relaying/validating new blocks and transactions.
* NODE_NETWORK (which existed before) will imply NODE_VALIDATE and guarantee
availability of all historic blocks.
In general, I support this, as anybody on IRC knows.

Though it does seem to open the question about snapshotting.

Personally, it seems important to enable running a fully validating
node that can bootstrap from a UTXO snapshot plus all blocks since that
snapshot.

NODE_BLOCKS_2016, in particular, is too short. I've seen plenty of use
cases in the field where users start a network sync after more than a
2-week gap.

Set a regular interval for creating a UTXO snapshot, say 3 months
(6*2016 blocks), and serve all blocks after that snapshot. Older nodes
would contact an archive node or a torrent for blocks more than 3 months
old, and then download the more recent blocks normally (if the archive
node didn't serve up to the present day).
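The arithmetic behind that interval, as a quick sketch (the function name is mine):

```python
# 2016 blocks is one difficulty retarget period, about two weeks; six of
# them gives the roughly-quarterly snapshot interval Jeff suggests.
RETARGET = 2016                    # blocks per difficulty period
SNAPSHOT_INTERVAL = 6 * RETARGET   # 12096 blocks, roughly 12 weeks

def latest_snapshot_height(tip_height):
    """Height of the most recent snapshot at or below the current tip."""
    return (tip_height // SNAPSHOT_INTERVAL) * SNAPSHOT_INTERVAL

print(SNAPSHOT_INTERVAL)               # 12096
print(latest_snapshot_height(235000))  # 229824
```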

Where are we on nailing down a stable, hash-able UTXO serialization?
--
Jeff Garzik
exMULTI, Inc.
***@exmulti.com