[Bitcoin-development] Cartographer
Mike Hearn
2014-12-28 18:25:29 UTC
Hi there!

Lately we have been bumping up against the limitations of DNS as a protocol
for learning about the p2p network. As a proposal for how to address this,
I have written a new network crawler and seed:

https://github.com/mikehearn/httpseed

It implements a standard DNS seed with a minimal embedded DNS server (you
can find one running at dnsseed.vinumeris.com) and also has the following
extra features:


- Can serve seed data using gzipped, timestamped, digitally signed
protocol buffers over HTTP. This fixes authentication, auditability,
malware false positives and extensibility. The signature uses secp256k1.
SSL is *not* used, to simplify deployment and to allow ISPs to cache the
results transparently when a future version sets cache control headers.
- Can additionally serve data in JSON, XML and HTML (examples: JSON
<http://vinumeris.com:8081/peers.json>, XML
<http://vinumeris.com:8081/peers.xml>, HTML
<http://vinumeris.com:8081/peers.html>) for ease of use with other
tools, like web browsers.
- Results can be restricted using query parameters, e.g. for a service
flags bit mask. Cartographer tests nodes that set service bit 2 to see if
they really support BIP 64, and this requirement can also be specified as
an argument to the query.
- Crawl speed can be specified in terms of successful connects per
second, rather than the number-of-threads approach used by other crawlers.
- Can export statistics and controls using JMX, so you can reconfigure
it at runtime and view charts of things like connects/sec or CPU usage
using any JMX console, like Mission Control.
- A client for it is in bitcoinj master branch.
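
The service-flags restriction in the list above amounts to a bit mask test. A minimal sketch in Python, assuming that "service bit 2" means the flag value 2 and that a peer matches only when every requested bit is set; the Peer type, the constant names and the example addresses are illustrative and are not taken from Cartographer:

```python
from dataclasses import dataclass

NODE_NETWORK = 1  # node serves the block chain
NODE_GETUTXO = 2  # assumed flag value for BIP 64 ("service bit 2" in the text)


@dataclass(frozen=True)
class Peer:
    """Hypothetical record for one crawled node."""
    address: str
    port: int
    services: int  # service bit field advertised by the node


def matches_service_mask(peer, mask):
    """Return True when every bit set in mask is also set in peer.services."""
    return (peer.services & mask) == mask


def filter_peers(peers, mask):
    """Keep only the peers that advertise all requested service bits."""
    return [p for p in peers if matches_service_mask(p, mask)]


# Documentation-range addresses, used purely as sample data.
peers = [
    Peer("192.0.2.1", 8333, NODE_NETWORK),
    Peer("192.0.2.2", 8333, NODE_NETWORK | NODE_GETUTXO),
    Peer("192.0.2.3", 8333, 0),
]

wanted = filter_peers(peers, NODE_NETWORK | NODE_GETUTXO)
```

A mask of 0 matches every peer, so an unfiltered query is just the special case of an empty mask.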

To provide all these features Cartographer relies heavily on libraries and
is written in a concise new language called Kotlin <http://kotlinlang.org/>,
so it fits in about 650 lines of code. Kotlin is easy to learn for anyone
who knows Scala or Java, so it should be straightforward to hack on and
there is no chance of any buffer/heap exploits in the DNS, HTTP or Bitcoin
protocol stacks.

In the new year I will probably write a BIP describing the protocol. For
now you can see the definition here
<https://github.com/mikehearn/httpseed/blob/master/src/main/peerseeds.proto>
or just read the textual forms from the links above. It's pretty
self-explanatory. I hope that in future other DNS seeds will start supporting
this protocol too, as it has many advantages.

Future versions might include data like how long the peer has been around,
node keys if we add auth/encrypt support to the p2p protocol and so on.
Thomas Zander
2014-12-29 08:47:29 UTC
Post by Mike Hearn
Lately we have been bumping up against the limitations of DNS as a protocol
for learning about the p2p network.
Can you explain further where limitations and problems were hit?
--
Thomas Zander
Peter Todd
2014-12-29 10:39:52 UTC
A big one is that the privacy is way too good: every DNS request goes through multiple levels of caching and indirection, so there's no way to figure out who made the request to subject them to additional targeting.

A connection-oriented protocol gets rid of all those protections, giving us seed operators monetisation opportunities like selling usage statistics, per-client targeted results, etc. We recently got rid of all the "call-home" functionality that previously gave this type of insight; a connection-oriented seed protocol gives us this right back.

There's also this pesky problem of ISPs censoring DNS results with dumb automated systems to block malware - easily fixed with Gregory Maxwell's suggestion of permuting the results with XOR - but that kind of end-user driven solution really misses out on the needs of other Bitcoin stakeholders like law enforcement and marketing companies.
Post by Mike Hearn
Lately we have been bumping up against the limitations of DNS as a protocol
for learning about the p2p network.
Can you explain further where limitations and problems were hit?
Mike Hearn
2014-12-29 11:30:42 UTC
Post by Thomas Zander
Can you explain further where limitations and problems were hit?
Well, look at the feature list :)

The biggest need from my POV is querying support. It's awkward to try and
retrofit flexible key=value pair queries onto DNS, it just wasn't designed
for that. With HTTP it's easy. This will become more important in future as
the protocol evolves. For example, some nodes will soon stop serving the
block chain because they start pruning. Today this is managed with a hack:
pruning nodes just stop providing *all* services to the p2p network. This
takes them out of the DNS seeds entirely. But they can actually still
provide download of the parts of the chain they still have, and they can
provide transaction filtering and relay support, along with misc other
things.

With the current DNS protocol you get an all or nothing choice - so
probably seeds that only support it will elect to only show nodes that have
the entire block chain, because that's what Bitcoin Core would find most
useful. SPV wallets have slightly different needs.

In theory you could come up with a pile of hacks to specify what the client
needs in the DNS query, but then you have a v2 protocol anyway and might as
well go the whole way.
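
To illustrate why key=value queries are easy over HTTP, here is a minimal sketch in Python using only the standard library. The host name and the parameter names (service_mask, getutxo) are hypothetical and are not taken from the Cartographer protocol definition:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_seed_url(base, **params):
    """Append key=value pairs to a seed URL in a stable (sorted) order."""
    query = urlencode(sorted(params.items()))
    return f"{base}?{query}" if query else base


def parse_seed_query(url):
    """Return the query parameters of a seed URL as a flat dict of strings.

    If a key is repeated, the last value wins.
    """
    raw = parse_qs(urlsplit(url).query)
    return {key: values[-1] for key, values in raw.items()}


# Hypothetical endpoint and parameter names, for illustration only.
url = build_seed_url(
    "http://seed.example.org:8081/peers.json",
    service_mask=3,
    getutxo="true",
)
params = parse_seed_query(url)
```

A new requirement becomes one more key in the query string, and a server that does not understand a key can simply ignore it.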

Additionally, with DNS it's awkward to provide extra data in the responses
beyond IP address and it's VERY awkward to sign the responses. Signing the
responses has a couple of benefits. The biggest is, in future it'd be nice
to have an authenticated and encrypted network, to raise the bar for sybil
and MITM attacks. DNS seeding can't be upgraded to support that with any
reasonable level of effort. And DNS is awkward to configure/set up.
Actually DNS is just awkward, period.

The second benefit of signing is it provides auditability. If you see a seed
give bad answers, you can now prove it to other people.
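
The auditability argument can be sketched with a toy ECDSA implementation over secp256k1 in pure Python: anyone holding the seed's public key can check a stored response, so a bad answer can be shown to third parties. This is an illustration only. The payload layout and key are made up, the nonce derivation is a simplification (not RFC 6979), the arithmetic is not constant-time, and real code should use a vetted library. It is not Cartographer's actual wire format.

```python
import hashlib

# secp256k1 domain parameters (curve y^2 = x^3 + 7 over the field of size P).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def _add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)


def _mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = _add(result, point)
        point = _add(point, point)
        k >>= 1
    return result


def _digest(message):
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def public_key(private_key):
    return _mul(private_key, G)


def sign(private_key, message):
    """Toy ECDSA signature; the nonce is a hash of key and message."""
    z = _digest(message)
    k = _digest(private_key.to_bytes(32, "big") + message) % (N - 1) + 1
    r = _mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * private_key) % N
    return (r, s)


def verify(pub, message, signature):
    """Check an ECDSA signature against a public key."""
    r, s = signature
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    point = _add(_mul(_digest(message) * w % N, G), _mul(r * w % N, pub))
    return point is not None and point[0] % N == r


# Hypothetical seed key and timestamped payload (values are arbitrary).
seed_private_key = 0x1F2E3D4C5B6A79880123456789ABCDEF
seed_public_key = public_key(seed_private_key)
payload = b'{"timestamp": 1419800000, "peers": ["192.0.2.1:8333"]}'
signature = sign(seed_private_key, payload)
```

Because the timestamp is inside the signed bytes, a saved (payload, signature) pair is evidence of what the seed served and roughly when; changing a single byte makes verify return False.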

There is also the previously discussed issue that DNS seeds sometimes get
blocked by aggressive networks because they start serving IPs that are
infected with malware i.e. they look like fast-flux sites.

Using a simple HTTP based protocol fixes all of these problems in one go.

Now, re: privacy.

Firstly, Peter, get help. I'm serious. You are starting to sound like an
auto-generated parody of yourself. When you can't debate something as
boring as HTTP vs DNS without trying to raise an angry mob using bizarre
conspiracy theories, that's not normal.

I don't think the "DNS has caches" issue is worth worrying about for a lot
of reasons:

1. Full nodes try as hard as they can to open ports and advertise their
IP addresses to everyone. Even if you change the defaults to disable that,
you're about to connect to a bunch of random computers with no reputation
anyway.

2. Lists of stale IP addresses are hardly useful to regular people and
network operators can identify Bitcoin users by looking for traffic on port
8333, so it's unclear what threat model is being addressed here.

3. The biggest users of the seeds are all SPV wallets. Every single one
of these already phones home to check for online updates.

4. DNS proxying only hides part of the IP address. If you're serious
about this you want to be doing lookups via Tor. Whilst it's possible to
use the DNS seeds via Tor in a reasonable way with exit diversity (and
bitcoinj does), doing it requires low level Tor protocol programming
<https://github.com/bitcoinj/bitcoinj/blob/42f9d7c193fcd56fda7691b0ea934bae9d23f2d6/core/src/main/java/org/bitcoinj/net/discovery/TorDiscovery.java#L260>
that is out of reach for most implementors. An HTTP lookup is trivial with
any HTTP stack that supports SOCKS.

5. ISPs also deploy transparent HTTP caches. The Cartographer protocol
uses HTTP with inline signing so responses can be cached, once the right
headers are set.

tl;dr it's unclear that DNS caching actually solves any real privacy
problem but regardless, you can get the same distributed caching with HTTP
as with DNS. So in the end it makes little difference.

I believe that Cartographer is a better protocol all round and there are no
costs beyond the one-time upgrades, but even if for some reason you
disagree with the five privacy points above, I think the benefits still
massively outweigh the costs.
Thomas Zander
2014-12-29 11:49:45 UTC
Post by Mike Hearn
With the current DNS protocol you get an all or nothing choice
It's a seed. Not the protocol itself.
Post by Mike Hearn
Firstly, Peter, get help. I'm serious.
I think most would agree with me that, Mike, this answer is not just a little
over the line, it's unacceptable behavior in any collaborative group.
Please be respectful and avoid ad-hominem attacks.
--
Thomas Zander
Peter Todd
2014-12-29 11:59:16 UTC
Post by Thomas Zander
Post by Mike Hearn
With the current DNS protocol you get an all or nothing choice
It's a seed. Not the protocol itself.
Post by Mike Hearn
Firstly, Peter, get help. I'm serious.
I think most would agree with me that, Mike, this answer is not just a little
over the line, it's unacceptable behavior in any collaborative group.
Please be respectful and avoid ad-hominem attacks.
Yes, I agree: Mike shouldn't be making ad hominem attacks by calling people "a parody".

You'll note, however, that my response carefully avoided talking about the person who originated the idea, and merely stuck to criticising - via parody - the idea itself.
Btc Drak
2014-12-29 12:13:41 UTC
Mike,

In all seriousness, are you on the payroll of the NSA or similar to
repeatedly attempt to introduce privacy leaks[1] and weaknesses[2] into the
ecosystem, not to mention logical fallacies like ad hominem attacks,
disruption[3] and FUD[4]?

Why do you answer objections by hand waving and misdirection as opposed to
sound technical reasoning? Remember how hand waving ended for you the last
time with your p2p getutxo pull-request[5] and the public flogging that
ensued because you refused to accept your implementation was not only
flawed but critically vulnerable to attack[6].

Given your intelligence, education and experience, it would seem logical
that your behaviour is not random or irrational, but in fact calculated and
planned.

references:
[1] http://www.reddit.com/r/Bitcoin/comments/2byqz0/mike_hearn_proposes_to_build_vulnerable/
[2] https://www.reddit.com/r/Bitcoin/comments/1qmbtu/mike_hearn_chair_of_the_bitcoin_foundations_law/
[3] http://www.reddit.com/r/Bitcoin/comments/28zts3/mike_hearn_interview_quotes_progress_on_the/
[4]
[5] https://github.com/bitcoin/bitcoin/pull/4351
[6] http://www.reddit.com/r/Bitcoin/comments/2eq8gi/getutxos_a_convenient_way_to_crash_bitcoind/
Mistr Bigs
2014-12-29 13:08:52 UTC
As an outside observer, I have to say I also found Peter's sardonic message
tone inappropriate for furthering the discussion.
