Discussion:
BIP Proposal: Compact Client Side Filtering for Light Clients
Olaoluwa Osuntokun via bitcoin-dev
2017-06-01 19:01:14 UTC
Hi y'all,

Alex Akselrod and I would like to propose a new light client BIP for
consideration:
* https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki

This BIP proposal describes a concrete specification (along with
reference implementations [1][2][3]) for the much-discussed client-side
filtering reversal of BIP-37. The precise details are described in the
BIP, but as a summary: we've implemented a new light-client mode that uses
client-side filtering based on Golomb-Rice coded sets. Full nodes
maintain an additional index of the chain, and serve this compact filter
(the index) to light clients which request it. Light clients then fetch
these filters, query them locally, and _maybe_ fetch the block if a
relevant item matches. The cool part is that blocks can be fetched from
_any_ source once the light client deems it necessary. Our primary
motivation for this work was enabling a light-client mode for lnd [4] in
order to support a more lightweight back end, paving the way for the usage
of Lightning on mobile phones and other devices. We've integrated neutrino
as a back end for lnd, and will be making the updated code public very
soon.

One specific area we'd like feedback on is the parameter selection. Unlike
BIP-37, which allows clients to dynamically tune their false-positive
rate, our proposal uses a _fixed_ false-positive rate. Within the
document, it's currently specified as P = 1/2^20. We've done a bit of
analysis and optimization, attempting to minimize the following sum:
filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex
has made a JS calculator that allows y'all to explore the effect of
tweaking the false-positive rate in addition to the following variables:
the number of items the wallet is scanning for, the size of the blocks,
the number of blocks fetched, and the size of the filters themselves. The
calculator computes the expected bandwidth utilization using the CDF of
the geometric distribution. The calculator can be found here:
https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical
script he's been running on actual data, and the results seem to match up
rather nicely.
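
To make the trade-off concrete: for a wallet watching m items, the
expected cost of a block is roughly filter_size + (1 - (1-P)^m) *
block_size, since the block is only fetched when at least one watched item
falsely matches. Below is a tiny Go sketch of that calculation (purely
illustrative; the numbers plugged in are placeholders, not measurements):

    package main

    import (
        "fmt"
        "math"
    )

    // expectedBandwidth estimates a light client's download cost: every
    // filter is fetched, and each block is additionally fetched with
    // probability 1 - (1-P)^m, the chance that at least one of the m
    // watched items falsely matches the block's filter.
    func expectedBandwidth(blocks, items int, p, filterBytes, blockBytes float64) float64 {
        pFetch := 1 - math.Pow(1-p, float64(items))
        return float64(blocks) * (filterBytes + pFetch*blockBytes)
    }

    func main() {
        // Placeholder inputs: 1000 blocks, 500 watched items, P = 1/2^20,
        // ~15 KB filters, ~1 MB blocks.
        p := 1 / math.Pow(2, 20)
        fmt.Printf("%.0f bytes\n", expectedBandwidth(1000, 500, p, 15e3, 1e6))
    }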

We were excited to see that Karl Johan Alm (kallewoof) has done some
(rather extensive!) analysis of his own, focusing on a distinct encoding
type [5]. I haven't had the time to dig into his report yet, but I think
I've read enough to extract the key difference in our encodings: his
filters use a binomial encoding _directly_ on the filter contents, while
we instead create a Golomb-coded set with the contents being _hashes_ (we
use siphash) of the filter items.
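
For readers who haven't opened the BIP yet, the rough shape of the
construction is: siphash each item to a 64-bit value, reduce it into the
range [0, N * 2^20), sort the results, and Golomb-Rice encode the deltas
(quotient in unary, remainder in the low 20 bits). The Go sketch below is
only an illustration of that shape; the key handling, modulus reduction,
and serialization details are simplified assumptions, not the normative
spec:

    // Rough sketch of building a Golomb-coded set over siphashed items.
    package gcs

    import (
        "sort"

        "github.com/dchest/siphash" // any SipHash-2-4 implementation works
    )

    const fpExponent = 20 // P = 1/2^20

    // hashedSet maps each item into [0, N * 2^fpExponent) and sorts the
    // results (assumes a non-empty item set).
    func hashedSet(k0, k1 uint64, items [][]byte) []uint64 {
        n := uint64(len(items))
        modulus := n << fpExponent
        vals := make([]uint64, 0, n)
        for _, item := range items {
            vals = append(vals, siphash.Hash(k0, k1, item)%modulus)
        }
        sort.Slice(vals, func(i, j int) bool { return vals[i] < vals[j] })
        return vals
    }

    // buildFilter Golomb-Rice encodes the deltas between successive
    // hashed values: the quotient (delta >> fpExponent) in unary, then
    // the low fpExponent bits verbatim.
    func buildFilter(k0, k1 uint64, items [][]byte) []byte {
        var (
            out     []byte
            bitPos  uint
            lastVal uint64
        )
        writeBit := func(b uint64) {
            if bitPos%8 == 0 {
                out = append(out, 0)
            }
            out[len(out)-1] |= byte(b&1) << (7 - bitPos%8)
            bitPos++
        }
        for _, v := range hashedSet(k0, k1, items) {
            delta := v - lastVal
            lastVal = v
            for q := delta >> fpExponent; q > 0; q-- { // unary quotient
                writeBit(1)
            }
            writeBit(0)
            for i := fpExponent - 1; i >= 0; i-- { // remainder bits
                writeBit(delta >> uint(i))
            }
        }
        return out
    }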

Using a fixed fp=20 (i.e. P = 1/2^20), I have some stats detailing the
total index size, as well as averages and medians, for both mainnet and
testnet. For mainnet, using the filter contents as currently described in
the BIP (basic + extended), the total size of the index comes out to
6.9GB. The breakdown is as follows:

* total size: 6976047156
* total avg: 14997.220622758816
* total median: 3801
* total max: 79155
* regular size: 3117183743
* regular avg: 6701.372750217131
* regular median: 1734
* regular max: 67533
* extended size: 3858863413
* extended avg: 8295.847872541684
* extended median: 2041
* extended max: 52508

In order to consider the average+median filter sizes in a world with
larger blocks, I also ran the index for testnet:

* total size: 2753238530
* total avg: 5918.95736054141
* total median: 60202
* total max: 74983
* regular size: 1165148878
* regular avg: 2504.856172982827
* regular median: 24812
* regular max: 64554
* extended size: 1588089652
* extended avg: 3414.1011875585823
* extended median: 35260
* extended max: 41731

Finally, here are the testnet stats which take into account the increase
in the maximum filter size due to segwit's block-size increase. The max
filter sizes are a bit larger due to some of the habitual blocks I
created last year when testing segwit (transactions with 30k inputs, 30k
outputs, etc).

* total size: 585087597
* total avg: 520.8839608674402
* total median: 20
* total max: 164598
* regular size: 299325029
* regular avg: 266.4790836307566
* regular median: 13
* regular max: 164583
* extended size: 285762568
* extended avg: 254.4048772366836
* extended median: 7
* extended max: 127631

For those that are interested in the raw data, I've uploaded a CSV file
of raw data for each block (mainnet + testnet), which can be found here:
* mainnet: (14MB):
https://www.dropbox.com/s/4yk2u8dj06njbuv/mainnet-gcs-stats.csv?dl=0
* testnet: (25MB):
https://www.dropbox.com/s/w7dmmcbocnmjfbo/gcs-stats-testnet.csv?dl=0


We look forward to getting feedback from all of y'all!

-- Laolu


[1]: https://github.com/lightninglabs/neutrino
[2]: https://github.com/Roasbeef/btcd/tree/segwit-cbf
[3]: https://github.com/Roasbeef/btcutil/tree/gcs/gcs
[4]: https://github.com/lightningnetwork/lnd/

Eric Lombrozo via bitcoin-dev
2017-06-01 21:00:02 UTC
Thanks for sending this proposal! I look forward to having a great
discussion around this.

- Eric

Matt Corallo via bitcoin-dev
2017-06-01 21:33:43 UTC
Quick comment before I finish reading it completely: looks like you have no way to match the input prevouts being spent, which would be rather nice from a "watch for this output being spent" pov.
Olaoluwa Osuntokun via bitcoin-dev
2017-06-01 22:10:34 UTC
Post by Eric Lombrozo via bitcoin-dev
Thanks for sending this proposal! I look forward to having a great
discussion around this.
Thanks Eric! We really appreciated the early feedback you gave on the
initial design.

One aspect which isn't in this BIP draft is direct support for unconfirmed
transactions. I consider such support an important UX feature for mobile
phones, and something which I've personally seen matter when on-boarding
new users to Bitcoin. This was brought up in the original "bfd" mailing
list thread [1]. Possible solutions are: a new, beefier INV message which
contains enough information to identify relevant outputs created in a
transaction, or a "streaming" p2p extension that allows light clients to
receive notifications of mempool inclusion based only on (pkScript,
amount) pairs.
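
By way of illustration (this is not a concrete proposal, just a sketch of
the kind of "beefier INV" being imagined; none of the names below exist
anywhere), such a notification might carry little more than the txid plus
the output scripts and amounts, letting a client match against its own
addresses locally without handing a filter to the peer:

    // Hypothetical "inv with metadata" notification for unconfirmed
    // transactions; purely illustrative, not part of the BIP or the p2p
    // protocol.
    type MempoolNotification struct {
        TxID    [32]byte
        Outputs []NotifiedOutput
    }

    type NotifiedOutput struct {
        PkScript []byte // full output script, matched client-side
        Amount   int64  // value in satoshis
    }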
Post by Matt Corallo via bitcoin-dev
looks like you have no way to match the input prevouts being spent, which
is rather nice from a "watch for this output being spent" pov.
Perhaps we didn't make this clear enough, but it _is_ indeed possible to
watch an output for spentness. Or maybe you mean matching on the
_script_ being spent?
From the BIP draft, the extended filter contains:
* The outpoints of each input within a transaction.
...
Within the integration for lnd, we specifically use this feature to watch
for when channels have been closed within the network graph, or when
channels _directly_ under our control have been spent (either a unilateral
channel closure, or a revocation breach).

-- Laolu

[1]:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-January/013397.html
Chris via bitcoin-dev
2017-06-02 02:15:16 UTC
Post by Olaoluwa Osuntokun via bitcoin-dev
One aspect which isn't in this BIP draft is direct support for unconfirmed
transactions. I consider such a feature an important UX feature for mobile
phones, and something which I've personally seen as an important
UX-experience when on-boarding new users to Bitcoin.
Totally agree. My first thought is maybe you could keep bip37 filtering
as optional for unconfirmed transactions. Since you're only interested
in incoming transactions in this case, you can create one big filter with
all your wallet's addresses and reuse that filter. The bip37 privacy
issues mainly come up when trying to get the filter to match both
incoming and outgoing transactions, which is not needed in this case.

Otoh, if you download the block from the same peer that you gave a bip37
filter to, then they could probably test the txs in the block against both
filters. :/
Gregory Maxwell via bitcoin-dev
2017-06-02 02:28:54 UTC
On Fri, Jun 2, 2017 at 2:15 AM, Chris via bitcoin-dev
Post by Chris via bitcoin-dev
Post by Olaoluwa Osuntokun via bitcoin-dev
One aspect which isn't in this BIP draft is direct support for unconfirmed
transactions. I consider such a feature an important UX feature for mobile
phones, and something which I've personally seen as an important
UX-experience when on-boarding new users to Bitcoin.
Totally agree. My first thought is maybe you could keep bip37 filtering
as optional for unconfirmed transactions. Since you're only interested
Really bad for privacy. Data for transactions at the tip is only
14kb/s-- potentially less if segwit is in use and you're not getting
witnesses. Is that really that burdensome?

FWIW, leaving a mobile browser running while pointed at some websites
seems to use more traffic than that, just loading advertising. :)
Alex Akselrod via bitcoin-dev
2017-06-02 03:35:29 UTC
I agree with Greg and Laolu; BIP-37 filtering for transactions is no better
than for blocks and completely destroys privacy.

A constant stream of transactions is OK, but even cheaper for light
clients would be Laolu's proposal of streaming messages that carry more tx
data than existing inv messages but less than full tx messages.

We could make a bit field of things to include in every inv-with-metadata
message, such as:
- witness data
- scriptSig data pushes
- scriptPubKey
- hash of scriptPubKey (unnecessary if full scriptPubKey is sent)
- scriptPubKey data pushes
- etc.

This way a full node might be able to tell what application (or type of
application) a light client is running, but not the client's addresses or
outputs, except maybe when the client originates transactions.
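
As a rough sketch of what such a bit field might look like (purely
illustrative; neither these flag names nor the message exist anywhere
today):

    // Hypothetical per-connection flags controlling what an
    // "inv-with-metadata" message would include.
    const (
        InvWitnessData        uint32 = 1 << iota // full witness data
        InvScriptSigPushes                       // data pushes from scriptSigs
        InvScriptPubKey                          // full output scripts
        InvScriptPubKeyHash                      // hash of each output script
        InvScriptPubKeyPushes                    // data pushes from output scripts
    )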

On Thu, Jun 1, 2017 at 10:28 PM, Gregory Maxwell via bitcoin-dev <
Post by Gregory Maxwell via bitcoin-dev
On Fri, Jun 2, 2017 at 2:15 AM, Chris via bitcoin-dev
Post by Chris via bitcoin-dev
Post by Olaoluwa Osuntokun via bitcoin-dev
One aspect which isn't in this BIP draft is direct support for
unconfirmed
Post by Chris via bitcoin-dev
Post by Olaoluwa Osuntokun via bitcoin-dev
transactions. I consider such a feature an important UX feature for
mobile
Post by Chris via bitcoin-dev
Post by Olaoluwa Osuntokun via bitcoin-dev
phones, and something which I've personally seen as an important
UX-experience when on-boarding new users to Bitcoin.
Totally agree. My first thought is maybe you could keep bip37 filtering
as optional for unconfirmed transactions. Since you're only interested
Really bad for privacy. Data for transactions at the tip is only
14kb/s-- potentially less if segwit is in use and you're not getting
witnesses. Is that really that burdensome?
FWIW, leaving a mobile browser just running while pointed at some
websites seems to use more traffic than that just loading advertising.
:)
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Chris Pacia via bitcoin-dev
2017-06-02 16:07:17 UTC
On Jun 1, 2017 10:28 PM, "Gregory Maxwell" <***@xiph.org> wrote:


Really bad for privacy.


As I mentioned the issue is the potential collision with the block filter,
but bloom filters by themselves aren't inherently bad for privacy. They
just reduce the anonymity set. The reason bip37 doesn't work for privacy is
how the filters have to be used/abused to make it work. That's not what I
mentioned above.

Data for transactions at the tip is only 14kb/s-- potentially less if
segwit is in use and you're not getting witnesses. Is that really that
burdensome?


You're not running that on mobile.
Olaoluwa Osuntokun via bitcoin-dev
2017-06-02 04:49:16 UTC
Post by Olaoluwa Osuntokun via bitcoin-dev
In order to consider the average+median filter sizes in a world with
larger blocks, I also ran the index for testnet:
* total size: 2753238530
* total avg: 5918.95736054141
* total median: 60202
* total max: 74983
* regular size: 1165148878
* regular avg: 2504.856172982827
* regular median: 24812
* regular max: 64554
* extended size: 1588089652
* extended avg: 3414.1011875585823
* extended median: 35260
* extended max: 41731
Oops, realized I made a mistake. These are the stats from Feb 2016 until
about a month ago (since height 400k iirc).

-- Laolu
Olaoluwa Osuntokun via bitcoin-dev
2017-06-09 03:59:17 UTC
Hi y'all,

Thanks for all the comments so far!

I've pushed a series of updates to the text of the BIP repo linked in the
OP. The fixes include typos, corrections to parts of the specification
that were incorrect (N is the total number of items, NOT the number of
txns in the block), and clarifications to a few sections.

The latest version also includes a set of test vectors (as CSV files)
which, for a series of fp rates (1/2 to 1/2^32) and for 6 testnet blocks
(one of which generates a "null" filter), include:

* The block height
* The block hash
* The raw block itself
* The previous basic+extended filter header
* The basic+extended filter header for the block
* The basic+extended filter for the block

The test vectors were too large to include in-line within the document, so
we put them temporarily in a distinct folder [1]. The code used to
generate the test vectors has also been included.

-- Laolu

[1]: https://github.com/Roasbeef/bips/tree/master/gcs_light_client
Karl Johan Alm via bitcoin-dev
2017-06-02 06:00:30 UTC
Hello,

Really wish I'd known you were working on this a few weeks ago, but
such is life. Hopefully I can provide some useful feedback.

On Fri, Jun 2, 2017 at 4:01 AM, Olaoluwa Osuntokun via bitcoin-dev
Post by Olaoluwa Osuntokun via bitcoin-dev
Full-nodes
maintain an additional index of the chain, and serve this compact filter
(the index) to light clients which request them. Light clients then fetch
these filters, query the locally and _maybe_ fetch the block if a relevant
item matches.
Is it necessary to maintain the index all the way to the beginning of
the chain? When would clients request "really old digests" and why?
Post by Olaoluwa Osuntokun via bitcoin-dev
One specific area we'd like feedback on is the parameter selection. Unlike
BIP-37 which allows clients to dynamically tune their false positive rate,
our proposal uses a _fixed_ false-positive. Within the document, it's
currently specified as P = 1/2^20. We've done a bit of analysis and
filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex
has made a JS calculator that allows y'all to explore the affect of
the number of items the wallet is scanning for, the size of the blocks,
number of blocks fetched, and the size of the filters themselves. The
calculator calculates the expected bandwidth utilization using the CDF of
https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical
script he's been running on actual data, and the results seem to match up
rather nicely.
I haven't tried the tool yet, and maybe it will answer some of my questions.

On what data were the simulated wallets used in the runs on actual data
based? How did false positive rates for wallets with lots of items
(pubkeys etc.) play out? Is there a maximum number of items for a wallet
before it becomes too bandwidth-costly to use digests?
Post by Olaoluwa Osuntokun via bitcoin-dev
We we're excited to see that Karl Johan Alm (kallewoof) has done some
(rather extensive!) analysis of his own, focusing on a distinct encoding
type [5]. I haven't had the time yet to dig into his report yet, but I
think I've read enough to extract the key difference in our encodings: his
filters use a binomial encoding _directly_ on the filter contents, will we
instead create a Golomb-Coded set with the contents being _hashes_ (we use
siphash) of the filter items.
I will definitely try to reproduce my experiments with Golomb-Coded
sets and see what I come up with. It seems like you've got a little
less than half the size of my digests for 1-block digests but I
haven't tried making digests for all blocks (and lots of early blocks
are empty).

On the BIP proposal itself:

In Compact Filter Header Chain, you mention that clients should
download filters from nodes if filter_headers is not identical, and
ban offending nodes. What about temporary forks in the chain? What
about longer forks? In general, I am curious how you will deal with
reorgs and temporary non-consensus related chain splits.

I am also curious if you have considered digests containing multiple
blocks. Retaining a permanent binsearchable record of the entire chain
is obviously too space costly, but keeping the last X blocks as
binsearchable could speed up syncing for clients tremendously, I feel.

It may also be space efficient to ONLY store older digests in chunks
of e.g. 8 blocks. A client syncing up finding a match in an 8-block
chunk would have to grab those 8 blocks, but if it's not recent, that
may be acceptable. It may even be possible to make 4-, 2-, 1-block
digests on demand.

How fast are these to create? Would it make sense to provide digests
on demand in some cases, rather than keeping them around indefinitely?
Alex Akselrod via bitcoin-dev
2017-06-02 17:55:31 UTC
On Jun 2, 2017 8:09 AM, "Karl Johan Alm via bitcoin-dev" <
bitcoin-***@lists.linuxfoundation.org> wrote:

Hello,

Really wish I'd known you were working on this a few weeks ago, but
such is life. Hopefully I can provide some useful feedback.


Your feedback is greatly appreciated!


On Fri, Jun 2, 2017 at 4:01 AM, Olaoluwa Osuntokun via bitcoin-dev
Post by Olaoluwa Osuntokun via bitcoin-dev
Full-nodes
maintain an additional index of the chain, and serve this compact filter
(the index) to light clients which request them. Light clients then fetch
these filters, query the locally and _maybe_ fetch the block if a relevant
item matches.
Is it necessary to maintain the index all the way to the beginning of
the chain? When would clients request "really old digests" and why?


Without a soft fork, this is the only way for light clients to verify that
peers aren't lying to them. Clients can request headers (just hashes of
the filters and the previous headers, creating a chain) and look for
conflicts between peers. If a conflict is found at a certain block, the
client can download the block, generate a filter, calculate the header by
hashing together the previous header and the generated filter, and ban any
peers that don't match. A full node could prune old filters and
recalculate them as necessary, as long as it keeps the filter header chain
info; really old filters are unlikely to be requested by correctly written
software, but you can't guarantee every client will follow best practices
either.
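
Concretely, a client cross-checking two peers might do something like the
following. This is a minimal sketch assuming the header is
double-SHA256(filter_hash || prev_header), which may differ from the exact
serialization in the draft:

    import "crypto/sha256"

    // filterHeader chains a filter to the previous header so that light
    // clients can detect conflicting answers between peers.
    func filterHeader(filter []byte, prevHeader [32]byte) [32]byte {
        filterHash := sha256.Sum256(filter)
        filterHash = sha256.Sum256(filterHash[:])

        var buf [64]byte
        copy(buf[:32], filterHash[:])
        copy(buf[32:], prevHeader[:])

        h := sha256.Sum256(buf[:])
        return sha256.Sum256(h[:])
    }

On a mismatch between peers at some height, the client fetches the full
block, builds the filter itself, recomputes the header with something like
the above, and bans whichever peer's header doesn't match.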
Post by Olaoluwa Osuntokun via bitcoin-dev
https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical
script he's been running on actual data, and the results seem to match up
rather nicely.
I haven't tried the tool yet, and maybe it will answer some of my questions.

On what data were the simulated wallets on actual data based? How did
false positive rates for wallets with lots of items (pubkeys etc) play
out? Is there a maximum number of items for a wallet before it becomes
too bandwidth costly to use digests?


The simulations are based on completely random data within given
parameters. For example, it will generate a wallet of a specified size and
generate blocks of a specified size with a specified number of
transactions of a specified format, all guaranteed not to match the
wallet. It then tries to match the wallet and tracks the filter size and
the bandwidth used by block downloads, which are all due to false
positives. The maximum wallet size can be millions or more addresses and
outpoints before the filter isn't worth it.

I published the simulation code at
https://gist.github.com/aakselrod/0ee665205f7c9538c2339876b0424b26 but the
calculation code gives you the same results (on average, and very close
with a big enough sample size) much faster.

I will definitely try to reproduce my experiments with Golomb-Coded
sets and see what I come up with. It seems like you've got a little
less than half the size of my digests for 1-block digests but I
haven't tried making digests for all blocks (and lots of early blocks
are empty).


Filters for empty blocks only take a few bytes and sometimes zero when the
coinbase output is a burn that doesn't push any data (example will be in
the test vectors that I'll have ready shortly).

On the BIP proposal itself:

In Compact Filter Header Chain, you mention that clients should
download filters from nodes if filter_headers is not identical, and
ban offending nodes. What about temporary forks in the chain? What
about longer forks? In general, I am curious how you will deal with
reorgs and temporary non-consensus related chain splits.


The cfheaders messages give you the hash of the final block for which
there's a header in the message. This means you can ignore the message as
necessary rather than ban the peer, or track cfheaders for multiple forks
if desired.


I am also curious if you have considered digests containing multiple
blocks. Retaining a permanent binsearchable record of the entire chain
is obviously too space costly, but keeping the last X blocks as
binsearchable could speed up syncing for clients tremendously, I feel.


We hadn't (or I hadn't) until we read your recent post/paper and are
considering it now.


It may also be space efficient to ONLY store older digests in chunks
of e.g. 8 blocks. A client syncing up finding a match in an 8-block
chunk would have to grab those 8 blocks, but if it's not recent, that
may be acceptable. It may even be possible to make 4-, 2-, 1-block
digests on demand.


This is also something we (or at least I) hadn't considered before your
recent post. We have been working on this for a few months now, so we
didn't have time to try out and possibly incorporate the idea before
release.

How fast are these to create? Would it make sense to provide digests
on demand in some cases, rather than keeping them around indefinitely?


They're pretty fast and can be pruned if desired, as mentioned above, as
long as the header chain is kept.
Karl Johan Alm via bitcoin-dev
2017-06-05 02:06:32 UTC
On Sat, Jun 3, 2017 at 2:55 AM, Alex Akselrod via bitcoin-dev
Post by Alex Akselrod via bitcoin-dev
Without a soft fork, this is the only way for light clients to verify that
peers aren't lying to them. Clients can request headers (just hashes of the
filters and the previous headers, creating a chain) and look for conflicts
between peers. If a conflict is found at a certain block, the client can
download the block, generate a filter, calculate the header by hashing
together the previous header and the generated filter, and banning any peers
that don't match. A full node could prune old filters if you wanted and
recalculate them as necessary if you just keep the filter header chain info
as really old filters are unlikely to be requested by correctly written
software but you can't guarantee every client will follow best practices
either.
Ahh, so you actually make a separate digest chain with prev hashes and
everything. Once/if committed digests are soft forked in, it seems a
bit overkill but maybe it's worth it. (I was always assuming committed
digests in coinbase would come after people started using this, and
that people could just ask a couple of random peers for the digest
hash and ensure everyone gave the same answer as the hash of the
downloaded digest..).
Post by Alex Akselrod via bitcoin-dev
The simulations are based on completely random data within given parameters.
I noticed an increase in FP hits when using real data sampled from real
scriptPubKeys and such (address reuse and other weird stuff). See "lies.h"
in the github repo for the experiments, and the initial part of main in
chainsim.c, where wallets get random stuff from the chain.
Post by Alex Akselrod via bitcoin-dev
I will definitely try to reproduce my experiments with Golomb-Coded
sets and see what I come up with. It seems like you've got a little
less than half the size of my digests for 1-block digests but I
haven't tried making digests for all blocks (and lots of early blocks
are empty).
Filters for empty blocks only take a few bytes and sometimes zero when the
coinbase output is a burn that doesn't push any data (example will be in the
test vectors that I'll have ready shortly).
I created digests for all blocks up until block #469805 and actually
ended up with 5.8 GB, which is 1.1 GB lower than what you have, but
may be worse perf-wise on false positive rates and such.
Post by Alex Akselrod via bitcoin-dev
How fast are these to create? Would it make sense to provide digests
on demand in some cases, rather than keeping them around indefinitely?
They're pretty fast and can be pruned if desired, as mentioned above, as
long as the header chain is kept.
For comparison, creating the digests above (469805 of them) took roughly
30 mins on my end, but using the kstats format, so it would probably be
higher on an actual node (should get around to profiling that...).
Olaoluwa Osuntokun via bitcoin-dev
2017-06-09 03:03:51 UTC
Post by Karl Johan Alm via bitcoin-dev
I am also curious if you have considered digests containing multiple
blocks. Retaining a permanent binsearchable record of the entire chain is
obviously too space costly, but keeping the last X blocks as binsearchable
could speed up syncing for clients tremendously, I feel.
Originally we hadn't considered such an idea. Grasping the concept a bit
better, I can see how that may result in considerable bandwidth savings
(for purely negative queries) for clients doing a historical sync, or
catching up to the chain after being inactive for months/weeks.

If we were to pursue tacking this approach onto the current BIP proposal,
we could do it in the following way:
* The `getcfilter` message gains an additional "Level" field. Using
this field, the range of blocks to be included in the returned filter
would be 2^Level. So a level of 0 is just the single filter, 3 is 8
blocks past the block hash, etc. (see the sketch after this list).

* Similarly, the `getcfheaders` message would also gain a similar field
with identical semantics. In this case each "level" would have a
distinct header chain for clients to verify.
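
Sketched out as a wire message (hypothetical format, not part of the
current draft), the extended request might look like:

    // Hypothetical extension of getcfilter with a Level field. A response
    // at level L would cover the 2^L blocks starting at BlockHash.
    type MsgGetCFilter struct {
        BlockHash [32]byte
        Extended  bool  // basic vs. extended filter
        Level     uint8 // 0 = single block, 3 = 8 blocks, etc.
    }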
Post by Karl Johan Alm via bitcoin-dev
How fast are these to create? Would it make sense to provide digests on
demand in some cases, rather than keeping them around indefinitely?
For larger blocks (like the one referenced at the end of this mail) full
construction of the regular filter takes ~10-20ms (most of this spent
extracting the data pushes). With smaller blocks, it quickly dips down to
the nano to micro second range.

Whether to keep _all_ the filters on disk, or to dynamically re-generate a
particular range (possibly most of the historical data) is an
implementation detail. Nodes that already do block pruning could discard
very old filters once the header chain is constructed allowing them to
save additional space, as it's unlikely most clients would care about the
first 300k or so blocks.
Post by Karl Johan Alm via bitcoin-dev
Ahh, so you actually make a separate digest chain with prev hashes and
everything. Once/if committed digests are soft forked in, it seems a bit
overkill but maybe it's worth it.
Yep, this is only a hold-over until when/if a commitment to the filter is
soft-forked in. In that case, there could be some extension message to
fetch the filter hash for a particular block, along with a merkle proof of
the coinbase transaction to the merkle root in the header.
Post by Karl Johan Alm via bitcoin-dev
I created digests for all blocks up until block #469805 and actually ended
up with 5.8 GB, which is 1.1 GB lower than what you have, but may be worse
perf-wise on false positive rates and such.
Interesting, are you creating the equivalent of both our "regular" and
"extended" filters? Each of the filter types consumes about ~3.5GB in
isolation, with the extended filter type consuming more bytes on average
because it includes sigScript/witness data as well.

It's worth noting that those numbers include the fixed 4-byte value for
"N" that's prepended to each filter once it's serialized (though that
doesn't add a considerable amount of overhead). Alex and I were
considering using Bitcoin's var-int encoding for that number instead. This
would result in using a single byte for empty and small filters (< 253
items), 3 bytes for filters with up to 2^16 - 1 items, and 5 bytes for the
remainder of the cases.
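
For example, using btcd's wire package (a sketch under the assumption that
the standard CompactSize rules apply; this exact serialization isn't from
the BIP):

    import (
        "bytes"

        "github.com/btcsuite/btcd/wire"
    )

    // serializeFilter prepends the number of items N as a Bitcoin
    // CompactSize var-int instead of a fixed 4-byte value: one byte for
    // anything under 253 items, three bytes under 2^16, and so on.
    func serializeFilter(n uint64, gcsBytes []byte) ([]byte, error) {
        var buf bytes.Buffer
        if err := wire.WriteVarInt(&buf, 0, n); err != nil {
            return nil, err
        }
        buf.Write(gcsBytes)
        return buf.Bytes(), nil
    }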
Post by Karl Johan Alm via bitcoin-dev
For comparison, creating the digests above (469805 of them) took
roughly 30 mins on my end, but using the kstats format so probably
higher on an actual node (should get around to profiling that...).
Does that include the time required to read the blocks from disk? Or just
the CPU computation of constructing the filters? I haven't yet kicked off
a full re-index of the filters, but for reference this block[1] on testnet
takes ~18ms for the _full_ indexing routine with our current code+spec.

[1]: 000000000000052184fbe86eff349e31703e4f109b52c7e6fa105cd1588ab6aa

-- Laolu


Gregory Maxwell via bitcoin-dev
2017-06-07 21:41:36 UTC
On Thu, Jun 1, 2017 at 7:01 PM, Olaoluwa Osuntokun via bitcoin-dev
Post by Olaoluwa Osuntokun via bitcoin-dev
Hi y'all,
Alex Akselrod and I would like to propose a new light client BIP for
* https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki
I see the inner loop of construction and lookup are not free of
non-constant divmod. This will result in implementations being
needlessly slow (especially on arm, but even on modern x86_64 a
division is a 90 cycle-ish affair.)

I believe this can be fixed by using this approach
http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
which has the same non-uniformity as mod but needs only a multiply
and shift.
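For illustration, that reduction is just a widening multiply and a shift (a
sketch, not code from the reference implementation):

    // fastReduce32 maps a uniform 32-bit hash into [0, n) with the same
    // slight non-uniformity as "x % n", but without the expensive division.
    func fastReduce32(x, n uint32) uint32 {
        return uint32((uint64(x) * uint64(n)) >> 32)
    }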

Otherwise fast implementations will have to implement bit-twiddling-hack
exact-division code, which is kind of complicated (e.g. via the technique
in "{N}-bit Unsigned Division via {N}-bit Multiply-Add" by Arch D.
Robison).

Shouldn't all cases in your spec where you have N=transactions be
n=indexed-outputs? Otherwise, I think your golomb parameter and false
positive rate are wrong.
Olaoluwa Osuntokun via bitcoin-dev
2017-06-09 03:42:58 UTC
Permalink
Raw Message
Post by Gregory Maxwell via bitcoin-dev
I see the inner loop of construction and lookup are free of
non-constant divmod. This will result in implementations being
needlessly slow
Ahh, sipa brought this up the other day, but I thought he was referring to
the coding loop (which uses a power of 2 divisor/modulus), not the
siphash-then-reduce loop.
Post by Gregory Maxwell via bitcoin-dev
I believe this can be fixed by using this approach
http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
Post by Gregory Maxwell via bitcoin-dev
which has the same non-uniformity as mod but needs only a multiply and
shift.
Very cool, I wasn't aware of the existence of such a mapping.

Correct me if I'm wrong, but from my interpretation we can't use that
method as described, as we need to output 64-bit integers rather than
32-bit integers. A range of 32 bits would constrain the number of items
we could encode to ~4096, to ensure we don't overflow with fp values such
as 20 (which we currently use in our code), since N * 2^20 must fit
within the 32-bit range.

If filter commitments are to be considered for a soft-fork in the future,
then we should definitely optimize the construction of the filters as much
as possible! I'll look into that paper you referenced to get a feel for
just how complex the optimization would be.
Post by Gregory Maxwell via bitcoin-dev
Shouldn't all cases in your spec where you have N=transactions be
n=indexed-outputs? Otherwise, I think your golomb parameter and false
positive rate are wrong.
Yep! Nice catch. Our code is correct, but the mistake in the spec was an
oversight on my part. I've pushed a commit[1] to the bip repo referenced
in the OP to fix this error.

I've also pushed another commit to explicitly take advantage of the fact
that P is a power-of-two within the coding loop [2].
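For illustration, here's a sketch of the Rice-coding step with a
power-of-two parameter (Go; bitWriter is a hypothetical helper, and this
only shows the general Golomb-Rice idea, not a byte-for-byte transcription
of the BIP's coding loop):

    // bitWriter stands in for any bit-level output stream.
    type bitWriter interface {
        WriteBit(bit byte)
        WriteBits(v uint64, n uint) // write the low n bits of v
    }

    // riceEncode writes one delta with parameter P = 1 << p. Because P is a
    // power of two, the quotient/remainder split is a shift and a mask
    // rather than a division and a modulo.
    func riceEncode(w bitWriter, delta uint64, p uint) {
        q := delta >> p // quotient, unary coded as q ones followed by a zero
        for i := uint64(0); i < q; i++ {
            w.WriteBit(1)
        }
        w.WriteBit(0)
        w.WriteBits(delta&((uint64(1)<<p)-1), p) // remainder, p bits verbatim
    }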

-- Laolu

[1]:
https://github.com/Roasbeef/bips/commit/bc5c6d6797f3df1c4a44213963ba12e72122163d
[2]:
https://github.com/Roasbeef/bips/commit/578a4e3aa8ec04524c83bfc5d14be1b2660e7f7a
Olaoluwa Osuntokun via bitcoin-dev
2017-06-09 04:47:19 UTC
Permalink
Raw Message
Post by Olaoluwa Osuntokun via bitcoin-dev
Correct me if I'm wrong, but from my interpretation we can't use that
method as described as we need to output 64-bit integers rather than
32-bit integers.
Had a chat with gmax off-list and came to the realization that the method
_should_ indeed generalize to our case of outputting 64-bit integers.
We'll need to do a bit of bit twiddling to make it work properly. I'll
modify our implementation and report back with some basic benchmarks.
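For the curious, the generalization presumably boils down to a full
64x64 -> 128-bit product, keeping the high word (a sketch of the idea, not
the final implementation):

    import "math/bits"

    // fastReduce64 maps a uniform 64-bit hash into [0, n): the high 64 bits
    // of the 128-bit product x*n play the same role as ">> 32" in the
    // 32-bit trick.
    func fastReduce64(x, n uint64) uint64 {
        hi, _ := bits.Mul64(x, n)
        return hi
    }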

-- Laolu
Tomas via bitcoin-dev
2017-06-08 09:50:08 UTC
Permalink
Raw Message
On Thu, Jun 1, 2017, at 21:01, Olaoluwa Osuntokun via bitcoin-dev wrote:
Post by Olaoluwa Osuntokun via bitcoin-dev
Hi y'all,
Alex Akselrod and I would like to propose a new light client BIP for
* https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki
Very interesting.

I would like to consider how this compares to another light client type
with rather different security characteristics, where for each transaction
in each block each client would receive:
* The TXID (uncompressed)
* The spent outpoints (with TXIDs compressed)
* The pubkey hash (compressed to reasonable amount of false positives)

A rough estimate would indicate this to be about 2-2.5x as big per block
as your proposal, but comes with rather different security
characteristics, and would not require download since genesis.
The client could verify the TXIDs against the merkle root with a much
stronger (PoW) guarantee compared to the guarantee based on the
assumption of peers being distinct, which your proposal seems to make.
Like your proposal this removes the privacy and processing issues from
server-side filtering, but unlike your proposal retrieval of all txids
in each block can also serve for a basis of fraud proofs and
(disprovable) fraud hints, without resorting to full block downloads.
I don't completely understand the benefit of making the outpoints and
pubkey hashes (weakly) verifiable. These only serve as notifications and
therefore do not seem to introduce an attack vector. Omitting data is
always possible, so receiving data is a prerequisite for verification,
not an assumption that can be made. How could an attacker benefit from
"hiding notifications"?
I think client-side filtering is definitely an important route to
take, but is it worth compressing away the information to verify the
merkle root?
Regards,
Tomas van der Wansem
bitcrust
Olaoluwa Osuntokun via bitcoin-dev
2017-06-09 03:50:37 UTC
Permalink
Raw Message
Post by Tomas via bitcoin-dev
A rough estimate would indicate this to be about 2-2.5x as big per block
as your proposal, but comes with rather different security
characteristics, and would not require download since genesis.
Our proposal _doesn't_ require downloading from genesis, if by
"downloading" you mean downloading all the blocks. Clients only need to
sync the block+filter headers; then (if they don't care about historical
blocks) they will download filters from their "birthday" onwards.
Post by Tomas via bitcoin-dev
The client could verify the TXIDs against the merkle root with a much
stronger (PoW) guarantee compared to the guarantee based on the assumption
of peers being distinct, which your proposal seems to make
Our proposal only makes a "one honest peer" assumption, which is the same
as any other operating mode. Also, as clients still download all the
headers, they're able to verify PoW conformance/work as normal.
Post by Tomas via bitcoin-dev
I don't completely understand the benefit of making the outpoints and
pubkey hashes (weakly) verifiable. These only serve as notifications and
therefore do not seem to introduce an attack vector.
Not sure what you mean by this. Care to elaborate?
Post by Tomas via bitcoin-dev
I think client-side filtering is definitely an important route to take,
but is it worth compressing away the information to verify the merkle
root?
That's not the case with our proposal. Clients get the _entire_ block (if
they need it), so they can verify the merkle root as normal. Unless one of
us is misinterpreting the other here.

-- Laolu


Tomas via bitcoin-dev
2017-06-09 08:26:52 UTC
Permalink
Raw Message
Post by Tomas via bitcoin-dev
I don't completely understand the benefit of making the outpoints and
pubkey hashes (weakly) verifiable. These only serve as notifications and
therefore do not seem to introduce an attack vector.
Not sure what you mean by this. Care to elaborate? 
Additionally, Full nodes can nearly undetectably lie by omission causing a denial of service which can
lead to undesirable failure modes in applications whose safety
critically relies on responding to certain
on-chain events.

I understand that the compact header chain is used to mitigate against
this, but I am unsure about the use
cases and trade-offs.

For a normal wallet, the only thing I can imagine an attacker could do
is pretend a transaction did not confirm yet, causing a nuisance.

An application critically depending on knowing what happens on-chain
surely is better off downloading the TXIDs, providing PoW security?
Gaining knowledge of incoming TXIDs is nicely solved by the payment
protocol.

Are there enough use cases that critically depend on pub key hashes
being used on-chain, to make the compact header chain worth its costs?

Regards,
Tomas
Andreas Schildbach via bitcoin-dev
2017-06-19 11:58:18 UTC
Permalink
Raw Message
I'm not sure if this has been brought up elsewhere in this thread.

This proposal doesn't seem to be a complete replacement of BIP37: It
doesn't provide a filter for unconfirmed transactions like BIP37 does.

That means that most light clients will continue to use BIP37 even if
they may use this BIP as a supplement. Otherwise users would not get
timely notification of incoming payments any more.
bfd--- via bitcoin-dev
2017-06-19 12:26:46 UTC
Permalink
Raw Message
Several times. It's been debated if unconfirmed transactions are
necessary, methods of doing more private filtering have been suggested,
along with simply not filtering unconfirmed transactions at all. My
collected data suggests that there is very little use of BIP37 at
present, based on incoming connections to nodes I know end up in the DNS
seed responses (no "SPV" clients do their own peer management).
Tom Zander via bitcoin-dev
2017-06-19 15:15:10 UTC
Permalink
Raw Message
It's been debated if [filtering of] unconfirmed transactions are
necessary,
Why would it not be needed? Any SPV client (when used as a payment-receiver)
requires this from a simple usability point of view.
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
Jonas Schnelli via bitcoin-dev
2017-06-19 15:49:59 UTC
Permalink
Raw Message
Hi
Post by Tom Zander via bitcoin-dev
It's been debated if [filtering of] unconfirmed transactions are
necessary,
Why would it not be needed? Any SPV client (when used as a payment-receiver)
requires this from a simple usability point of view.
I think many users would be willing ...
a) … to trade higher privacy (using client side filtering) for not having the „incoming transaction“ feature
b) – if they want 0-conf – to fetch all inved transactions
/jonas
Andreas Schildbach via bitcoin-dev
2017-06-19 15:59:57 UTC
Permalink
Raw Message
Post by Jonas Schnelli via bitcoin-dev
Post by Tom Zander via bitcoin-dev
It's been debated if [filtering of] unconfirmed transactions are
necessary,
Why would it not be needed? Any SPV client (when used as a payment-receiver)
requires this from a simple usability point of view.
I think many users would be willing ...
a) … to trade higher privacy (using client side filtering) for not having the „incoming transaction“ feature
b) – if they want 0-conf – to fetch all inved transactions
Another number: I'm answering dozens of support inquiries about
delayed/missing transactions per day. Over the 7 years of Bitcoin
Wallet's existence, I estimate about 50000 inquiries.

On the other hand, I remember only 1 (one) inquiry about the privacy
problems of BIP37 (or privacy at all).

From a regular user's point of view, privacy is non-issue. Sure,
everyone would take it for free, but certainly not if it a) delays
incoming payments or b) quickly eats up your traffic quota.
Jonas Schnelli via bitcoin-dev
2017-06-19 16:22:03 UTC
Permalink
Raw Message
Post by Andreas Schildbach via bitcoin-dev
On the other hand, I remember only 1 (one) inquiry about the privacy
problems of BIP37 (or privacy at all).
IMO privacy is something developers should make sure users have.
Also, I think today’s SPV wallets should make users more aware of the possible privacy implications.

Do users know that, if they pay for a good in a shop while using the shop’s WiFi, the shop owner as well as the ISP can combine that data with their user profile (and ~ALL FUTURE purchases you do with the same wallet IN ANY LOCATION, online or in-person)?

Do users know that ISPs (cellular; including Google) can completely link the Bitcoin wallet used (again: all purchases, including future ones) with the user profile already well known to the ISP, including credit-card data, and may sell the Bitcoin data to any other data-mining company?

If you use BIP37, you basically give your transaction history (_ALL TRANSACTIONS_ including transactions in future) to everyone.
Post by Andreas Schildbach via bitcoin-dev
From a regular user's point of view, privacy is non-issue. Sure,
everyone would take it for free, but certainly not if it a) delays
incoming payments or b) quickly eats up your traffic quota.
This may be true because they are not aware of the ramifications, and I don’t think client side filtering is a drop-in replacement for today’s smartphone SPV model.


/jonas
adiabat via bitcoin-dev
2017-06-19 16:36:41 UTC
Permalink
Raw Message
This has been brought up several times in the past, and I agree with
Jonas' comments about users being unaware of the privacy losses due to
BIP37. One thing also mentioned before but not in the current thread
is that the entire concept of SPV is not applicable to unconfirmed
transactions. SPV uses the fact that miners have committed to a
transaction with work to give the user an assurance that the
transaction is valid; if the transaction were invalid, it would be
costly for the miner to include it in a block with valid work.

Transactions in the mempool have no such assurance, and are costlessly
forgeable by anyone, including your ISP. I wasn't involved in any
debate over BIP37 when it was being written up, so I don't know how
mempool filtering got in, but it never made any sense to me. The fact
that lots of lite clients are using this is a problem as it gives
false assurance to users that there is a valid but yet-to-be-confirmed
transaction sending them money.

-Tadge
Andreas Schildbach via bitcoin-dev
2017-06-19 20:49:17 UTC
Permalink
Raw Message
Most SPV wallets make it quite clear that unconfirmed transactions are
just that.
Eric Voskuil via bitcoin-dev
2017-06-20 07:03:43 UTC
Permalink
Raw Message
The reason that BIP37 presents a long list of problems is that it is a client-server scenario wedged into a peer-to-peer network. The only possible excuse for this design was an implementation shortcut.

As this thread and others demonstrate, reproducing this design flaw will not eliminate the problems. The fact that there are many wallets dependent upon it is an unfortunate consequence of the original sin, but it is not likely to last. There is no rationale for node operators to support wallets apart from their own. As a node implementer interested in privacy, security and scalability, I would never waste the time to code BIP37, or any client-server feature, into the P2P protocol, especially one that delegates some aspect of validation.

Other nodes (servers) provide independent, securable, client-server interfaces. Many of these are made available as community servers for use at no charge. They could also provide mechanisms for operator payment without polluting the P2P network.

However as a community we should be working toward full node wallets. A secured personal node/server can support remote mobile wallets with security, privacy and no wasted bandwidth. And if we avoid counterproductive increases in blockchain growth rate, full nodes will eventually be able to run on mobile platforms with no difficulty whatsoever. A wallet that delegates full validation to node operators is just another centralization pressure that we do not need.

e
Tom Zander via bitcoin-dev
2017-06-19 16:07:45 UTC
Permalink
Raw Message
Post by Jonas Schnelli via bitcoin-dev
Hi
Post by Tom Zander via bitcoin-dev
It's been debated if [filtering of] unconfirmed transactions are
necessary,
Why would it not be needed? Any SPV client (when used as a
payment-receiver) requires this from a simple usability point of view.
I think many users would be willing ...
a) … to trade higher privacy (using client side filtering) for not having
the „incoming transaction“ feature b) – if they want 0-conf – to fetch
all inved transactions
You seem to misunderstand the usecase.
If you send me a transaction, both of us using our phones, then I need
to be able to have immediate feedback on the transaction being broadcast on
the network.
This is not about zero-conf, this is simply seeing what is happening while
it is happening.

Additionally, when the transaction that is meant for my wallet is broadcast,
I want my SPV wallet to parse and check the actual transaction.
It is not just to see that *something* was actually sent, but also to be
able to see how much is being paid to me. Maybe if the transaction is marked
as RBF-able, etc.

Really basic usability: provide information to your users when you can,
should they want to, and by default on.
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
Jonas Schnelli via bitcoin-dev
2017-06-19 16:30:18 UTC
Permalink
Raw Message
Post by Tom Zander via bitcoin-dev
Post by Jonas Schnelli via bitcoin-dev
I think many users would be willing ...
a) … to trade higher privacy (using client side filtering) for not having
the „incoming transaction“ feature b) – if they want 0-conf – to fetch
all inved transactions
You seem to misunderstand the usecase.
If you send me a transaction, both of use are using our phones, then I need
to be able to have immediate feedback on the transaction being broadcast on
the network.
This is not about zero-conf, this is simple seeing what is happening while
it is happening.
Additionally, when the transaction that is meant for my wallet is broadcast,
I want my SPV wallet to parse and check the actual transaction.
It is not just to see that *something* was actually send, but also to be
able to see how much is being paid to me. Maybe If the transaction is marked
as RBF-able, etc.
Really basic usability: provide information to your users when you can,
should they want to, and by default on.
I see this use case.
But I did receive bank wire transfers for the last decades without _immediately_ knowing that someone sent funds to me.
I personally would ALWAYS trade the higher bandwidth consumption (300MB mempool filtering) or slower notification time (maybe ~1h) for preserving privacy.
I agree, there are use cases where you want immediate notification; those use cases could probably be solved by not throwing away privacy („parsing“ all transactions and running in the background).

/jonas
Tom Zander via bitcoin-dev
2017-06-19 16:38:55 UTC
Permalink
Raw Message
I personally would ALWAYS [snip]
I mentioned that it was a usability point for a reason, and your personal
behavior makes me want to quote one of the main UX guidelines;
“You are not your user”

http://uxmyths.com/post/715988395/myth-you-are-like-your-users
older;
http://52weeksofux.com/post/385981879/you-are-not-your-user

I think we should defer to actual real numbers and user research, as has been
quoted by Andreas. You disagreeing based on your own experience and behavior
is worse than useless, as the above links show.

Don’t fall in that trap :)
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
Andreas Schildbach via bitcoin-dev
2017-06-19 15:43:14 UTC
Permalink
Raw Message
Just to give you a number: based on the statistics of the Bitcoin Wallet
app there are at least 2 million wallets depending on BIP37. Not all
would need instant notification, but based on the daily support enquiries
instant notification is the most asked-for property of Bitcoin.
Jonas Schnelli via bitcoin-dev
2017-06-19 16:10:13 UTC
Permalink
Raw Message
Post by Andreas Schildbach via bitcoin-dev
Just to give you a number: based on the statistics of the Bitcoin Wallet
app there are at least 2 million wallets depending on BIP37. Not all
would need instant notification, but based on the daily support enquiries
instant notification is the most asked-for property of Bitcoin.
Yes. Users probably like this feature and client side filtering is not a drop-in replacement for BIP37.

We should also consider:
BIP37 works because node operators are willing to offer that service for free (which may change over time).
BIP37 consumes plenty of horsepower (disk/CPU) from nodes. Filtering a couple of days of blocks (assume 1000+) eats lots of resources for something that has no direct long-term value for Bitcoin (the filter data is unique and will be „thrown away“ [it can’t be used by other peers]). The same applies to the mempool (filtering a mempool of a couple of hundred MB each time the HD gap limit has been exceeded or the app gets sent to the foreground again).

Purely relying on the availability of BIP37 seems fragile to me, and starting to explore other ways is very reasonable.

/jonas
Gregory Maxwell via bitcoin-dev
2017-06-19 22:41:49 UTC
Permalink
Raw Message
On Mon, Jun 19, 2017 at 12:26 PM, bfd--- via bitcoin-dev
Several times. It's been debated if unconfirmed transactions are necessary,
methods of doing more private filtering have been suggested, along with
simply not filtering unconfirmed transactions at all. My collected data
suggests that there is very little use of BIP37 at present, based on
incoming connections to nodes I know end up in the DNS seed responses (no
"SPV" clients do their own peer management).
Sending just the output addresses of each transaction would use about
1 kilobit/s of data. Sending the entire transactions would use ~14 kbit/s
of data. These don't seem like an unsustainably tremendous amount of data
to use while an application is running.
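As a rough sanity check on those figures, assuming ~1 MB blocks every
600 seconds with on the order of 5,000 outputs per block: full transactions
come to roughly 1,000,000 bytes / 600 s ≈ 1.7 kB/s ≈ 13 kbit/s, while
20-byte output script pushes alone come to roughly 5,000 * 20 bytes / 600 s
≈ 170 B/s ≈ 1.3 kbit/s.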

Doubly so for SPV wallets, which are highly vulnerable to unconfirmed
transactions, and many of which, per the last testing reports I heard of,
became pretty severely corrupted once given a fake transaction.

Can someone make a case why saving no more than those figures would
justify the near total loss of privacy that filtering gives?

"Because they already do it" isn't a good argument when talking about
a new protocol feature; things which already do BIP37 will presumably
continue to already do BIP37.
Tom Zander via bitcoin-dev
2017-06-20 09:52:00 UTC
Permalink
Raw Message
On Tuesday, 20 June 2017 00:41:49 CEST Gregory Maxwell via bitcoin-dev
Post by Gregory Maxwell via bitcoin-dev
Can someone make a case why saving no more than those figures would
justify the near total loss of privacy that filtering gives?
First, your figures are wrong and also fall out of the sky with no
justification. Can’t debunk something that is pure garbage.

Second, stating that a bloom filter is a "total loss of privacy" is equally
baseless and doesn’t need debunking.
Post by Gregory Maxwell via bitcoin-dev
"Because they already do it" isn't a good argument when talking about
a new protocol feature; things which already do BIP37 will presumably
continue to already do BIP37.
I think you just made the case for completely rejecting this proposal based
on the fact that nobody will use it, since BIP37 already exists.

Not sure if I agree with that, improvements are always useful and we should
be able to come up with replacements.
But arguing against a feature you don’t like, especially one used by
millions every day, is a sad way to stifle innovation, Greg.
--
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel
bfd--- via bitcoin-dev
2017-06-20 13:08:03 UTC
Permalink
Raw Message
Post by Tom Zander via bitcoin-dev
Second, stating that a bloom filter is a "total loss of privacy" is equally
baseless and doesn’t need debunking.
"On the Privacy Provisions of Bloom Filters in Lightweight Bitcoin
Clients"
We show analytically and empirically that the reliance on Bloom filters
within existing SPV clients leaks considerable information about the
addresses of Bitcoin users. Our results show that an SPV client who
uses a modest number of Bitcoin addresses (e.g., < 20) risks revealing
almost all of his addresses.
https://eprint.iacr.org/2014/763.pdf
Adam Back via bitcoin-dev
2017-06-20 17:20:53 UTC
Permalink
Raw Message
Also Jonas Nick gave a fairly comprehensive presentation on privacy
leaks in bitcoin protocol including SPV and bloom query problem
specifics:



Adam

