Quick draft of an idea I had; please discuss
03-19-2011, 08:31 PM (This post was last modified: 03-19-2011 08:51 PM by eternaleye.)
Post: #16
RE: Quick draft of an idea I had; please discuss
(03-19-2011 07:50 PM)overload Wrote:  1) node ids: the description above explains how to generate key ids, but it is not clear to me how to generate and manage the node ids that will contain these key ids.
Yes, there will need to be a spec on how to generate node IDs. Moreover, it needs to be verifiable - that way attackers trying to purposefully choose a specific node ID can be identified and ignored.

(03-19-2011 07:50 PM)overload Wrote:  1a) in kademlia dht, it is usually expected that new nodes will have one random node id, and that symbols with similar enough ids will be stored in those nodes. However, since the symbol ids are hashes, they will be effectively random as well, meaning that they will be spread around random nodes, and I cannot specify my preferred nodes. This makes it difficult for me to provide my authoritative source(s) of 'kennings' db (which I know is up-to-date etc), or to do geographical redundancy of the db, since the k nodes randomly chosen to store my kennings could by chance be all in Egypt and go down at once. A solution might be allowing a server to have many node ids, which I could make similar enough to all my symbol ids so as to store all of them in the same server.
Part of the point of using a DHT is that there is no *single* authoritative source, because any single source can be coerced. The randomness is necessary to prevent a malicious entity from intentionally gaining control over all of the duplicates for a given Kenning and misusing that control. Basically, we're sacrificing control for security. The chance of all K duplicates being in a single area decreases much faster than linearly as K increases - a reasonable value will make it close enough to zero not to matter. This is statistical geographic redundancy rather than deterministic geographic redundancy (which is weak against several classes of attacks).

Also, the concern about the DB being out of date is invalid. Unlike DNS, where most users configure a caching DNS server as their primary (Google DNS, OpenDNS, etc.) and only reach the authoritative one when the cache lookup fails, so the cache can become stale, this design has all users querying the authoritative source in all cases. Thus, the results *cannot* be out of date. If you want to increase the type of reliability you seem to be valuing, you could create a caching proxy for this system that stores records when you look them up and is under your control, so that if the authoritative sources all go down you could still retrieve the data. However, this would actually make one of your concerns (freshness) *worse* by reintroducing the problems with caching DNS resolvers.

(03-19-2011 07:50 PM)overload Wrote:  1b) It seems like it might be possible for anyone to join the dht with thousands of malicious node ids (or a suitable % of number of the dht nodes, whatever is larger) that will send any kennings they receive to /dev/null, effectively creating black holes in the dht, since nodes in kademlia are supposed to keep only up to k node id refs, where k is a small number ~20, and all k refs could be pointing to black hole nodes.
This is a well-researched concern. I recommend reading the paper "S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing," especially the section on "Sybil attacks."

(03-19-2011 07:50 PM)overload Wrote:  2) abuse of storage: it seems like I could start pumping an unlimited number of kennings into this dht, containing sectors of my file system, and start abusing the storage of others to store my personal data. There's this 30-day limit for the kennings, which alleviates this problem somewhat, but apart from small technical details of detecting if the timestamp in the add message is 100 years in the future etc (so that it won't be deleted for 100 years), 30 days is enough time to allow refreshing a reasonable amount of data in the current scheme, probably a few TBs or more depending on the bandwidth available to whoever is pumping the data. If everybody starts pumping that much data, there won't be enough space in all the dht nodes. This could also flush the real data out of each dht node, which would only have a limited storage space and would only be able to keep the last TB or so of (garbage?) data.
This is not as easy as you seem to be thinking it is. For one, they only have 3 types of Kenning they could spam: Identities, Names, and Records.

Identities cannot be used in this manner: they would have to generate a fresh public key for every one, which is time-consuming.

Names may be usable for this, by choosing a randomized Name and pointing it at a pre-existing Identity. However, the amount of space they take up would be negligible, they would expire after 30 days, and this can be trivially made moot by a) including the Name itself in the signed data of the value, and b) including some sort of challenge in posting a Name, such as a trailing random 64-bit integer, with the constraint that the value (i.e., { timestamp, Name, identity, signature( { timestamp, Name, identity } ), random64bit }) must have a hash that has N leading zeroes. This constraint, shamelessly stolen from HashCash, nips this attack in the bud.
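As a concrete illustration, the posting challenge could work roughly as follows. This is a minimal sketch: the difficulty N is still undefined in the design, and the byte encoding of the signed value is an assumption.

```python
# Hypothetical sketch of the HashCash-style posting challenge.
# N_LEADING_ZERO_BITS is a placeholder; the real N is not yet defined.
import hashlib
import os

N_LEADING_ZERO_BITS = 16

def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def solve_challenge(signed_value: bytes, n: int = N_LEADING_ZERO_BITS) -> bytes:
    """Search for a random 64-bit trailer making SHA-256(value || trailer)
    start with at least n zero bits."""
    while True:
        nonce = os.urandom(8)
        if leading_zero_bits(hashlib.sha256(signed_value + nonce).digest()) >= n:
            return nonce
```

Raising N makes each posted Name exponentially more expensive to produce while keeping verification a single hash, which is exactly the asymmetry HashCash exploits.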

The same challenge system would also make Records infeasible to use for such an attack.

Also, the 'never expires' trick could be nullified by saying "Kennings with timestamps more than 1 day in the future are invalid and must not be accepted."
03-19-2011, 09:13 PM (This post was last modified: 03-19-2011 09:19 PM by eternaleye.)
Post: #17
RE: Quick draft of an idea I had; please discuss
Okay, so here's the design updated for the changes I made due to feedback (I also fixed it stripping out the indentation):


256-bit S/Kademlia
Basic building block: Kennings
Hash: SHA-256 (Plan to transition to SHA3-256 when it is released)
On changing the hash function, alter the first-byte values for all Kenning types to the next unused values (see symbol definitions per type)

Note on S/Kademlia:
See paper "S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing"
Available at http://www.spovnet.de/files/publications...ia2007.pdf

Kennings are composed of a 'symbol' and any number of 'values'
Kennings 'decay' - they disappear after 30 days if the counter is not reset (see 'Add' message)
    The counter starts from the timestamp in the value
    Values with timestamps more than 1 day in the future *MUST* not be accepted by conforming nodes
3 types of Kennings: Identifiers, Names, and Records
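As a sketch, a conforming node's acceptance check for the decay and timestamp rules above might look like this (POSIX UTC timestamps are an assumption; the 1-day and 30-day windows come from the rules above):

```python
# Minimal sketch of the decay / future-timestamp rules, assuming POSIX
# UTC timestamps in seconds. Not part of any finalized wire format.
ONE_DAY = 24 * 60 * 60
THIRTY_DAYS = 30 * ONE_DAY

def value_is_acceptable(value_timestamp: float, now: float) -> bool:
    """Reject values dated more than 1 day in the future; values whose
    counter has run past 30 days have decayed and are dropped."""
    if value_timestamp > now + ONE_DAY:
        return False  # too far in the future: invalid per the rules above
    if now - value_timestamp > THIRTY_DAYS:
        return False  # decayed
    return True
```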

Identifiers:
    An identifier's symbol is a 256-bit ( hash of a PGP public key's fingerprint ) with the first byte set to '0'
    Each value *MUST* be a PGP key signed with the key whose fingerprint the symbol represents
    The format must be:
        UTC timestamp
        PGP public key
        Signature of the tuple ( timestamp, publickey ) using the key whose fingerprint the symbol is derived from
    NOTE: This does not necessarily mean self-signed - one can have a new key, signed by the old key.
        This allows expiration of keys and migration to new ones.
        This also allows many identifiers to show a common owner - have all of them reference and be referenced by the same PGP key in addition to their own key.
        This *ALSO* allows an implementation of communities, in a sense - any site belonging to a community would reference and be referenced by a 'community key'. Anyone who wants to 'join' can 'accept' the community key, transitively accepting all members of the community.
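A hypothetical derivation of an Identifier symbol, assuming the fingerprint is hashed as raw bytes and "first byte set to '0'" means the literal byte value 0 (both assumptions, since the spec leaves the encoding open):

```python
# Hypothetical Identifier symbol derivation: SHA-256 of a PGP key
# fingerprint, with the first byte overwritten by the type tag.
import hashlib

IDENTIFIER_TAG = 0x00  # assumption: '0' means the byte value 0

def identifier_symbol(fingerprint: bytes) -> bytes:
    digest = hashlib.sha256(fingerprint).digest()
    return bytes([IDENTIFIER_TAG]) + digest[1:]
```

Tagging the first byte keeps the three Kenning types in disjoint regions of the 256-bit keyspace while leaving the remaining 248 bits effectively random.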

Names:
    A name's symbol is a 256-bit ( hash of a human-readable string ) with the first byte set to '1'
    Each value *MUST* be of the following form:
        A timestamp in UTC
        The Name as a utf-8 string
        A reference to an Identifier
        A signature of the tuple ( timestamp, Name, Identifier ) using the public key of the identifier
        A random 64-bit value
    Each value must also meet the constraint that its hash (same hash as for Kenning symbols) has N leading zero bits. (N to be defined)

Records:
    All records 'belong' to an Identifier
    A record's symbol is a 256-bit ( hash of the concatenation of the symbol of the Identifier it belongs to and the record type ), with the first byte set to '2'
    The format of each value depends on the type of record, with A, AAAA, MX, etc. records acting much as in DNS. However, all these values have these common invariants:
    Each value *MUST* contain a 'backreference' to the owning Identifier
    Each value *MUST* be signed with the PGP key for the owning Identifier
    Specifically, values must be of the form:
        UTC timestamp
        Identifier
        Type-dependent data
        Signature of the tuple ( timestamp, Identifier, data ) with the public key of the Identifier
        Random 64-bit value
    Each value must also meet the constraint that its hash (same hash as for Kenning symbols) has N leading zero bits. (N to be defined)
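A matching sketch for Record symbols, assuming plain concatenation of the owner's symbol and a UTF-8 record-type string (the lack of a separator and the encoding are assumptions):

```python
# Hypothetical Record symbol derivation: SHA-256 over (owner Identifier
# symbol || record type), first byte set to the Record type tag.
import hashlib

RECORD_TAG = 0x02  # assumption: '2' means the byte value 2

def record_symbol(owner_symbol: bytes, record_type: str) -> bytes:
    digest = hashlib.sha256(owner_symbol + record_type.encode("utf-8")).digest()
    return bytes([RECORD_TAG]) + digest[1:]
```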

Messages:
    As this is based on a DHT, we need to define what operations on symbols are valid.
    Lookup:
        Given a symbol, retrieve all of that symbol's values
    Add:
        Given a symbol and a complete value, add the value to the list for that symbol
        The node acting on this message *MUST* validate the relevant signatures
        If an 'Add' message is received for a value that already exists, set the delay counter's start point to the new timestamp
            To be clear, 'already exists' means that all information aside from:
                The timestamp
                The signature
                The random 64-bit value (if it is present)
            *MUST* be the same.
    Delete:
        Given a symbol and a complete value that has been signed *AGAIN* by the same PGP key, delete the value
        Again, the node acting on this *MUST* validate signatures
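The 'already exists' rule for Add can be sketched as follows. The dict-based store and field names are illustrative assumptions, and signature validation is assumed to have already happened:

```python
# Sketch of the Add message's refresh-vs-insert logic: two values are
# the same entry when everything except the timestamp, signature, and
# optional random nonce matches. Field names are assumptions.
VOLATILE_FIELDS = {"timestamp", "signature", "nonce"}

def same_entry(existing: dict, incoming: dict) -> bool:
    def strip(v: dict) -> dict:
        return {k: x for k, x in v.items() if k not in VOLATILE_FIELDS}
    return strip(existing) == strip(incoming)

def handle_add(store: dict, symbol: bytes, value: dict) -> None:
    """Assumes the caller already validated the relevant signatures."""
    for entry in store.setdefault(symbol, []):
        if same_entry(entry, value):
            entry["timestamp"] = value["timestamp"]  # reset the decay counter
            return
    store[symbol].append(value)
```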


As far as who the peers in the DHT are, I have a pretty elegant idea: Have them be the servers the Identifiers reference!
The Kennings need to be republished periodically anyway, and who better to do it than the very servers they point to?
This also greatly reduces the amount of churn in the network, since servers go down very rarely.

Here's an imagined workflow:

Person A wishes to publish a website and mailserver, accessible via IPv4 and IPv6.
He generates his personal key, let's call it 0x1234.
He then generates a key for his webserver (0xF00D) and his mailserver (0xBEEF)

He sends the Add message a few times, creating the Identifiers for each of the 3 keys.
He then signs 0x1234 with 0xF00D, and Add's it to the 0xF00D identifier as well
He also signs it with 0xBEEF and Add's it to 0xBEEF's Identifier

He then adds an A and AAAA record belonging to each of 0xF00D and 0xBEEF's Identifiers, with the relevant IP addresses
Next, he sends an Add message creating a Name, say 'goodfoodsite' referencing 0xF00D
And another creating 'goodfoodmail' referencing 0xBEEF
Finally, he adds an MX record on 0xF00D referencing 'goodfoodmail'

Another:

Person B wants to visit 'goodfoodsite'

He opens his browser and goes there
His browser looks up the Name 'goodfoodsite', and finds two values - 0xF00D and 0xDEAD
Neither is known to the browser. 0xDEAD has no values other than a self-signed key, but 0xF00D references 0x1234
Since Person B knows Person A personally (heh), he's already accepted his public key 0x1234
The browser has an accepted selection, so it looks up 0xF00D's AAAA record
It then contacts that IP address, and loads the webpage.
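Person B's resolution steps could be sketched against a toy in-memory store like this (all structures and the set of accepted keys are illustrative assumptions):

```python
# Toy resolution walkthrough: Name -> vouched Identifier -> AAAA record.
# The dict layout and key strings are illustrative, not a wire format.
from typing import Optional

def resolve(dht: dict, accepted_keys: set, name: str) -> Optional[str]:
    """Look up a Name, keep only Identifiers that reference a key the
    user has already accepted, then fetch that Identifier's AAAA record."""
    for candidate in dht["names"].get(name, []):
        references = dht["identifiers"].get(candidate, set())
        if references & accepted_keys:  # e.g. 0xF00D references 0x1234
            return dht["records"].get((candidate, "AAAA"))
    return None
```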
03-20-2011, 04:31 AM
Post: #18
RE: Quick draft of an idea I had; please discuss
I want to make sure that this design can make idons resistant to the strongest pressures from the strongest of opponents, so pardon me if it looks like I'm being picky. I just want to make sure that all attacks I can think of are being taken into account.

(03-19-2011 08:31 PM)eternaleye Wrote:  
(03-19-2011 07:50 PM)overload Wrote:  This makes it difficult for me to provide my authoritative source(s) of 'kennings' db (which I know is up-to-date etc),
Part of the point of using a DHT is that there is no *single* authoritative source, because any single source can be coerced.
By "authoritative source", what I mean is the owner of the data, not something like the dns registrar. I agree that a single source can be coerced: my single dns registrar can be coerced into changing the data it is supposed to publish, and I could be coerced into changing my data. But while it's possible to detect the former using e.g. signatures, there's no way to design against the latter, because there's no way to detect whether I'm being coerced when I change my data using my private key.

I'll use 'preferred' instead of 'authoritative' to avoid confusion, and I'll illustrate with an example why I think that being able to provide preferred sources of kennings is important. Imagine a site called "freedomleaks" which publishes governments' internal corruption to the world. It will certainly attract lots of governments wanting to shut it down. Using this dht design, they could threaten the (thousands of?) random nodes hosting the symbol ids for "freedomleaks", demanding that these random nodes remove the "freedomleaks" symbol entries or else their families and businesses will be affected. The random nodes will most likely give in. Since freedomleaks is already putting up a fight, it would not remove any symbols from any freedomleaks servers, in case these servers existed at all. But in the current design, it is not very useful for freedomleaks to have their own idons servers, because they will most likely not have a node id suitable for hosting freedomleaks symbol ids. If freedomleaks could point to their own servers as preferred nodes when resolving freedomleaks, then governments would need to go after the freedomleaks servers, which could be cunningly spread all over the world, antarctic, moon, wherever, before being able to shut down freedomleaks. Of course, the random nodes are still useful as a fallback in case the freedomleaks servers are overwhelmed with too many requests or for any other reason, but freedomleaks would rather trust their own servers to hold their symbols than random nodes.

Not sure how to implement preferred servers in a dht; it looks like the dht is designed against this feature. The best I can come up with is for freedomleaks to generate many node ids that are similar to the freedomleaks symbol ids and use them across the globe, but then anyone using idons could do the same thing and the sybil attacks on the dht kick in.

(03-19-2011 08:31 PM)eternaleye Wrote:  
(03-19-2011 07:50 PM)overload Wrote:  1b) It seems like it might be possible for anyone to join the dht with thousands of malicious node ids ... effectively creating black holes in the dht ...
This is a well-researched concern. I recommend reading the paper "S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing," especially the section on "Sybil attacks."

I had a look at this paper, and on page 3 it says:

"Sybil attack: In completely decentralized systems there is no instance that controls the quantity of nodeIds an attacker can obtain. Thus an attacker can join the network with lots of nodeIds until he controls a fraction m of all nodes in the network. Douceur [8] proved that this attack can not be prevented but only impeded."

and on page 4 it says:

"We now need to impede the Sybil and Eclipse attack. This is done by either using a crypto puzzle or a signature from a central certificate authority...:

• Supervised signature: If a signature's public key additionally is signed by a trustworthy certificate authority [so, no supervision should be used, since idons shouldn't depend on any single authority anywhere], this signature is called supervised signature.

• Crypto puzzle signature: In the absence of a trustworthy authority we need to impede the Eclipse and Sybil attack with a crypto puzzle. In [3] the use of crypto puzzles for nodeId generation is rejected because they cannot be used to entirely prevent an attack. But in our opinion they are the most effective approach for distributed nodeId generation..."

I think I see a big problem with using cryptopuzzles to protect against sybil attacks. Cryptopuzzles can be effective by requiring a single server to spend hours, days, or months of computation before being able to produce a valid node id. However, attackers can use zombie networks to pump thousands or millions of node ids into the cryptopuzzle-protected dht, with thousands or millions of zombie computers solving puzzles on their behalf. Once created, the rogue node ids can be used to completely disrupt the whole of the dht, and so idons, permanently.

Since the remaining suggested protection against sybil attacks relies on a single certificate authority, and is therefore out of consideration, I don't see how the dht, and therefore idons, would be able to resist a reasonably orchestrated sybil attack by a zombie network that would probably cost a few hundred dollars.

(03-19-2011 08:31 PM)eternaleye Wrote:  
(03-19-2011 07:50 PM)overload Wrote:  2) abuse of storage: it seems like I could start pumping an unlimited number of kennings into this dht,
... Names may be usable for this, by choosing a randomized Name and pointing it at a pre-existing Identity. ... this can be trivially made moot by ... and b) including some sort of challenge in posting a Name, such as a trailing random 64-bit integer, with the constraint that the value (i.e., { timestamp, Name, identity, signature( { timestamp, Name, identity } ), random64bit }) must have a hash that has N leading zeroes. This constraint, shamelessly stolen from HashCash, nips this attack in the bud.

The same challenge system would also make Records infeasible to use for such an attack.
I believe that, in the case of impeding the creation of too many symbol ids, cryptopuzzles are an improvement, because then a zombie network would only be injecting symbol ids on behalf of individual users at a time, instead of affecting the whole dht network as with node ids. Therefore, paying a zombie network doesn't seem economically viable for a single user: it is much cheaper for the user to just buy a hard disk than to pay a zombie network to store his backup as dht kennings.
03-20-2011, 01:04 PM (This post was last modified: 03-20-2011 01:20 PM by eternaleye.)
Post: #19
RE: Quick draft of an idea I had; please discuss
(03-20-2011 04:31 AM)overload Wrote:  I want to make sure that this design can make idons resistant to the strongest pressures from the strongest of opponents, so pardon me here if it looks I'm being picky. I just want to make sure that all attacks I can think of are being taken into account.
No problem, your feedback is definitely helpful.

(03-20-2011 04:31 AM)overload Wrote:  (snip rather convincing example)

Not sure how to implement preferred servers in dht; it looks like dht is designed against this feature. The best I can come up is for freedomleaks to generate many node ids that are similar to the freedomleaks symbol ids and use them across the globe, but then anyone using idons could do the same thing and the next sybil attacks in dht kick in.
It's true that DHTs are designed against choosing who hosts things. However, perhaps we could use something that I mentioned in another thread: delegation. You would 'trust' a delegate, who would do lookups on your behalf. They would be free to serve specific records from themselves rather than (or in addition to) the ones from the DHT. This could be problematic in terms of security, as you are basically trusting them absolutely as far as name resolution is concerned, but that can be alleviated by *also* doing lookups yourself and using the union of the results.

However, any 'preferred servers' scheme has a drawback: you must have a stable way to refer to the preferred server. You can't use a name in this case, because when you need it, you can't look up names yet. Therefore, you'll need raw IP addresses and port numbers, which are often unstable and rarely easy to remember.

(03-20-2011 04:31 AM)overload Wrote:  I think I see a big problem when using cryptopuzzles to protect against sybil attacks. Cryptopuzzles can be effective by requiring a single server to spend hours or days or months of computation before being able to produce a valid node id. However, attackers can use zombie networks that would be able to pump thousands/millions of node ids to the cryptopuzzle-protected dht by using thousands and millions of zombie computers acting on their behalf. Once created the rogue nodeids can be used to completely disrupt the whole of the dht and so idons permanently.

Since the remaining suggested protection agains sybil attacks rely on a single certificate authority, and therefore is out of consideration, I don't see how dht and therefore idons would be able to resist a reasonably orchestrated sybil attack by a zombie network that would cost probably a few hundred dollars.
It's true that a Sybil attack would probably be the most effective one against this design; however, I have yet to see a distributed name service with a stronger defense against Sybil attacks than cryptopuzzles. 'Scrolls' ( http://www.aaronsw.com/weblog/squarezooko ), for instance, uses the entirety of the name history for the entire network as the cryptopuzzle an attacker must solve - but the thing that makes this harder over time could be simulated (to a degree) by gradually increasing N.
03-20-2011, 04:17 PM
Post: #20
RE: Quick draft of an idea I had; please discuss
Having just quoted the Scrolls idea, I went and reread it. This time, the author had added a link to a FAQ and other pages about the idea, one of which contained an idea I think is worthwhile to add to my design: non-parallelizable HashCash. The HashCash design is trivially parallelizable; this makes it vulnerable to an attacker farming out the computation to a GPU or a botnet. If the puzzle is not parallelizable, though, an attacker's maximum speed is the same as that of any other user with equally fast single-threaded hardware.
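One classic way to get a non-parallelizable puzzle is an iterated hash chain, where each step needs the previous step's output. This is only an illustrative sketch, not the construction from the linked pages:

```python
# Illustrative sequential puzzle: an iterated SHA-256 chain. Step i
# depends on step i-1, so extra cores give an attacker no speedup.
import hashlib

def sequential_puzzle(seed: bytes, iterations: int) -> bytes:
    """Each hash application consumes the previous output, forcing
    single-threaded evaluation regardless of available hardware."""
    digest = seed
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest
```

The trade-off is that verifying such a chain also requires recomputing it, so a real design would pair this with something cheaper to check.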
03-29-2011, 09:55 PM
Post: #21
RE: Quick draft of an idea I had; please discuss
Found a good paper on non-parallelizable puzzles. http://doc.utwente.nl/69557/1/On_Non-Par..._Modes.pdf