eternaleye · (This post was last modified: 03-12-2011 11:37 PM by eternaleye.)

Primary architecture: 256-bit Kademlia DHT
Basic building block: "Kenning"
Hash: SHA-256 (Plan to transition to SHA3-256 when it is released)
On changing the hash function, alter the first-byte values for all Kenning types to the next unused values (see symbol definitions per type)

Kennings are composed of a 'symbol' and any number of 'values'
Kenning values 'decay' - they disappear after 30 days if the counter is not reset (see 'Add' message)
3 types of Kennings: Identifiers, Names, and Records

Identifiers:
An identifier's symbol is the 256-bit hash of a PGP public key's fingerprint, with the first byte set to '0'
Each value *MUST* be a PGP key signed with the key whose fingerprint the symbol represents
NOTE: This does not necessarily mean self-signed - one can have a new key, signed by the old key.
This allows expiration of keys and migration to new ones.
This also allows many identifiers to show a common owner - have all of them reference the same PGP key in addition to their own key.
This *ALSO* allows an implementation of communities, in a sense - any site belonging to a community can reference a 'community key'. Anyone who wants to 'join' can 'accept' the community key, transitively accepting all members of the community.

Names:
A name's symbol is the 256-bit hash of a human-readable string, with the first byte set to '1'
Each value *MUST* be a reference to an Identifier, signed with the relevant PGP key

Records:
All records 'belong' to an Identifier
A record's symbol is the 256-bit hash of the concatenation of the symbol of the Identifier it belongs to and the record type, with the first byte set to '2'
The format of each value depends on the type of record, with A, AAAA, MX, etc. records acting much as in DNS. However, all these values have these common invariants:
Each value *MUST* contain a 'backreference' to the owning Identifier
Each value *MUST* be signed with the PGP key for the owning Identifier
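
The three symbol derivations above can be sketched in a few lines of Python. This is only an illustration under assumptions the post does not settle: the fingerprint is passed as raw bytes, names are UTF-8 encoded, the record type is an ASCII string appended directly to the owner's symbol, and the type markers '0', '1' and '2' are taken to be the byte values 0, 1 and 2. All function names are made up for the example.

```python
import hashlib

# Type markers from the post, assumed here to be numeric byte values.
IDENTIFIER, NAME, RECORD = 0, 1, 2


def _symbol(type_byte: int, data: bytes) -> bytes:
    """SHA-256 of data, with the first byte overwritten by the type marker."""
    digest = hashlib.sha256(data).digest()
    return bytes([type_byte]) + digest[1:]


def identifier_symbol(fingerprint: bytes) -> bytes:
    # Assumption: the PGP fingerprint is supplied as raw bytes.
    return _symbol(IDENTIFIER, fingerprint)


def name_symbol(name: str) -> bytes:
    # Assumption: the human-readable string is hashed as UTF-8.
    return _symbol(NAME, name.encode("utf-8"))


def record_symbol(owner_symbol: bytes, record_type: str) -> bytes:
    # Assumption: plain byte concatenation of owner symbol and record type.
    return _symbol(RECORD, owner_symbol + record_type.encode("ascii"))
```

Because the type marker replaces the first byte, each symbol keeps 248 bits of the hash, and symbols of different Kenning types can never collide with each other.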

Messages:
As this is based on a DHT, we need to define what operations on symbols are valid.
Lookup:
Given a symbol, retrieve all of that symbol's values
Add:
Given a symbol and a complete value, add the value to the list for that symbol
The node acting on this message *MUST* validate the relevant signatures
If an 'Add' message is received for a value that already exists, reset the 'decay' counter
Delete:
Given a symbol and a complete value that has been signed *AGAIN* by the same PGP key, delete the value
Again, the node acting on this *MUST* validate signatures
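
A minimal in-memory sketch of how a single node might handle these three messages follows. Signature checking is deliberately left to two caller-supplied callbacks (`verify` and `verify_delete`, both hypothetical), since the post does not define a wire format or a PGP library; the class name and the 30-day constant in seconds are likewise just illustration.

```python
import time

DECAY_SECONDS = 30 * 24 * 60 * 60  # the 30-day decay period from the post


class KenningStore:
    """Toy single-node store for Lookup / Add / Delete (not a real DHT)."""

    def __init__(self, verify, verify_delete, now=time.time):
        # Hypothetical callbacks: (symbol, value, signature) -> bool
        self._verify = verify
        self._verify_delete = verify_delete
        self._now = now
        self._values = {}  # symbol -> {value: expiry time}

    def add(self, symbol, value, signature):
        if not self._verify(symbol, value, signature):
            return False  # the node MUST validate signatures
        # Adding an existing value simply resets its decay counter.
        self._values.setdefault(symbol, {})[value] = self._now() + DECAY_SECONDS
        return True

    def lookup(self, symbol):
        now = self._now()
        values = self._values.get(symbol, {})
        # Drop values whose decay counter has run out.
        for value in [v for v, expiry in values.items() if expiry <= now]:
            del values[value]
        return sorted(values)

    def delete(self, symbol, value, delete_signature):
        # The value must have been signed *again* by the same key.
        if not self._verify_delete(symbol, value, delete_signature):
            return False
        return self._values.get(symbol, {}).pop(value, None) is not None
```

Passing the clock in as `now` keeps the decay behaviour easy to test without waiting 30 days.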

As far as who the peers in the DHT are, I have a pretty elegant idea: Have them be the servers the Identifiers reference!
The Kennings need to be republished periodically anyway, and who better to do it than the very servers they point to?
This also greatly reduces the amount of churn in the network, since servers go down very rarely.

Here's an imagined workflow:

Person A wishes to publish a website and mailserver, accessible via IPv4 and IPv6.
He generates his personal key, let's call it 0x1234.
He then generates a key for his webserver (0xF00D) and his mailserver (0xBEEF)

He sends the Add message a few times, creating the Identifiers for each of the 3 keys.
He then signs 0x1234 with 0xF00D, and Adds it to the 0xF00D Identifier as well
He also signs it with 0xBEEF and Adds it to 0xBEEF's Identifier

He then adds an A and AAAA record belonging to each of 0xF00D and 0xBEEF's Identifiers, with the relevant IP addresses
Next, he sends an Add message creating a Name, say 'goodfoodsite' referencing 0xF00D
And another creating 'goodfoodmail' referencing 0xBEEF
Finally, he adds an MX record on 0xF00D referencing 'goodfoodmail'

Another:

Person B wants to visit 'goodfoodsite'

He opens his browser and goes there
His browser looks up the Name 'goodfoodsite', and finds two values - 0xF00D and 0xDEAD
Neither is known to the browser. 0xDEAD has no values other than a self signed key, but 0xF00D references 0x1234
Since Person B knows Person A personally (heh), he's already accepted his public key 0x1234
The browser has an accepted selection, so it looks up 0xF00D's AAAA record
It then contacts that IP address, and loads the webpage.
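
The selection step in this workflow can be sketched as a small function. It assumes the browser has already looked up each candidate Identifier and collected the key ids its values reference; the data shapes and the choice to return the first acceptable candidate are assumptions for the example, since the post does not say what to do when several candidates qualify.

```python
def select_identifier(candidates, references, accepted):
    """Return the first candidate that is accepted directly or that
    references a key the user has already accepted, else None.

    candidates: identifier key ids found under the Name
    references: dict of identifier id -> set of key ids in its values
    accepted:   set of key ids the user has accepted
    """
    for ident in candidates:
        if ident in accepted or references.get(ident, set()) & accepted:
            return ident
    return None


# The scenario from the post: Person B has accepted 0x1234 only.
chosen = select_identifier(
    candidates=["0xF00D", "0xDEAD"],
    references={"0xF00D": {"0xF00D", "0x1234"}, "0xDEAD": {"0xDEAD"}},
    accepted={"0x1234"},
)
print(chosen)
```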

bug1 · 03-14-2011, 04:39 PM

There are bits that I don't understand, but one issue in particular I would like to be clear on is how name collision is handled.

So for example if hundreds of people want 'goodfoodsite' as a human identifier, they all produce the same hash, and associate it with their unique key.

People search for goodfoodsite by hashing it and searching for the hash (in the DHT tree ?), they find hundreds of hits with different unique identifiers.

They then have to decide for themselves which of the sites to go to based on other information associated with that identifier, possibly trust metrics based on key signings.

Is that correct ?

eternaleye · 03-14-2011, 06:43 PM

(03-14-2011 04:39 PM)bug1 Wrote: There are bits that I don't understand, but one issue in particular I would like to be clear on is how name collision is handled.

So for example if hundreds of people want 'goodfoodsite' as a human identifier, they all produce the same hash, and associate it with their unique key.

People search for goodfoodsite by hashing it and searching for the hash (in the DHT tree ?), they find hundreds of hits with different unique identifiers.

They then have to decide for themselves which of the sites to go to based on other information associated with that identifier, possibly trust metrics based on key signings.

Is that correct ?

Precisely. I think the best way to handle it is to have a common library, and have 'accepting' a key occur via a call to that library rather than in each application in its own manner. That way, trusted keys could be stored in a persistent per-user manner, similar to how browsers today store exceptions for untrusted HTTPS certificates, but across multiple applications. For instance, if you visit GMail from the web and trust Google's key (pointed to by GMail's key), your mail client will automatically trust the correct, official Google server.
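
As a rough illustration of such a shared library, the sketch below keeps accepted key fingerprints in one JSON file that every application opens. The class name, the file format, and the idea of a single per-user file are all assumptions made for the example; nothing in the thread specifies them.

```python
import json
from pathlib import Path


class TrustStore:
    """Toy per-user store of accepted key fingerprints, shared by file path."""

    def __init__(self, path):
        self._path = Path(path)
        if self._path.exists():
            self._keys = set(json.loads(self._path.read_text()))
        else:
            self._keys = set()

    def accept(self, fingerprint: str) -> None:
        # Normalise case so 0xF00D and 0xf00d are the same key.
        self._keys.add(fingerprint.lower())
        self._path.write_text(json.dumps(sorted(self._keys)))

    def is_accepted(self, fingerprint: str) -> bool:
        return fingerprint.lower() in self._keys
```

A browser and a mail client that both construct `TrustStore` with the same path would then see each other's accepted keys, which is the cross-application behaviour described above. A real library would also need file locking and protection against tampering, which this sketch ignores.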

Thinking on this, it may be good to require identity-to-identity links to be bidirectional, to prevent people from capitalizing on a widely-trusted identity.

eternaleye · 03-14-2011, 07:25 PM

Grr, my computer froze so I couldn't edit before the timeout. I was going to clarify that bidirectional links would be required for "same person" relationships, while there would be a "Trusts" record type of the same format to indicate that the owner of the identifier trusts the owner of the public key in the value.
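
The difference between the two relationships can be sketched as follows. The `links` mapping (identifier id to the set of key ids it references) is a made-up in-memory stand-in for whatever the DHT lookups would return.

```python
def same_person(a, b, links):
    """'Same person' holds only if each identifier references the other."""
    return b in links.get(a, set()) and a in links.get(b, set())


def trusts(a, b, trust_links):
    """'Trusts' is one-directional: a vouches for b, nothing more."""
    return b in trust_links.get(a, set())


links = {
    "0xF00D": {"0x1234"},  # the webserver key points at the personal key
    "0x1234": {"0xF00D"},  # and the personal key points back
    "0xDEAD": {"0x1234"},  # an impostor claims the same owner, unreciprocated
}
print(same_person("0xF00D", "0x1234", links))
print(same_person("0xDEAD", "0x1234", links))
```

With the bidirectional requirement, 0xDEAD gains nothing by pointing at the widely trusted 0x1234, because 0x1234 never points back.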

bug1 · 03-14-2011, 09:31 PM

Maybe we could support legacy (ICANN) domain names by having an identifier that looks up the hash of the domain name and checks if it's been signed by an unofficial identifier that we could create.

i.e. we suggest people trust a key that represents traditional domain names.

If people don't want it they would be free to not trust it, so we wouldn't be forcing things on them.

If someone wanted to create their own, say, microsoft.com they could, but people who trust this signed key that represents the legacy system wouldn't get surprised by what might be considered an impostor. If they want to create their own human name that conflicts with ICANN they could, but they would have to get people to trust their signature more than the legacy system.

eternaleye · (This post was last modified: 03-14-2011 10:12 PM by eternaleye.)

(03-14-2011 09:31 PM)bug1 Wrote: Maybe we could support legacy (ICANN) domain names by having an identifier that looks up the hash of the domain name and checks if it's been signed by an unofficial identifier that we could create.

The idea has potential, but it has a major flaw: It reintroduces central authority.

The official identifier's key would need to be kept secret, or else people could use it to spoof domains. This means there can be only one source of this data, who could be subpoenaed etc. It would probably be better to specifically say that there is no compatibility mode, and applications wanting to support both systems must look up each explicitly. Maybe we could have a low-level library that only deals with this proposed system, and a higher-level one that's plugin-based, has plugins for both this system and DNS, and abstracts name resolution completely. Building that library would probably be made easier by the fact that both systems support roughly the same set of record types.

In that case, we may want to rename record types specific to this system so that they are in a namespace that's uncontrolled in DNS, like X- headers in email.

bug1 · 03-15-2011, 01:08 AM

Hmm, it introduces a reference to a central authority, but I expect users would be able to disregard that reference.

The problem I was thinking of is this: if Dr Evil starts signing heaps of ICANN-style domain names and pushing them into this system, users might search for, say, Bank of America, and the results might include "bankofamerica.com" and "Bank of America". If some people ended up following the bankofamerica.com signed by Dr Evil, it might take them to a phishing site that cleans out their account.

I think if we are going to allow freeform names, it would be better for everyone if we encouraged someone honest to set up a legacy system early and give their key a chance to develop some trust before someone evil tries to do it.

Alternatively we could come up with a way to invalidate ICANN style domain names, which people might try working around by using different characters anyway, eg 'bankofamerica,com' 'bankofamerica com'

bug1 · 03-15-2011, 01:12 AM

Or we could have a higher-level library (as you suggest) that checks both and warns when there is a discrepancy between the results.

eternaleye · 03-15-2011, 03:08 AM

(03-15-2011 01:08 AM)bug1 Wrote: Alternatively we could come up with a way to invalidate ICANN style domain names, which people might try working around by using different characters anyway, eg 'bankofamerica,com' 'bankofamerica com'

Disallowing the dot is the only really foolproof way of doing that.
I do think we need a way of specifying which identifier to look up directly, rather than via a Name, but that's probably solvable just by using an underscore followed by a hex representation of the key fingerprint. (The underscore is to prevent conflict with ICANN names and alternate forms of IP addresses, since they forbid it, with the side benefit of matching nicely with the coding practice of using leading underscores to indicate internal values)
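
A small sketch of how a resolver front end might tell these forms apart is given below. The category labels are invented for the example, and routing dotted names to the legacy DNS path is an assumption drawn from the higher-level library idea earlier in the thread; the post itself only says that the dot would be disallowed and that a leading underscore marks a direct fingerprint lookup. Numeric IP address forms are not handled here.

```python
import re

# An underscore followed by one or more hex digits (fingerprint length not enforced).
_FINGERPRINT = re.compile(r"_([0-9A-Fa-f]+)")


def classify(query: str):
    """Return a (kind, value) pair describing how to resolve the query."""
    match = _FINGERPRINT.fullmatch(query)
    if match:
        # Direct Identifier lookup by key fingerprint, bypassing Names.
        return ("identifier", match.group(1).lower())
    if "." in query:
        # Dots are disallowed in this system's Names, so hand off to DNS.
        return ("legacy-dns", query)
    return ("name", query)


for q in ("_F00D", "goodfoodsite", "bankofamerica.com"):
    print(q, "->", classify(q))
```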

(Did you know you can enter the hex or decimal representation of an IP address as a single integer in some browsers and it'll work?)
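
For the curious, the integer form of an IPv4 address is easy to compute with Python's standard `ipaddress` module. The address used here is from the 192.0.2.0/24 documentation range, chosen only as an example.

```python
import ipaddress

addr = ipaddress.ip_address("192.0.2.1")
as_int = int(addr)  # the whole address as one 32-bit integer

print(as_int, hex(as_int))
# Converting the integer back yields the dotted form again.
print(ipaddress.ip_address(as_int))
```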

eternaleye · (This post was last modified: 03-17-2011 10:28 AM by eternaleye.)

I just realized that the Add and Delete messages in the protocol I stated have a security hole: An attacker could perpetuate a stale value of a symbol indefinitely, and even if the owner sent a Delete message, they could reinstate it. The obvious solution is for Add messages to include a UTC timestamp, and have the 30-day expiration count from that timestamp (which is inside the signed portion of the data and thus unforgeable). Also, Delete messages must be stored until the expiration date of the record they delete, to prevent a replay attack of the record it deleted.

Note that an Add message that differs from a stored one only in its timestamp counts as a refresh, provided the new timestamp is newer than the old one. (The signature will of course differ as well, since the signed data changed.)
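
One way to picture the timestamp and tombstone rules is the sketch below. It tracks a single symbol's values, assumes signatures have already been verified elsewhere, and assumes that a Delete carries the timestamp of the record it deletes (the post says Delete includes the complete value, which would include the signed timestamp). The class and method names are illustrative only.

```python
TTL = 30 * 24 * 60 * 60  # 30 days, counted from the signed timestamp


class TimestampedValues:
    """Values of one symbol, with replay protection via signed timestamps."""

    def __init__(self):
        self.expiry = {}      # value -> signed timestamp + TTL
        self.tombstones = {}  # value -> expiry of the record that was deleted

    def add(self, value, signed_ts, now):
        expires = signed_ts + TTL
        if expires <= now:
            return False  # already expired: a stale Add being replayed
        if self.tombstones.get(value, 0) >= expires:
            return False  # this record (or an older one) was deleted
        if expires <= self.expiry.get(value, 0):
            return False  # not newer than what is stored, so no refresh
        self.expiry[value] = expires
        return True

    def delete(self, value, signed_ts):
        # Remember the Delete until the deleted record would have expired.
        deleted_expiry = signed_ts + TTL
        self.tombstones[value] = max(self.tombstones.get(value, 0), deleted_expiry)
        if self.expiry.get(value, 0) <= deleted_expiry:
            self.expiry.pop(value, None)

    def purge(self, now):
        self.expiry = {v: e for v, e in self.expiry.items() if e > now}
        self.tombstones = {v: e for v, e in self.tombstones.items() if e > now}
```

Under these rules a replayed Add can never extend a value's life, because the expiry is derived from the signed timestamp rather than from the time the message arrives.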

***lauren*** · 03-17-2011, 11:22 AM

My strong preference is to isolate "ICANN-compatibility" (that is, the current TLD infrastructure) from the IDONS framework per se, except for minimally necessary temporary lookup hooks into the existing BIND libraries, that would be available in parallel during the transition period. Other than that, IDONS should not (in my opinion) be operating on the legacy TLD identifiers in any way.

This "separated" approach helps to avoid a number of significant potential technical and "political" problems.

--Lauren--

eternaleye · 03-17-2011, 08:14 PM

Lauren, I wholeheartedly agree that they should be as fully isolated as possible. I do, however, doubt that the existing DNS resolution libraries would be able to be extended to whatever IDONS ends up being.

***lauren*** · 03-17-2011, 08:21 PM

I'm not suggesting that the existing DNS libraries be extended for anything. I'm suggesting that during the transition phase there will need to be a library with hooks to direct IDONS addresses to the actual IDONS libraries, and legacy DNS addresses to the legacy DNS libraries. The actual IDONS and DNS libraries stay completely separate.

--Lauren--

eternaleye · 03-17-2011, 08:32 PM

Oh, I see what you mean. Total agreement here; in fact I said just about the same thing a bit back in the thread.

overload · 03-19-2011, 07:50 PM

I believe the following issues still need to be addressed (forgive me if they already have and I just didn't understand it):

1) node ids: the description above explains how to generate key ids, but it is not clear to me how to generate and manage the node ids that will contain these key ids. Specifically:

1a) In a Kademlia DHT, a new node is usually expected to have one random node id, and symbols with similar enough ids are stored on that node. However, since the symbol ids are hashes, they are effectively random as well, meaning that they will be spread across random nodes and I cannot specify my preferred nodes. This makes it difficult for me to provide my own authoritative source(s) for the 'kennings' db (which I know is up-to-date etc.), or to arrange geographical redundancy of the db, since the k nodes randomly chosen to store my kennings could by chance all be in Egypt and go down at once. A solution might be to allow a server to have many node ids, which I could make similar enough to all my symbol ids so as to store all of them on the same server.

1b) It seems it might be possible for anyone to join the dht with thousands of malicious node ids (or a suitable % of the number of dht nodes, whichever is larger) that send any kennings they receive to /dev/null, effectively creating black holes in the dht. Nodes in kademlia are supposed to keep only up to k node id refs, where k is a small number ~20, and all k refs could be pointing to black hole nodes.

2) abuse of storage: it seems I could start pumping an unlimited number of kennings into this dht, containing sectors of my file system, and abuse the storage of others to hold my personal data. The 30-day limit on kennings alleviates this problem somewhat. But apart from small technical details, such as detecting whether the timestamp in the Add message is 100 years in the future (so that the value won't be deleted for 100 years), 30 days is enough time to refresh a sizeable amount of data in the current scheme, probably a few TBs or more depending on the bandwidth available to whoever is pumping the data. If everybody starts pumping that much data, there won't be enough space across all the dht nodes. This could also flush the real data out of each dht node, which would have only limited storage space and would only be able to keep the last TB or so of (garbage?) data.
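
To make points 1a and 2 concrete, here is a short sketch. The first two functions show the usual Kademlia rule of storing a symbol on the k nodes whose ids are closest by XOR distance, which is why the publisher has no say in where a hashed symbol lands. The third estimates how much data one sender can keep refreshed in a 30-day window at a given bandwidth. The last is one possible sanity check against far-future timestamps; its 300-second skew allowance is an arbitrary assumption, as are all the function names.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: the XOR of two ids, read as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(symbol: bytes, node_ids, k=20):
    """The k node ids closest to the symbol; these would store its values."""
    return sorted(node_ids, key=lambda n: xor_distance(n, symbol))[:k]


def bytes_pushed(bandwidth_mbit_per_s: float, days: int = 30) -> float:
    """Bytes a sender can upload in the window at a sustained bandwidth."""
    return bandwidth_mbit_per_s * 1_000_000 / 8 * days * 86_400


def timestamp_plausible(signed_ts: float, now: float, max_skew: float = 300) -> bool:
    """Reject Add timestamps further in the future than an assumed clock skew."""
    return signed_ts <= now + max_skew


# A sustained 10 Mbit/s uplink over 30 days, in terabytes (10**12 bytes).
print(bytes_pushed(10) / 1e12, "TB")
```

At 10 Mbit/s the estimate comes to roughly 3 TB per sender per 30-day window, which is in line with the "few TBs" figure in the post.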