The Invisible Internet Project (I2P) is an encrypted private network layer that allows people to connect anonymously. Unlike Tor (if you are not familiar with it, I recommend looking at this article), I2P isn't made to access the internet (although this is possible with outproxies) but to access various types of services within the network, such as anonymous websites (I2P sites, or eepSites), torrents, and IRC. I2P was released in 2003 and is now at version 1.5.
Some parts of I2P's implementation are similar to Tor (the onion encryption concept), but it also has some extra characteristics (mostly garlic routing and encryption). The official documentation is a bit difficult to grasp, so this post aims to give you a good overview of how all the parts of the system work.
How I2P Works
High-Level Overview and Tunnels
The first term that we need to know is "router". Basically, it refers to any client that is running I2P. Each router has inbound and outbound tunnels, which are data pipelines used to receive and send data. Incoming and outgoing data are separated for better anonymity and performance.
Figure 1 shows how exchanges are carried out between different users. Note that this is a simplified version with some tunnels omitted.
We can see that Alice and Bob are communicating with each other. The data sent to Bob goes through Alice's outbound tunnels and then through Bob's inbound tunnels, while the data received from Bob goes through Bob's outbound tunnels and then through Alice's inbound tunnels.
A Deeper Dive Into the Tunnels and Encryption
After reading the previous part, we have a high-level overview of how the system works, but it doesn't explain how clients' data is kept secret. This part answers that question.
The main part of the answer is the tunnels. As in Tor, two clients communicating with each other are separated by multiple routers (hops). There are usually 2 or 3 (as in Figure 2), but this can be set to as many as 7 or as few as 0.
Figure 2 shows what happens when Alice sends a message to Bob. Before explaining the interactions, we need to define a few names:
- a is the Outbound Gateway. Technically, this is Alice's router
- b and c are the Outbound Tunnel Participants. There can be one or several of them, and they just transfer the message to the next node
- d is the Outbound Endpoint. This is the end of the Outbound tunnel (belonging to Alice), and it is tasked with transmitting the message to the Inbound tunnel (belonging to Bob)
- e is the Inbound Gateway. If a client wants people to be able to contact it, the Inbound Gateway address is published in the network database (more on that later)
- f and g are the Inbound Tunnel Participants. They play the same role as b and c. Tunnel participants never know whether they are part of an inbound or an outbound tunnel; they just receive messages and send them to the next hop
- h is the Inbound Endpoint. Technically, this is Bob's router
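To make the naming concrete, here is a minimal sketch that labels each router on the path for a given number of hops. The function names (`tunnel_roles`, `message_path`) and the letter labels are made up for this post and are not part of I2P; the 0 to 7 hop range is the one quoted above.

```python
from string import ascii_lowercase

MAX_HOPS = 7  # upper bound quoted in the text; 0 means no relay at all


def tunnel_roles(hops: int, direction: str) -> list[str]:
    """Return the role of every router in one tunnel, in travel order.

    `hops` counts the routers other than the tunnel creator, so a
    3-hop tunnel involves 4 routers (the creator plus 3 relays).
    """
    if direction not in ("Outbound", "Inbound"):
        raise ValueError("direction must be 'Outbound' or 'Inbound'")
    if not 0 <= hops <= MAX_HOPS:
        raise ValueError(f"hops must be between 0 and {MAX_HOPS}")
    if hops == 0:
        # The creator is the only router: gateway and endpoint at once.
        return [f"{direction} Gateway and Endpoint"]
    participants = [f"{direction} Tunnel Participant"] * (hops - 1)
    return [f"{direction} Gateway", *participants, f"{direction} Endpoint"]


def message_path(outbound_hops: int, inbound_hops: int) -> list[tuple[str, str]]:
    """Label the routers a message crosses from sender to receiver."""
    roles = tunnel_roles(outbound_hops, "Outbound") + tunnel_roles(inbound_hops, "Inbound")
    return list(zip(ascii_lowercase, roles))


if __name__ == "__main__":
    for label, role in message_path(3, 3):
        print(label, role)
```

With 3 hops on each side, this yields the eight routers a to h described above, with a as the Outbound Gateway and h as the Inbound Endpoint.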
When sending a message, this is what will happen at the encryption level:
1. a splits the message into smaller 1,024-byte messages. All the messages going into the pipeline have a fixed size to prevent various attacks
2. a encrypts every 1,024-byte message for h, so that only Alice and Bob can know its content
3. a encrypts the result obtained in (2) for d, then encrypts that result for c, then encrypts that result for b (basically, once per participant). This is similar to what is done in Tor with the onion encryption
4. b receives the tunnel messages, decrypts them, and forwards them to c
5. c receives the tunnel messages, decrypts them, and forwards them to d
6. d receives the 1,024-byte messages, reassembles them to recover the initial data that was split in step (1), and transmits it to the Inbound Gateway
7. e receives the message, fragments it into 1,024-byte tunnel messages, and sends each of them to f
8. f receives the tunnel messages, encrypts them, and forwards them to g
9. g receives the tunnel messages, encrypts them, and forwards them to h
10. h decrypts the messages encrypted by g, decrypts the result encrypted by f, decrypts the result encrypted by e, and decrypts the result encrypted by a. Then it assembles everything to recover the big plaintext message that we had in step (1)
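The splitting and reassembly in the steps above can be sketched as follows. This is a toy layout, not the real I2P tunnel message format: the 2-byte length header, the zero padding, and the function names are assumptions made for illustration. Only the fixed 1,024-byte size comes from the text.

```python
TUNNEL_MESSAGE_SIZE = 1024  # fixed size quoted in the text
HEADER_SIZE = 2             # hypothetical: payload length on 2 bytes
MAX_PAYLOAD = TUNNEL_MESSAGE_SIZE - HEADER_SIZE


def fragment(data: bytes) -> list[bytes]:
    """Split `data` into fixed-size messages, padding the last one."""
    messages = []
    for start in range(0, len(data), MAX_PAYLOAD):
        chunk = data[start:start + MAX_PAYLOAD]
        header = len(chunk).to_bytes(HEADER_SIZE, "big")
        padding = bytes(MAX_PAYLOAD - len(chunk))  # zero bytes in this toy
        messages.append(header + chunk + padding)
    return messages


def reassemble(messages: list[bytes]) -> bytes:
    """Strip headers and padding, then join the payloads in order."""
    parts = []
    for message in messages:
        if len(message) != TUNNEL_MESSAGE_SIZE:
            raise ValueError("tunnel messages must be exactly 1,024 bytes")
        length = int.from_bytes(message[:HEADER_SIZE], "big")
        parts.append(message[HEADER_SIZE:HEADER_SIZE + length])
    return b"".join(parts)


if __name__ == "__main__":
    data = b"x" * 2560
    messages = fragment(data)
    print(len(messages), {len(m) for m in messages})
    print(reassemble(messages) == data)
```

Because every message on the wire has the same length, an observer cannot tell a short message from a long one by looking at a single tunnel message.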
Note that a actually uses the decrypt function to encrypt the messages, and the routers of the outbound tunnel use the encrypt function to decrypt them. Since e, f, and g also use the encrypt function (to really encrypt the messages this time), the participants can't know whether they are part of an Inbound or an Outbound tunnel. Also, to ensure that the tunnel messages are always 1,024 bytes long, the various processes can add padding.
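The decrypt/encrypt trick is easier to see in code. Below is a toy sketch, not I2P's actual cryptography: the cipher (adding a SHA-256-derived keystream byte by byte), the key values, and all function names are assumptions chosen so the example runs with the standard library only. The point it illustrates is that every relay runs the same `encrypt` call, whatever the tunnel direction.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, data: bytes) -> bytes:
    return bytes((d + k) % 256 for d, k in zip(data, keystream(key, len(data))))


def decrypt(key: bytes, data: bytes) -> bytes:
    return bytes((d - k) % 256 for d, k in zip(data, keystream(key, len(data))))


def peel_all_layers(message: bytes, hop_keys: list[bytes]) -> bytes:
    """Apply `decrypt` once per hop key, last hop first.

    The outbound gateway does this *before* sending, so that the relays'
    `encrypt` calls cancel it out. The inbound endpoint does this *after*
    receiving, to undo the relays' `encrypt` calls.
    """
    for key in reversed(hop_keys):
        message = decrypt(key, message)
    return message


def relay(message: bytes, key: bytes) -> bytes:
    """What every relay does, whatever the tunnel direction."""
    return encrypt(key, message)


if __name__ == "__main__":
    payload = b"already encrypted end-to-end for h"
    outbound_keys = [b"key-b", b"key-c", b"key-d"]  # hypothetical keys
    inbound_keys = [b"key-e", b"key-f", b"key-g"]

    message = peel_all_layers(payload, outbound_keys)   # done by a
    for key in outbound_keys:                           # b, c, d
        message = relay(message, key)
    print(message == payload)                           # True

    for key in inbound_keys:                            # e, f, g
        message = relay(message, key)
    print(peel_all_layers(message, inbound_keys) == payload)  # True
```

In this toy cipher the layers commute, so the order of the keys does not actually matter; with a real block cipher it would, which is why the sketch still peels the layers in reverse order.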
Tunnels Establishment and Database
If you made it this far, you now have a general idea of how I2P works, how the data stays confidential, and what the tunnels are doing. Two things you don't know yet are how the tunnels are created and how clients stay anonymous (if you are familiar with Tor, you might have figured out the second part already).
One critical ability in the system is contacting other routers to create routes, and also knowing how to access services. Tor has a central point where clients can query the nodes' information. I2P also has a central point to get routers' information, but it is not an exhaustive database. Instead, it is only used to provide a few routers' addresses, so that the client can bootstrap its map of the network. The router database itself is a distributed database named netDb. The netDb is distributed with a technique called "floodfill", and each router participating in it is called a "floodfill router". The database contains two kinds of entries: RouterInfos and LeaseSets.
Each router participating in the network has a database entry named RouterInfo. It contains the following:
- The router identity (an encryption key, signing key, and a certificate)
- The contact address to reach the router
- When it was published
- Various text options to share the router's capabilities and such
- A signature of all the previous fields, created by the router with the signing key
So that other routers can send a message to a client, the client publishes a LeaseSet in the database describing its inbound tunnels. It provides the following information:
- The tunnel gateway router (which is defined in the RouterInfo entities)
- The tunnel ID (4-byte number)
- When the tunnel expires (by default every 10 minutes)
- An encryption key, signing key, and certificate for the destination
- A signature of the LeaseSet data
Note that it is possible for a LeaseSet to not be public (for example when a client is using IRC), and to just have I2P share the LeaseSet information with the relevant parties.
One type of LeaseSet is particularly interesting to us: the encrypted LeaseSet. As the name suggests, it only allows clients with the correct key to read the LeaseSet information, and therefore to contact the Inbound tunnel. Note that when a router records a LeaseSet in the database, it does so through its outbound tunnel in order to stay anonymous.
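The two kinds of netDb entries described above can be pictured as plain records. This is a simplified sketch and not the wire format: the class and field names are made up, keys and signatures are opaque byte strings here, and grouping several tunnel entries (called `Lease` below) in one LeaseSet is an assumption of this sketch. The 4-byte tunnel ID and the 10-minute lifetime are the values quoted in the lists.

```python
from dataclasses import dataclass, field

TUNNEL_LIFETIME_SECONDS = 10 * 60  # default expiry quoted in the text


@dataclass
class RouterInfo:
    identity: bytes              # stand-in for encryption key, signing key, certificate
    address: str                 # contact address of the router
    published: float             # publication time (Unix timestamp)
    options: dict[str, str] = field(default_factory=dict)
    signature: bytes = b""       # signature over the fields above (not checked here)


@dataclass
class Lease:
    gateway: bytes               # identity of the inbound gateway router
    tunnel_id: int               # must fit in 4 bytes
    expires: float               # Unix timestamp

    def __post_init__(self) -> None:
        if not 0 <= self.tunnel_id < 2**32:
            raise ValueError("tunnel_id must fit in 4 bytes")


@dataclass
class LeaseSet:
    destination: bytes           # stand-in for the destination's keys and certificate
    leases: list[Lease]
    signature: bytes = b""       # signature of the LeaseSet data (not checked here)

    def usable_leases(self, now: float) -> list[Lease]:
        """Return the tunnel entries that have not expired yet."""
        return [lease for lease in self.leases if lease.expires > now]


if __name__ == "__main__":
    created = 1_000_000.0
    lease_set = LeaseSet(
        destination=b"bob-destination",
        leases=[
            Lease(b"router-e", 42, created + TUNNEL_LIFETIME_SECONDS),
            Lease(b"router-x", 43, created - 1),  # already expired
        ],
    )
    print([lease.tunnel_id for lease in lease_set.usable_leases(created)])
```

A sender would pick one of the usable entries, then look up the gateway's RouterInfo to learn how to reach it.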
When a router wants to create a tunnel, it picks routers from the RouterInfo entries of the database, using various heuristics to build an optimized route. It then needs to contact these routers to ask them for permission to set up a tunnel.
Each of the potential tunnel members will receive a tunnel ID and encryption keys that it will need to use to encrypt the data going through, and also to reply to the build request. In addition:
- the intermediate tunnel participants and inbound gateway will receive the next-hop information (router address and tunnel ID)
- the outbound endpoint will receive the information of the inbound gateway we want to send information to
Once all the participants have agreed to be part of a tunnel, the initiating router can start using it to send or receive data. Participants can have multiple tunnels open, so the tunnel ID that they are given at the beginning tells them which tunnel a received packet belongs to, and therefore what the next hop is for this message. Since routers can only see the previous and next hop of a message, they can't know who sent it or where it is ultimately going, which is what provides anonymity.
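The role of the tunnel ID can be sketched as a simple lookup table kept by each participant. Everything here is illustrative: the class names, the router names, and the XOR "layer" are assumptions that stand in for the real build messages and cipher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TunnelEntry:
    layer_key: bytes      # key received in the build request
    next_hop: str         # address of the next router
    next_tunnel_id: int   # tunnel ID to use when talking to the next router


def apply_layer(key: bytes, payload: bytes) -> bytes:
    """Toy stand-in for the per-hop encryption (repeating-key XOR)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))


class Participant:
    """A router that relays messages for several tunnels at once."""

    def __init__(self) -> None:
        self.tunnels: dict[int, TunnelEntry] = {}

    def accept_build_request(self, tunnel_id: int, entry: TunnelEntry) -> bool:
        """Agree to join a tunnel unless the ID is already in use."""
        if tunnel_id in self.tunnels:
            return False
        self.tunnels[tunnel_id] = entry
        return True

    def forward(self, tunnel_id: int, payload: bytes) -> Optional[tuple[str, int, bytes]]:
        """Return (next hop, next tunnel ID, transformed payload), or None."""
        entry = self.tunnels.get(tunnel_id)
        if entry is None:
            return None  # unknown tunnel: drop the message
        return entry.next_hop, entry.next_tunnel_id, apply_layer(entry.layer_key, payload)


if __name__ == "__main__":
    router = Participant()
    router.accept_build_request(1001, TunnelEntry(b"k1", "router-c", 2002))
    router.accept_build_request(1002, TunnelEntry(b"k2", "router-z", 3003))
    print(router.forward(1001, b"hello")[:2])
    print(router.forward(9999, b"hello"))
```

The table only stores the next hop for each tunnel ID, which is all a participant ever learns about the route.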
Sources and Extra Reading
- Effects of Shared Bandwidth on Anonymity of the I2P Network Users
- Privacy-Implications of Performance-Based Peer Selection by Onion-Routers: A Real-World Case Study using I2P