A world in which IPv6 is well thought out

This is a translation of an article by Avery Pennarun, a Google employee, about why today's Internet is the way it is, the history and preconditions behind the creation of IPv6, what an ideal IPv6 would have looked like, why it did not turn out that way, and how we might still get closer to that ideal.

In November of last year I went to an IETF meeting for the first time. The IETF is an interesting place: it seems to be about one third maintenance grunt work, one third extending things that already exist, and one third crazy research far removed from reality (here Avery used the phrase "blue sky insanity", a play on "blue skies research"; translator's note). I attended mainly because I wanted to see how people would react to TCP BBR, which was being presented there for the first time. (Answer: mostly positively, but with suspicion. It seemed too good to be true.)

Anyway, IETF meetings include a lot of presentations about IPv6, the protocol that was supposed to replace IPv4, which is what the Internet runs on. (Some would say the replacement is already underway; some would say it has already happened.) Alongside those presentations there are a lot of people who think IPv6 is the best and greatest thing ever, who are sure it is finally about to arrive (Any Day Now), and who believe IPv4 is just a big pile of hacks that needs to die so that the Internet can be beautiful again.

I thought this might be a good opportunity to actually try to figure out what was going on. Why is IPv6 such a convoluted mess compared to IPv4? Wouldn't it have been better if it were just IPv4 with more address bits? But no, goodness, it is anything but that. So I started asking around, and here is what I learned.

Buses ruined everything

Once upon a time there was the telephone network, which used physical circuit switching. In essence, that meant moving connectors around so that your phone was literally connected by a very long wire (OSI layer 1). A "leased line" was a very long wire that you rented from the phone company. You put bits in at one end of the wire, and they came out at the other end a fixed amount of time later. You did not need addresses, because there was exactly one machine at each end.

Eventually the phone company optimized this a bit. Time-division multiplexing (TDM) and "virtual circuit switching" appeared. The phone company could transparently take the bits from many lines at a lower bit rate, group them together with multiplexers and demultiplexers, and carry them across the phone system using fewer wires than before. Making that work took more effort than before, but for us modem users nothing changed: put bits in one end, and they come out the other. No addresses needed.

The Internet (not yet called that) was built on top of these circuits. You had a bunch of wires that you could put bits into at one end and collect at the other. If a computer had two or three network interfaces, it could, given the right instructions, forward bits from one line to another, and you could do something far more efficient than running a separate line between every pair of computers. And so IP addresses ("layer 3"), subnets, and routing were born. Even then, with these point-to-point links, you did not need MAC addresses, because once a packet was on the wire there was only one place it could go. You needed IP addresses only to decide where it should go after that.

Meanwhile, as an alternative, local area networks (LANs) were invented. If you wanted to connect computers (or terminals and a mainframe) together, you ran into the inconvenience of needing a separate interface for every connection in a star topology. To cut down on electronics costs, people wanted a "bus" network (also known as a "broadcast domain", a concept that will be important later), where many stations could simply be plugged into one wire and talk to anyone else plugged into it. These were not the same people who were building the Internet, so they did not use IP addresses for this. They invented their own scheme ("layer 2").

One of the earliest bus LANs was arcnet, which is dear to my heart (I wrote the first Linux arcnet driver, and arcnet poetry, back in the 1990s, long after arcnet was obsolete). Arcnet layer 2 addresses were very simple: just 8 bits, set by jumpers or DIP switches on the back of the network card. As the network owner, it was your job to configure the addresses and make sure there were no duplicates, or all hell would break loose. This was a bit of a pain, but arcnet networks were usually fairly small, so it was only a small pain.

A few years later Ethernet came along and solved that problem once and for all by using many more bits (48, in fact) for layer 2 addresses. That is enough bits that you can assign a different (sharded-sequential; the translator takes this to mean that the first three bytes of a MAC address are a range fixed for a specific manufacturer) address to every device that has ever been manufactured, with no overlaps. And that is exactly what they did! Thus Ethernet MAC addresses were born.

Various LAN technologies came and went, including one of my favourites, IPX (Internetwork Packet Exchange, although it had nothing to do with the "real" Internet) and Netware, which worked great as long as all the clients and servers were on a single bus network. You never had to configure any addresses. It was beautiful, reliable, and it worked. Practically the golden age of networking.

Of course, someone had to ruin it: big company and university networks. They wanted so many computers connected that sharing the 10 Mbit/s of a single bus among all of them became a bottleneck, so they needed a way to have multiple buses and then interconnect ("internetwork", if you will) those buses together. You are probably thinking, "Of course! Use the Internet Protocol (IP) for that", right? Ha ha, no. The Internet Protocol, still not called that, was not mature or popular back then, and nobody took it seriously. Netware over IPX (and the many other LAN protocols of the time) was serious business, and like any serious business, its makers invented their own things to extend the already popular Ethernet. Devices on Ethernet already had addresses, MAC addresses, which were probably the only thing the people using the various LAN protocols could agree on, so they decided to use Ethernet addresses as the keys for their routing mechanisms. (Actually, instead of "routing" they called it bridging and switching.)

The problem with Ethernet addresses is that they are assigned sequentially at the factory, so they cannot be hierarchical. That means a "bridging table" is not as nice as a modern IP routing table, which can hold a single route for an entire subnet. To bridge efficiently, you had to remember which network bus each MAC address could be found on. And people did not want to configure each of those by hand, so the bridges had to figure it out for themselves. If you had a complicated interconnection of bridged networks, this got a little complicated. As I understand it, that is what led to the spanning tree poem, and I think I will just leave it at that. Poetry is very important in networking.

Anyway, it mostly worked, although it was a bit of a mess: you got broadcast floods here and there, the routes were not always optimal, and all of it was almost impossible to debug. (You definitely could not write something like traceroute for bridges, because none of the tools you would need to make it work, such as the ability to address an intermediate bridge, exist in plain Ethernet.)
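
To make the "figure it out for themselves" part concrete, here is a minimal sketch of a self-learning bridge. The class name, port numbering, and MAC addresses are invented for illustration; real bridges also age out entries and run spanning tree, which this toy ignores.

```python
class LearningBridge:
    """Toy layer 2 bridge: learns which port each source MAC lives on."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}  # MAC address -> port; flat, one entry per device

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is forwarded to."""
        self.table[src_mac] = in_port          # learn where the sender is
        out_port = self.table.get(dst_mac)
        if out_port is None:                   # unknown destination: flood
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:                # already on the right bus
            return []
        return [out_port]


bridge = LearningBridge(num_ports=3)
first = bridge.handle_frame(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
reply = bridge.handle_frame(2, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa")
print(first)  # flooded to every port except the one it arrived on
print(reply)  # sent only to the port where the first sender was learned
```

Note that the table needs one entry per device, with no way to summarize a group of addresses, which is exactly the non-hierarchical property complained about above.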

On the other hand, all these bridges were hardware-optimized. The whole system was basically invented by hardware people as a way of fooling software, which had no idea about multiple buses and the bridges between them, into thinking it was running on one big network. Bridging in hardware means a bridge can run really fast, as fast as the Ethernet itself. That does not sound remarkable now, but at the time it was a big deal. Ethernet was 10 Mbit/s, so you could probably saturate it by connecting several computers, but a single computer could not push out 10 Mbit/s. Back then that sounded crazy.

In any case, the point is that bridging was a mess, and impossible to debug, but it was fast.

Internet over buses

While all this was going on, those Internet people were hard at work too, and of course they did not fail to notice the arrival of cool, cheap LAN technology. I think it might have been around the time that the ARPANET was renamed the Internet, although I am not sure. Let's say it was, because the story sounds better when you tell it with confidence.

At some point, things progressed from connecting individual computers to the Internet over long-distance point-to-point links to wanting to connect entire LANs together over point-to-point links. What people wanted was "long-haul bridges".

You might think, "Hey, no problem, why not just build a bridge over a long-distance link and be done with it?" Sounds good, but it does not work. I will not go into the details, but in short the problem is congestion control (translator's note: for some reason there is no Russian translation of the Wikipedia article on this). The deep, dark secret of Ethernet bridging is that it assumes all your links run at about the same speed and/or are heavily underutilized, because there is no mechanism for slowing down. You just spew out data as fast as you can and expect it to arrive. But when your Ethernet runs at 10 Mbit/s and your point-to-point link at 0.128 Mbit/s, that is completely hopeless. Another problem is that discovering routes by flooding every link to see which one is right, which is how bridging usually works, is far too expensive on slow links. And suboptimal routing, which is merely annoying on local networks with low latency and high throughput, is downright awful on slow, expensive long-distance links. It just does not scale.

Fortunately, the Internet people (if it was called the Internet by then) had been working on exactly that set of problems. If we could use the Internet's tools to connect Ethernet buses together, we would be in great shape.

And so they designed a "frame format" for carrying Internet packets over Ethernet (and arcnet, for that matter, and every other kind of LAN).

And here everything went wrong.

The first problem that had to be solved was that now, when you put a packet on the wire, it was no longer clear which machine was supposed to "hear" it and maybe forward it. If several Internet routers sit on the same Ethernet segment, you cannot have all of them pick up the packet and try to forward it; that way lies packet storms and routing loops. No, you have to choose which router on the Ethernet bus is supposed to pick it up. We cannot just use the IP destination field for that, because we have already put the address of the message's final recipient there, not the router's address. Instead, we identify the desired router by putting its MAC address in the Ethernet frame.

So, to set up your local IP routing table, you want to be able to say something like "send packets for 10.1.1.1 via the router with MAC 11:22:33:44:55:66". That is the thing you actually want to express. This is important! Your packet's destination is an IP address, but your router is a MAC address. But if you have ever configured a routing table, you may have noticed that nobody writes it that way. Instead, you write "send packets for 10.1.1.1 via the router at 192.168.1.1".

In reality, that just complicates things. Now your operating system first has to look up the MAC address for 192.168.1.1, find out that it is 11:22:33:44:55:66, and finally build a packet with Ethernet destination 11:22:33:44:55:66 and IP destination 10.1.1.1. The address 192.168.1.1 does not appear anywhere in the packet; it is just an abstraction for humans.
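
The two-step lookup can be sketched roughly as follows. The table contents mirror the example addresses in the text; the /32 prefix length, the dictionary names, and the `build_headers` helper are assumptions made for illustration, and the ARP cache is simply pre-filled rather than populated by real ARP traffic.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next-hop router IP.
ROUTES = {
    ipaddress.ip_network("10.1.1.1/32"): ipaddress.ip_address("192.168.1.1"),
}
# Hypothetical ARP cache; in reality it is filled by ARP broadcast requests.
ARP_CACHE = {
    ipaddress.ip_address("192.168.1.1"): "11:22:33:44:55:66",
}


def build_headers(dst_ip_text):
    """Return the destination fields that actually end up in the packet."""
    dst_ip = ipaddress.ip_address(dst_ip_text)
    # Longest-prefix match over the routing table.
    matches = [net for net in ROUTES if dst_ip in net]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    best = max(matches, key=lambda net: net.prefixlen)
    next_hop_ip = ROUTES[best]             # only used for the lookup
    next_hop_mac = ARP_CACHE[next_hop_ip]  # the "useless intermediate step"
    return {"eth_dst": next_hop_mac, "ip_dst": str(dst_ip)}


headers = build_headers("10.1.1.1")
print(headers)
```

The router's IP address is consulted twice on the way but never written into the resulting headers.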

To carry out that pointless intermediate step, you need to add ARP (the Address Resolution Protocol), a simple non-IP protocol whose job is to convert IP addresses into Ethernet addresses. It does this by broadcasting a request to everyone on the local Ethernet segment, asking who owns that IP address. If you have bridges, they have to forward all the ARP packets out of all their interfaces, because they are broadcast packets, and that is what "broadcast" means. On a big, busy Ethernet with many interconnected LANs, excessive broadcasts become one of your worst nightmares. It is especially bad on Wi-Fi. Over time, people worked around this by inventing bridges and switches with special hacks to avoid forwarding ARP as far as is technically possible. Some devices (Wi-Fi access points in particular) simply answer with fake ARP replies to help out. But it is all crutches, even if sometimes necessary ones.

Death by legacy

Time passed. Eventually (and it actually took quite a while) people more or less stopped using non-IP protocols on Ethernet. So basically every network became a physical wire (layer 1), with multiple stations on a bus (layer 2), with buses joined together by bridges (gotcha! still layer 2), and those inter-buses connected by IP routers (layer 3).

After a while, people got tired of configuring IP addresses by hand, arcnet style, and wanted them to configure themselves, Ethernet style. Except it was too late to do it Ethernet style, because (a) devices were already being manufactured with Ethernet addresses, not IP addresses, (b) IP addresses were only 32 bits, which is not enough to just hand them out forever without overlaps, and (c) simply assigning IP addresses sequentially instead of using subnets would take us right back to square one: it would be Ethernet all over again, and we already have Ethernet.

And so bootp and DHCP were born. These protocols, by the way, are special in the same way ARP is (except that they pretend not to be special by technically being IP packets). They have to be special, because an IP node needs to be able to send them before it has an IP address, which is of course impossible, so it just fills the IP headers with what is essentially nonsense (albeit nonsense specified by an RFC), so the headers might as well have been left out. (You can tell these headers are nonsense because a DHCP client has to open a raw socket and fill them in by hand; the kernel's IP layer cannot do it.) But nobody was going to happily invent yet another protocol that was not IP, so they pretended it was IP, and everyone was happy. Well, as happy as you can be when you are inventing DHCP.

I digress. The distinguishing feature here is that, unlike real IP services, bootp and DHCP need to know about Ethernet addresses, because after all it is their job to hear your Ethernet address and assign you an IP address to go with it. They are basically the reverse of ARP, except we cannot call them that, because there already is a RARP, which literally stands for "reverse ARP". In fact, RARP worked fine and did the same job as bootp and DHCP while being much simpler, but let's not talk about that.

The point of all this is that Ethernet and IP were getting more and more intertwined. They are now almost inseparable. It is hard to imagine a network interface (other than ppp0) without a 48-bit MAC address, and hard to imagine that interface working without an IP address. You write your IP routing table using IP addresses, but of course you know you are lying when you name a router by its IP address; you are just indirectly saying that you want to route via a MAC address. And you have ARP, which gets bridged but not really, and DHCP, which is IP but is really Ethernet, and so on.

On top of that, we still have both bridging and routing, and both keep getting more complicated as LANs and the Internet get more complicated. Bridging is still mostly done in hardware and defined by the IEEE, the people who control the Ethernet standards. Routing is still mostly done in software and defined by the IETF, the people who control the Internet standards. Both groups still try to pretend the other group does not exist. Network operators simply choose bridging or routing based on how fast they want things to go and how much they hate configuring DHCP servers, which they really hate very much, which means they use bridging as much as possible and routing when they have to.

In fact, bridging has gotten so out of control that people decided to lift the bridging decisions out to a higher level entirely (and of course the configuration exchanged between bridges is carried by a protocol running over IP!) so that it can be managed centrally. This is called software-defined networking (SDN). It is much better than letting switches and bridges do whatever they want, but it is also fundamentally silly, because you know what software-defined networking is? IP. It is literally, and always has been, the software-defined network you use for interconnecting networks that have grown too big. But the problem is that IPv4 was initially too hard to accelerate in hardware, and in any case it did not get hardware acceleration, and configuring DHCP is hell, so network operators just learned to bridge larger and larger things. And now big data centers are basically just SDN, and you might as well not be using IP in the data center at all, because nobody is routing packets. It is all just one big "bus" network.

It is, in short, a mess.

Now forget I told you all that...

A good story, right? Good. Now pretend none of it happened, and we are back in the 1990s, when most of it had in fact already happened, but the people at the IETF were still pretending that it had not and that the "impending" catastrophe could still be avoided. This is the good part!

There is one thing I forgot to mention in the long story above: somewhere along the way we stopped using bus networks. Ethernet is not really a bus anymore. It only pretends to be one. Basically, we could not get Ethernet's famous CSMA/CD to keep working as speeds increased, so we went back to the good old star topology. We run bundles of cables from the switch so that there is one cable from each station back to the center. Walls, ceilings, and floors are filled with big, thick, expensive bundles of Ethernet cable because we could not figure out how to make buses work well... at layer 1. It is kind of funny when you think about it. If you find sad things funny, that is.

In fact, in a fit of madness, even Wi-Fi, the ultimate "bus" network (of course!), where literally everyone shares the same open-air medium, is almost always used in a mode called "infrastructure mode", which emulates a giant star topology. If you have two Wi-Fi stations connected to the same access point, they do not talk to each other directly, even if they can hear each other just fine. They send a packet to the access point, but addressed to the MAC address of the other node. The access point then bounces it back out to the destination node.

HOLD YOUR HORSES, LET ME SPELL THAT OUT FOR YOU. There is a catch. When node X wants to send something to node Z through IP router Y via Wi-Fi access point A, what does the packet look like? Let's draw a picture of what we want:

X -> [Wi-Fi] -> A -> [Wi-Fi] -> Y -> [internet] -> Z

Z is the destination IP address, so obviously the IP destination field has to be Z. Y is the router, which, as we learned above, we specify by putting its Ethernet MAC address in the Ethernet destination field. But on Wi-Fi, X cannot simply send a packet to Y, for various reasons (including the fact that they do not know each other's WPA2 encryption keys). We have to send it to A. So where, you might ask, do we put A's address?

No problem! 802.11 has a thing called three-address mode. They added a third Ethernet MAC address to every frame so that it can carry both the real Ethernet destination and the intermediate Ethernet destination. On top of that, there are bit fields called "to-AP" and "from-AP", which tell you whether the packet is going from a station to an access point or from an access point to a station, respectively. But in fact both can be set at once, because that is how Wi-Fi repeaters work (an AP sending packets to an AP).

Speaking of repeaters! If A is a repeater, it has to relay the packet back to base station B along the way, which looks like this:

X -> [Wi-Fi] -> A -> [wifi repeater] -> B -> [Wi-Fi] -> Y -> [internet] -> Z

X->A uses three-address mode, but A->B has a problem: the Ethernet source is X and the Ethernet destination is Y, but the packet is being carried over the air from A to B, and X and Y are not involved. Suffice it to say that there is a thing called four-address mode, and it works exactly as you would think.

(In 802.11s mesh networks there is something called six-address mode, and that is about where I gave up trying to understand.)

Avery, I was promised IPv6, and you haven't even mentioned it

Oh. Oh dear. This post has gone a bit off the rails, hasn't it?

Here is the point of this whole story. The people at the IETF, when they were inventing IPv6, looked at all this mess (and perhaps foresaw some of the additional mess still to come, although I doubt they could have predicted SDN and Wi-Fi repeater modes) and said: hey, wait a minute, stop right there. We do not need any of this crap! What if, instead, the world worked like this:

No more physical bus networks (already done!)
No more layer 2 internetworks (that is what layer 3 is for)
No more broadcasts (layer 2 is always point-to-point, so where would you even send a broadcast? Replace broadcasts with multicast instead)
No more MAC addresses (on a point-to-point network it is obvious who the sender and receiver are, and multicast can be done with IP addresses)
No more ARP and DHCP (no MAC addresses, so no mapping of IP addresses to MAC addresses)
No more complications in the IP header (so that routing can be accelerated in hardware)
No more IP address shortages (so that we can go back to routing big subnets again)
No more manual IP address configuration, except at the core (of the Internet; translator's note) (and there are so many IP addresses that subnets can be handed out recursively down the tree from there)

Imagine living in that world: Wi-Fi repeaters would just be IPv6 routers. So would access points. So would Ethernet switches. So would SDN. ARP storms would be a thing of the past. Every routing problem could be debugged with traceroute. Best of all, we could drop 12 bytes (source and destination MAC addresses) from every Ethernet packet and 18 bytes (source, destination, and access point) from every Wi-Fi packet. Sure, IPv6 adds an extra 24 bytes of addresses (compared to IPv4), but you drop 12 bytes of Ethernet, so the extra overhead is only 12 bytes, which is comparable to using two 64-bit IP addresses while having to keep the Ethernet header. The idea that we would one day be able to drop Ethernet addresses helped justify the bloat of IPv6 addresses.
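
The byte counting in that paragraph can be checked with a few lines of arithmetic. This is only a sketch using the standard address sizes (4-byte IPv4, 16-byte IPv6, 6-byte MAC); the variable names are made up here.

```python
IPV4_ADDR = 4    # bytes per IPv4 address
IPV6_ADDR = 16   # bytes per IPv6 address
MAC_ADDR = 6     # bytes per Ethernet MAC address

extra_ipv6 = 2 * (IPV6_ADDR - IPV4_ADDR)   # source + destination
eth_saved = 2 * MAC_ADDR                   # source + destination MAC
wifi_saved = 3 * MAC_ADDR                  # source, destination, access point
net_extra = extra_ipv6 - eth_saved         # IPv6 with MAC addresses dropped

# The alternative: two 64-bit IP addresses with the Ethernet header kept.
extra_64bit = 2 * (8 - IPV4_ADDR)

print(extra_ipv6, eth_saved, wifi_saved, net_extra, extra_64bit)
```

So the net cost of 128-bit addresses without MAC addresses is 12 extra bytes per packet, against 8 extra bytes for the hypothetical 64-bit design that keeps the Ethernet header.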

It would have been nice. Except for one problem: it did not happen.

Requiem for a dream

One of my co-workers put it best: "Layers are only ever added, never removed."

All of these wonders required being able to start over and throw away the legacy that had been built up by then. And that, unfortunately, is mostly impossible. Even if IPv6 reaches 99% penetration, that does not mean we will be rid of IPv4. And if we are not rid of IPv4, we are not rid of Ethernet addresses, or Wi-Fi addresses. And if we have to keep to the IEEE 802.3 and 802.11 framing standards, we will never be able to drop those bytes. So we will always need the IPv6 neighbour discovery protocol, which is just a more complicated ARP. Even though we no longer have bus networks, we will always need some semblance of broadcasts, because that is how ARP works. We will need to keep running a local DHCP server at home so that our legacy IPv4 light bulbs keep working. We will still need NAT so that our legacy IPv4 light bulbs can reach the Internet.

And that is not the worst of it. The worst of it is that we still need the infinite abomination that is layer 2 bridging, because of one more mistake the IPv6 team forgot to fix. Unfortunately, when they were designing IPv6 in the 1990s, the plan was to deploy IPv6 first (which was supposed to take a few years) and then deal with this problem once IPv4 and MAC addresses had both died out, at which point it would be easier to solve. Besides, at that time nobody really had a "mobile IP device" anyway. What would that even mean: carrying your laptop around and plugging it into one Ethernet port after another while an FTP upload is running? Sounds silly.

The killer app: mobile IP

Of course, with a couple of decades of history behind us, we now know of a few use cases for carrying around a laptop (your phone) and plugging it into one ~~Ethernet port~~ wireless access point after another. We do it all the time. And with LTE it even mostly works! With Wi-Fi it only sometimes works. Not bad, right?

Not really, because here is the Internet's dirty secret: it all works only because of layer 2 bridging. Internet routing does not handle mobility at all. If you move around an IP network, your IP address changes, and that breaks every open connection.

Corporate Wi-Fi networks fake it for you by bridging the entire LAN together at layer 2, so that a giant central DHCP server always hands you the same IP address no matter which of the corporate access points you are connected to, and then delivers your packets with at most a few seconds of interruption while the bridges reconfigure. Those newfangled home Wi-Fi systems with multiple repeaters/extenders do the same thing. But if you switch from one Wi-Fi network to another while walking down the street (say, if every store in a row had public Wi-Fi), well, that is too bad. Each of them hands you a new IP address, and every time your IP address changes, all your connections break.

LTE tries even harder. You keep your IP address (usually an IPv6 address in the case of mobile networks) even when you travel for kilometers and hand off between many cell towers. How? Well... they usually just tunnel all your traffic back to a central point, where it is all bridged together (albeit with a lot of firewalling) into one super-giant virtual layer 2 network. And your connections stay alive. At the cost of a lot of complexity and a really dismaying amount of additional latency, which they would really like to get rid of, but that is almost impossible.

How to make mobile IP work

Footnote 1

It turns out that nothing in this section requires IPv6. It would all work with IPv4 behind NAT, even while roaming across several NATs.

So, it was a long story, but I did manage to extract it from the people at the IETF. When we got to this point, the problems of mobile IP, I could not help but ask: what went wrong? Why can't we make it work?

It turns out the answer is surprisingly simple. The great flaw lies in how the famous "4-tuple" (source IP, source port, destination IP, destination port) was defined. We use the 4-tuple to identify a given TCP or UDP session; if a packet carries those same four fields, it belongs to that session and we can deliver it to the socket handling that session. But the 4-tuple spans two layers: network (layer 3) and transport (layer 4). If, instead, we identified sessions using only layer 4 data, mobile clients would work perfectly.

Here is a quick example. Client X, port 1111, is talking to Y, port 80, so it sends packets with the 4-tuple (X,1111,Y,80). The response comes back with (Y,80,X,1111), and the kernel delivers it to the socket that generated the original packet. When X sends more packets tagged (X,1111,Y,80), Y delivers them all to the same server socket, and so on.

Then X changes its IP address and gets a new name, say Q. Now it starts sending packets with the 4-tuple (Q,1111,Y,80). Y has no idea what that means and throws them away. Meanwhile, if Y sends packets tagged (Y,80,X,1111), they get lost, because there is no longer an X to receive them.
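
A toy model of this failure, seen from Y's side, might look like the following. The `sockets` dictionary and `deliver` function are illustrative stand-ins for the kernel's socket lookup, not a real API, and the single-letter hosts are the ones from the example.

```python
# Sockets keyed by the classic 4-tuple, as seen from server Y.
sockets = {}


def deliver(src_ip, src_port, dst_ip, dst_port):
    """Return the socket name for a packet, or None if it is dropped."""
    return sockets.get((src_ip, src_port, dst_ip, dst_port))


sockets[("X", 1111, "Y", 80)] = "session-1"

before_move = deliver("X", 1111, "Y", 80)
after_move = deliver("Q", 1111, "Y", 80)   # same client, new address Q
print(before_move, after_move)
```

Because the layer 3 address is part of the lookup key, the packet from Q matches nothing and the session is effectively dead.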

Imagine now that we tagged sockets without reference to their IP addresses. For that to work, we would need much bigger port numbers (which are currently 16 bits). Let's make them, say, 128 or 256 bits, something like a unique hash.

Now X sends packets to Y with the tag (uuid,80). Note that the packets themselves still carry the (X,Y) IP addresses at layer 3, which is how they get routed to the right machine. But the kernel does not use the layer 3 information to decide which socket to deliver the packet to; it uses only the uuid. The destination port (80 in this case) is needed only to initiate a new session, to identify which service you want to connect to, and can be ignored or left out after that.

For the return direction, Y's kernel caches the fact that packets for (uuid) go to IP address X, the address from which it most recently received (uuid) packets.

Now suppose X changes its address to Q. It still sends packets tagged (uuid,80) to IP address Y, but now those packets come from address Q. Machine Y receives the packet and matches it to the socket associated with (uuid), notices that packets for that socket are now coming from Q, and updates its cache. Its return packets can now be sent, tagged (uuid), back to Q instead of X. Everything works! (Modulo the measures needed to prevent attacks on connections.)
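
Here is a minimal sketch of that scheme as seen from Y. The `Server` class and its method names are invented for illustration, a 128-bit random UUID stands in for the session tag, and the address check that a real design would need is deliberately left out.

```python
import uuid


class Server:
    """Toy model of host Y demultiplexing by session uuid only."""

    def __init__(self):
        self.sessions = {}   # session uuid -> socket name
        self.peer_addr = {}  # session uuid -> last IP address seen

    def receive(self, src_ip, session_id, dst_port):
        if session_id not in self.sessions:
            # The port matters only when a new session is opened.
            self.sessions[session_id] = f"socket-for-port-{dst_port}"
        # Assumption: no validation here; a real design must verify
        # the new address before trusting it.
        self.peer_addr[session_id] = src_ip
        return self.sessions[session_id]

    def reply_address(self, session_id):
        return self.peer_addr[session_id]


y = Server()
sid = uuid.uuid4().hex              # 128-bit session tag
sock_a = y.receive("X", sid, 80)    # client starts out at address X
sock_b = y.receive("Q", sid, 80)    # client has roamed to address Q
print(sock_a == sock_b, y.reply_address(sid))
```

The source address is never part of the lookup key, so the move from X to Q changes only the cached return address.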

Footnote 2

Someone asked what such "measures to prevent attacks on connections" might look like. There are various ways to do it, but the simplest is something like the SYN, SYN-ACK, ACK exchange that TCP does at connection startup. If Y simply trusts the first packet from a new host Q, then it is too easy for an attacker to take over the X->Y connection by sending a packet to Y from anywhere on the Internet. (Although it is somewhat hard to guess which 256-bit uuid to put in.) But if Y sends back a cookie that Q must receive, process, and send back again, that proves Q is at least a man in the middle and not just an outside attacker (which is all TCP would guarantee anyway). If you use an encrypted protocol (like QUIC), the handshake can also be protected by the session key.
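
One possible shape for such a cookie exchange is sketched below. It uses a stateless HMAC cookie, which is my choice for the illustration and not something the article specifies; the function names, the `peer_addr` cache, and the session label are all hypothetical.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(32)  # known only to Y


def make_cookie(session_id, claimed_ip):
    """Cookie that Y sends to the address a packet claims to come from."""
    msg = f"{session_id}|{claimed_ip}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def verify_cookie(session_id, claimed_ip, cookie):
    expected = make_cookie(session_id, claimed_ip)
    return hmac.compare_digest(expected, cookie)


peer_addr = {"session-1": "X"}  # Y's cached return address


def handle_echo(session_id, src_ip, cookie):
    """Update the cached address only if the cookie was echoed correctly."""
    if verify_cookie(session_id, src_ip, cookie):
        peer_addr[session_id] = src_ip
        return True
    return False


cookie_for_q = make_cookie("session-1", "Q")        # Y sends this to Q
spoofed = handle_echo("session-1", "Q", "0" * 64)   # never saw the cookie
genuine = handle_echo("session-1", "Q", cookie_for_q)
print(spoofed, genuine, peer_addr["session-1"])
```

An attacker who merely forges the source address Q never receives the cookie and so cannot echo it; only a host that can actually read traffic sent to Q passes the check.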

There is just one catch: that is not how UDP and TCP work, and it is too late to update them. Updating UDP and TCP would be comparable to upgrading from IPv4 to IPv6: a project that seemed simple back in the 1990s but decades later is not even half finished (and the first half was the easy part; the rest is much harder).

The good news is that we may be able to hack around this with yet another layering violation. If we throw away TCP (it is pretty old anyway) and use QUIC over UDP instead, then we can simply stop using the UDP 4-tuple as a connection identifier. Instead, if the UDP port number is a special value that means "mobility layer", we unwrap the contents, which can be another packet carrying a proper uuid tag, match it to the right session, and deliver the packet to the right socket.

There is even more good news: the experimental QUIC protocol already has, at least in theory, the right packet structure for this to work. It turns out you need unique session identifiers (keys) anyway if you want to do stateless packet encryption and authentication, as QUIC does. So, perhaps with minor changes, QUIC could support seamless roaming. What a world that would be!

At that point, all we would have to do is remove every remnant of UDP and TCP from the Internet, and then we really would no longer need layer 2 bridging, for real this time, and then we could get rid of broadcasts and MAC addresses and SDN and DHCP and all the rest.

Then the Internet would be elegant again.

Article based on information from habrahabr.ru
