> It adds support for advanced networking-related randomization, in > concrete it adds support for TCP ISNs randomization
Er... did you read the existing Linux TCP ISN generation code? Which is quite thoroughly randomized already? I'm not sure how the OpenBSD code is better in any way. (Notice that it uses the same "half_md4_transform" as Linux; you just added another copy.) Is there a design note on how the design was chosen? I don't wish to be *too* discouraging to someone who's *trying* to help, but could you *please* check a little more carefully in future to make sire it's actually an improvement? I fear there's some ignorance of what the TCP ISN does, why it's chosen the way it is, and what the current Linux algorithm is designed to do. So here's a summary of what's going on. But even as a summary, it's pretty long... First, a little background on the selection of the TCP ISN... TCP is designed to work in an environment where packets are delayed. If a packet is delayed enough, TCP will retransmit it. If one of the copies floats around the Internet for long enough and then arrives long after it is expected, this is a "delayed duplicate". TCP connections are between (host, port, host port) quadruples, and packets that don't match some "current connection" in all four fields will have no effect on the current connection. This is why systems try to avoid re-using source port numbers when making connections to well-known destination ports. However, sometimes the source port number is explicitly specified and must be reused. The problem then arises, how do we avoid having any possible delayed packets from the previous use of this address pair show up during the current connection and confuse the heck out of things by acknowledging data that was never received, or shutting down a connection that's supposed to stay open, or something like that? First of all, protocols assume a maximum packet lifetime in the Internet. The "Maximum Segment Lifetime" was originally specified as 120 seconds, but many implementations optimize this to 60 or 30 seconds. The longest time that a response can be delayed is 2*MSL - one delay for the packet eliciting the response, and another for the response. In truth, there are few really-hard guarantees on how long a packet can be delayed. IP does have a TTL field, and a requirement that a packet's TTL field be decremented for each hop between routers *or each second of delay within a router*, but that latter portion isn't widely implemented. Still, it is an identified design goal, and is pretty reliable in practice. The solution is twofold: First, refuse to accept packets whose acks aren't in the current transmission window. That is, if the last ack I got was for byte 1000, and I have sent 1100 bytes (numbers 0 through 1099), then if the incoming packet's ack isn't somewhere between 1000 and 1100, it's not relevant. If it's 950, it might be an old ack from the current connection (which doesn't include anything interesting), but in any case it can be safely ignored, and should be. The only remaining issue is, how to choose the first sequence number to use in a connection, the Initial Sequence Number (ISN)? If you start every connection at zero, then you have the risk that packets from an old connection between the same endpoints will show up at a bad time, with in-range sequence numbers, and confuse the current connection. So what you do is, start at a sequence number higher than the last one used in the old connection. Then there can't be any confusion. But this requires remembering the last sequence number used on every connection ever. And there are at least 2^48 addresses allowed to connect to each port on the local machine. At 4 bytes per sequence number, that's a Petabyte of storage... Well, first of all, after 2*MSL, you can forget about it and use whatever sequence number you like, because you know that there won't be any old packets floating around to crash the party. But still, it can be quite a burden on a busy web server. And you might crash and lose all your notes. Do you want to have to wait 2*MSL before rebooting? So the TCP designers (I'm not on page 27 of RFC 793, if you want to follow along) specified a time of day based ISN. If you use a clock to generate an ISN which counts up faster than your network connection can send data (and thus crank up its sequence numbers), you can be sure that your ISN is always higher than the last one used by an old connection without having to remember it explicitly. RFC 793 specifies a 250,000 bytes/second counting rate. Most implementations since Ethernet used a 1,000,000 byte/second counting rate, which matches the capabilities of 10base5 and 10base2 quite well, and is easy to get from the gettimeofday() call. Note that there are two risks with this. First, if the connection actually manages to go faster than the ISN clock, the next connection's ISN will be in the middle of the space the earlier connection used. Fortunately, the kind of links where significant routing delay appear are generally slower ones where 1 Mbyte/sec is a not too unreasonable limit. Your gigabit LAN isn't going to be delaying packets by seconds. The second is that a connection will be made and do nothing for 4294 seconds until the ISN clock is about to wrap around and then start doing packets "ahead of" the ISN clock. If it then closes the connection and a new one opens, once again you have sequence number overlap. If you read old networking papers, there were a bunch of proposals for occasional sequence number renegotiation to solve this problem, but in the end people decided to not worry about it, and it hasn't been a problem in practice. Anyway... fast forward out of the peace and love decade and welcome to the modern Internet, with people *trying* to mess up TCP connections. This kind of attack from within was, unfortunately, not one of the scenarios that the initial Internet designers considered, and it's been a bit of a problem since. All this careful worry about packets left over from an old connection randomly showing up and messing things up, when we have people *creating* packets deliberately crafted to mess things up! A whole separate problem. In particular, using the simple timer-based algorithm, I can connect to a server, look at the ISN it offers me, and know that thats the same ISN it's offering to other people connecting at the same time. So I can create packets with a forged source address and approximately-valid sequence numbers and bombard the connection with them, cutting off that server's connection to some third party. Even if I can't see any of the traffic on the connection. So people sat down and did some thinking. How to deal with this? Harder yet, how to deal with this without redesigning TCP from scratch? Well, if a person wants to mess up their *own* connections, we can't stop them. The fundamental problem is that an attacker A can figure out the sequence numbers that machines B and C are using to talk to each other. So we'd like to make the sequence numbers for every connection unique and not related to the sequence numbers used on any other connections. So A can talk to B and A can talk to C and still not be able to figure out the sequence numbers that B and C are using between themselves. Fortunately, it is entirely possible to combine this with the clock-based algorithm and get the best of both worlds! All we need is a random offset, unique for every (address, port, address, port) quadruple, to add to the clock value, and we have all of the clock-based guarantees preserved within every pair of endpoints, but unrelated endpoints have their ISNs at some unpredictable offset relative to each other. And for generating such a random offset, we can use cryptography. Keep a single secret key, and hash together the endpoint addresses, and you can generate a random 32-bit ISN offset. Add that to the current time, and everything is golden. A can connect to B and see and ISN, but would need to do some serious cryptanalysis to figure out what ISN B is using to talk to C. Linux actually adds one refinement. For speed, it uses a very stripped-down cryptographic hash function. To guard against that being broken, it generates a new secret every 5 minutes. So an attacker only has 5 minutes to break it. The cryptographic offset is divided into 2 parts. The high 8 bits are a sequence number, incremented every time the secret is regenerated. The low 24 bits are produced by the hash. So 5 minutes after booting, the secret offset changes from 0x00yyyyyy to 0x01zzzzzz. This is at least +1, and at most +0x1ffffff. On average, the count is going up by 2^24 = 16 million every 300 seconds. Which just adds a bit to the basic "1 million per second" ISN timer. The cost is that the per-connection part of the ISN offset is limited to 24 and not 32 bits, but a cryptanalytic attack is pretty much precluded by the every-5-minutes thing. The rekey time and the number of really-unpredictable bits have to add up to not wrapping the ISN space too fast. (Although the 24 bits could be increased to 28 bits without quite doubling the ISN counting speed. Or 27 bits if you want plenty of margin. Could I suggest that as an improvement?) --- drivers/char/random.c 2004-12-04 09:24:19.000000000 +0000 +++ drivers/char/random.c 2005-01-29 07:20:37.000000000 +0000 @@ -2183,26 +2183,26 @@ #define REKEY_INTERVAL (300*HZ) /* * Bit layout of the tcp sequence numbers (before adding current time): - * bit 24-31: increased after every key exchange - * bit 0-23: hash(source,dest) + * bit 27-31: increased after every key exchange + * bit 0-26: hash(source,dest) * * The implementation is similar to the algorithm described * in the Appendix of RFC 1185, except that * - it uses a 1 MHz clock instead of a 250 kHz clock * - it performs a rekey every 5 minutes, which is equivalent - * to a (source,dest) tulple dependent forward jump of the + * to a (source,dest) tuple dependent forward jump of the * clock by 0..2^(HASH_BITS+1) * - * Thus the average ISN wraparound time is 68 minutes instead of - * 4.55 hours. + * Thus the average ISN wraparound time is 49 minutes instead of + * 4.77 hours. * * SMP cleanup and lock avoidance with poor man's RCU. * Manfred Spraul <[EMAIL PROTECTED]> * */ -#define COUNT_BITS 8 +#define COUNT_BITS 5 #define COUNT_MASK ( (1<<COUNT_BITS)-1) -#define HASH_BITS 24 +#define HASH_BITS 27 #define HASH_MASK ( (1<<HASH_BITS)-1 ) static struct keydata { Anyway, in comparison, the algorithm in your patch (and presumably OpenBSD, although I haven't personally compared it) uses a clock offset generated fresh for each connection. There's a global counter (tcp_rndiss_cnt; I notice you don't have any SMP locking on it) which is incremented every time an ISN is needed. It's rekeyed periodically, and the high bit (tcp_rndiss_msb) of the delta is used like the COUNT_BITS in the Linux algorithm. The ISN is generated as the high sequence bit, then 15 bits of encrypted count (with some homebrew cipher I don't recognize), then a constant zero bit (am I reading that right), then the 15 low-order bits are purely random. It's a slightly different algorithm, but it does a very similar function. The main downsides are that the sequence number can easily go backwards (there's no guarantee that consecutive calls will return increasing numbers since tcp_rndiss_encrypt scrambles the high 15 bits), and that it's not SMP-safe. Two processors could read and use the same tcp_rndiss_cnt value at the same time, or (more likely) both call tcp_rndiss_init at the same time and end up toggling tcp_rndiss_msb twice, thereby destroying the no-rollback property it's trying to achieve. Oh, and the single sequence bit in the offsets means that the TCP ISN will wrap around very fast. Every 10 minutes, or every 60000 TCP connections, whichever comes first. Regarding the first issue, it's possible that the OpenBSD network stack takes care to remember all connections for 2*MSL and continues the sequence number if the endpoints are reused, thereby avoiding a call to ip_randomisn entirely. But the second deserves some attention. The Linux code takes some care to avoid having to lock anything in the common case. The global count makes that difficult. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/