Re: General denial question (tarpitting)

James Craig Burley Sat, 27 Mar 2004 22:48:18 -0800

>On Saturday 27 March 2004 06:41, Andrew Pam wrote:
>> On Sat, Mar 27, 2004 at 04:20:44AM -0000, James Craig Burley wrote:
>> > >No. That's one less port he can use to connect to you (on any given
>> > >destination port). He can still use the same source port to connect to
>> > >others. TCP connections are four-tuples.
>> >
>> > Should I not trust O'Reilly's "TCP/IP Network Administration", by
>> > Craig Hunt, Second Edition, page 46, where it says, among other things
>> > consistent with this,
>> >
>> >   It is the pair of port numbers, source and destination, that
>> >   uniquely identifies each network connection.
>> >
>> > or do you think it is just simplifying things for the benefit of its
>> > audience?
>>
>> Read that again.  The pair of port numbers, source and destination
>> (at the TCP layer)... plus the pair of IP addresses (at the IP layer).
>
>I think there's a mixup between what you two are talking about - on a given 
>_network_ then yes, you need the 4-tuple to identify a single connection 
>(that is port X on IP address Y talking to port A on IP address B is not the 
>same connection as port X on IP address B talking to port Y on IP address A 
>etc) - but that doesn't mean that a TCP stack for _client_ sockets re-uses 
>socket numbers for different connections - each client socket on a given host 
>(meaning a network interface) has a number unique to that host:
>
>Stevens, vol 1, page 42 "[...] the client just needs to be certain that the 
>ephemeral port is unique on the client host. The TCP and UDP codes guarantee 
>this uniqueness."
>page 43: "The _socket_pair_ for a TCP connection is the 4-tuple that defines 
>the two endpoints of the connection [...]. A socket pair uniquely identifies 
>every TCP connection on an internet. The two values that identify each 
>endpoint, an IP address and a port number, are often called a socket."
>page 45 (discussing a client connecting twice to the same server socket): "The 
>TCP code on the client host assigns the new socket an unused ephemeral port 
>number [...]."


Yes, that's exactly what I was thinking would be the case.
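
The behavior Stevens describes can be observed with a small
experiment.  What follows is a minimal sketch, assuming a host with a
working loopback interface; the function name open_two_connections
is made up for illustration.  It opens two client connections to the
same listening socket and returns each connection's 4-tuple as seen
from the client side.  Since both connections share the same client
IP, server IP, and server port, only the client port can tell them
apart.

```python
import socket


def open_two_connections():
    """Open two TCP connections to one listener; return their 4-tuples."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(2)
    server_addr = server.getsockname()

    # Two client sockets connecting to the same server socket.
    clients = [socket.create_connection(server_addr) for _ in range(2)]
    accepted = [server.accept()[0] for _ in clients]

    # (client_ip, client_port, server_ip, server_port) for each connection.
    tuples = [c.getsockname() + c.getpeername() for c in clients]

    for s in clients + accepted + [server]:
        s.close()
    return tuples


if __name__ == "__main__":
    for four_tuple in open_two_connections():
        print(four_tuple)
```

The two printed tuples differ only in the client port.  The sketch
shows nothing about whether a stack would hand out the same client
port for connections to different servers.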

>[The O'Reilly books are good, but for authoritative TCP/IP always go to 
>Stevens or the source code]

I didn't want to be the one to say it, since I'm not an expert on
TCP/IP, but, generally, I personally appreciate O'Reilly books but
don't look to them as definitive sources for resolving tricky
questions like this.

>In particular, a client can open two different sockets to the listening socket 
>on a single host (eg web browser opens 2 connections to port 80 on the same 
>host), and so the 4-tuple shows 2 connections differentiated only by the 
>client socket number, but the _client_ socket numbers are always unique even 
>when speaking to different hosts. The stack for the server socket (the one 
>that called listen()) de-multiplexes input packets based on the client 
>IP+port and hands it off to the appropriate file descriptor returned by 
>accept(), but there is no de-multiplexing on the client side - and I'm pretty 
>sure it would break a huge amount of existing code if anyone wrote a TCP 
>stack that did so (which I suppose it could do, given that the client socket 
>number isn't actually assigned until the connect() call returns).

That (connect() behavior) struck me as rather interesting when I
looked it up earlier.  I don't know how non-Unix TCP/IP stacks were
implemented (in terms of ABIs) in the past, so I didn't want to
narrow the issue down to Unix systems.

>Anyway, bringing it all vaguely back on topic, the point of the tarpit is not 
>so much as to tie up the resources for an inactive socket, but to chew up the 
>limited upstream bandwidth of trojan DSPAM clients on asymmetric connections, 
>and where those trojans have been written simply to stay small and 
>unobtrusive (to evade discovery) then it's quite likely that they will limit 
>the number of sockets they use simultaneously.

I would tend to think so.

However, spamware doesn't have the same issues of compatibility with
the installed base that most of us have (pertaining to your comment
that "it would break a huge amount of existing code if anyone wrote a
TCP stack [that multiplexed client-side, or dynamic, port numbers]").
So, as spamware "evolves" to respond to measures such as tarpitting,
it's important to keep in mind that it probably *is* feasible for a
very-low-overhead TCP/IP stack to be implemented on behalf of spamware
that multiplexes port numbers and keeps the related overhead low
enough to make server-side tarpitting less of a problem, even on
low-upstream-bandwidth (e.g. zombie) machines.

Compatibility issues aside, there are two technical disadvantages to
multiplexing dynamic port numbers I can think of.

One, my impression is that there'd be a loss in reliability.
Transmission errors are not eliminated by design at the lowest layers
of TCP/IP; rather, they're handled more on an end-to-end-principle
basis.  That is, even applications using only TCP-protocol connections
are expected to ignore, rather than NAK (in the pertinent protocol
layer), incoming packets or transmissions that fail to match certain
criteria -- though, in practice, this affects only protocols such as
FTP that might use more than one TCP connection for a single "session"
on behalf of a user, since single-connection-per-session applications
already have their network-related reliability issues largely
addressed by the lower layers in the stack (DNS lookups being a
separate issue here).

So, on the client's end, a simple reliability check for an incoming
packet on a connection keyed by a 4-tuple is as follows: use the
packet's destination (dynamic) port number and destination IP address
as the primary key to identify the TCP connection in question, then
double-check the packet's source (server-side) IP address and port
number against those recorded for that connection, and simply drop
the packet if they don't all match up.
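
As a rough illustration of that check, here is a minimal sketch.  It
assumes a connection table keyed by the client's own (IP, port) pair,
with each dynamic port used by exactly one connection.  The names
Packet, Connection, and demux_unique_port are hypothetical, and a real
stack checks much more (checksums, sequence numbers, connection
state).

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Connection(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def demux_unique_port(
    table: Dict[Tuple[str, int], Connection], pkt: Packet
) -> Optional[Connection]:
    """Find the connection for an incoming packet, or None to drop it."""
    # Primary key: the packet's destination, i.e. our dynamic port and IP.
    conn = table.get((pkt.dst_ip, pkt.dst_port))
    if conn is None:
        return None
    # Double-check the server-side address and port against the packet.
    if (conn.remote_ip, conn.remote_port) != (pkt.src_ip, pkt.src_port):
        return None
    return conn
```

With one connection per dynamic port, any packet whose source fields
don't match the single recorded peer is simply dropped.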

The more connections pertaining to a given dynamic port number, the
greater the likelihood that a badly-formed packet identifying one
connection will be misidentified as another, or that a badly-formed
packet identifying the wrong dynamic port number will be interpreted
as part of another TCP stream.  (Though there are sequence numbers in
TCP packets to ensure the proper ordering for packets arriving out of
sequence, I don't know enough about TCP/IP to be certain they'd be
sufficient to easily drop all such malformed packets, especially in
the context of spamware that is trying to not be easily detectable as
such by, e.g., using a distinctive pattern of sequence numbers, yet
that is intended to open up a bazillion fairly identical connections, in
terms of outgoing and, to some extent, incoming packet profile.)
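
To make the misidentification concern concrete, here is a toy
comparison, not a model of any real stack.  It assumes packets are
matched on addresses and ports alone, ignoring checksums and sequence
numbers.  All names and addresses are made up for illustration.  The
same corrupted packet (really from server A, but with its source
address damaged so that it reads as server B) is dropped when the
dynamic port has a single peer, and accepted for the wrong connection
when the port is multiplexed.

```python
CLIENT = ("192.0.2.10", 40000)     # our IP and dynamic port
SERVER_A = ("198.51.100.1", 25)
SERVER_B = ("198.51.100.2", 25)


def lookup_unique(table, src, dst):
    """One connection per dynamic port: verify the single expected peer."""
    expected_peer, conn = table.get(dst, (None, None))
    return conn if expected_peer == src else None


def lookup_multiplexed(table, src, dst):
    """Many connections per dynamic port: the peer is part of the key."""
    return table.get(dst, {}).get(src)


# Port 40000 talks only to server A.
unique_table = {CLIENT: (SERVER_A, "conn-to-A")}
# Port 40000 is shared by connections to both servers.
multiplexed_table = {CLIENT: {SERVER_A: "conn-to-A", SERVER_B: "conn-to-B"}}

# A packet sent by A whose source address was corrupted to look like B.
corrupted_src = SERVER_B

if __name__ == "__main__":
    print(lookup_unique(unique_table, corrupted_src, CLIENT))
    print(lookup_multiplexed(multiplexed_table, corrupted_src, CLIENT))
```

In the first case the lookup returns None and the packet is dropped;
in the second it returns the connection to B.  The more peers share a
port, the more damaged source values land on a valid entry.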

Two, a system that multiplexes dynamic port numbers has a small amount
of additional overhead to process each incoming packet to such a port.

Given these two disadvantages, should any system (say, spamware)
employing multiplexing become deployed widely enough to challenge the
utility of tarpitting, making it worth preparing new countermeasures,
I can think of one fairly simple countermeasure (not as easy to
implement as tarpitting, but not all that hard) that just might work.

-- 
James Craig Burley
Software Craftsperson
<http://www.jcb-sc.com>
