David S. Miller wrote:
> From: Mark Butler <[EMAIL PROTECTED]>
> Date: Fri, 24 Mar 2006 22:37:26 -0700
>
>> On a more general note, I find the idea that a current dst entry doesn't
>> actually reflect the interface (even a logical interface) and nexthop
>> that will be used to deliver a packet a little disturbing. It would
>> seem to me that any filter that is going to re-route a packet to a
>> different address or a different interface should be a logical device
>> (with its own IP address) or logical interface, respectively.
>> Otherwise what is going on is completely invisible to the transport
>> protocol, as well as users of tools like traceroute.
>
> Welcome to firewalls and NAT.
A true firewall should never need to do anything but drop packets and
reset connections. Changes to the way packets are routed should be done
at the routing layer, using the flow information from the transport
layer. Simple firewall rules should be implemented the same way. By
the time a dst entry is returned, the need for NF output chain
processing should be minimal to non-existent.
Serialized processing of every IP packet, whether it needs it or not, is
ridiculously inefficient. No high capacity router would operate that
way. A route decision for a flow would be made once, and data in most
flows would use a fast (generally hardware) path without further
consideration.
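
To make the "decide once per flow" point concrete, here is a minimal userspace sketch. It is not kernel code: flow_key, flow_cache, slow_path_route() and route_packet() are hypothetical names invented for illustration, and the routing policy in the slow path is a stand-in. The first packet of a flow pays for the full decision; later packets with the same key take the cached result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the fields a per-flow decision would depend on. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct flow_entry {
    struct flow_key key;
    int out_ifindex;            /* cached result of the slow path */
    int valid;
};

#define FLOW_CACHE_SIZE 256     /* power of two so the hash can be masked */

static struct flow_entry flow_cache[FLOW_CACHE_SIZE];
static unsigned long slow_path_calls;   /* counts full decisions made */

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
    /* Compare field by field; memcmp would also compare padding bytes. */
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

static unsigned int flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr ^ (k->daddr * 2654435761u);

    h ^= ((uint32_t)k->sport << 16) | k->dport;
    h ^= k->proto;
    h ^= h >> 16;
    return h & (FLOW_CACHE_SIZE - 1);
}

/* Stand-in for the full routing and filtering decision (made-up policy). */
static int slow_path_route(const struct flow_key *k)
{
    slow_path_calls++;
    return (k->daddr >> 24) == 10 ? 1 : 2;
}

/* Returns the output interface index for a packet belonging to flow k. */
int route_packet(const struct flow_key *k)
{
    struct flow_entry *e;

    assert(k != NULL);
    e = &flow_cache[flow_hash(k)];
    if (e->valid && flow_key_equal(&e->key, k))
        return e->out_ifindex;          /* fast path: decision reused */

    /* Miss or collision: run the slow path once and overwrite the slot. */
    e->key = *k;
    e->out_ifindex = slow_path_route(k);
    e->valid = 1;
    return e->out_ifindex;
}
```

Calling route_packet() twice with the same key runs slow_path_route() only once. A real cache would also need invalidation when routes or rules change, which this sketch leaves out.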
Of course NAT processing only needs to be done on the NF forward chain,
not the input or output chains. No need to affect local transport
protocols at all. The need for any kind of NF processing should be
reflected in the routing tables, and echoed in the dst entry (or dst
entry stack).
There has been discussion of Van Jacobson style optimization of the
input chain. Well, the quickest way to optimize the output chain would be
to return filtered routing information to the transport layer so that a
transport protocol could run its own output processing. For example,
why should IPSEC encryption be delayed to the moment of transmission?
Why should a re-transmitted packet be re-encrypted? Performance would
be improved significantly if a transport protocol could arrange for
IPSEC transformations to be done in advance, so that when a congestion
window opening ACK arrived, data could be transmitted without further
delay. Same deal for retransmissions. IPSEC encryption would then
generally occur in the process context of the sender, rather than
softirq context at the last possible moment.
Same thing for Neighbor discovery delays and IP fragmentation. Instead
of holding a packet somewhere in the IP layer waiting for an ARP reply,
the transport driver should just get an appropriate notification. Then
it could (for example) bundle additional data into the same packet in
the meantime.
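
A rough userspace sketch of the notification idea, under stated assumptions: pending_tx, tx_queue() and tx_neigh_resolved() are hypothetical names, and PENDING_MAX is an arbitrary payload budget rather than a real MTU calculation. While the neighbour is unresolved the transport keeps coalescing data into one pending packet, and the notification flushes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PENDING_MAX 1400    /* assumed payload budget, illustration only */

struct pending_tx {
    unsigned char buf[PENDING_MAX];
    size_t len;                 /* bytes coalesced so far */
    int resolved;               /* set once the neighbour entry is usable */
    unsigned int packets_sent;
};

/*
 * Queue data from the application. While resolution is outstanding, the
 * data is appended to the single pending packet. Returns bytes accepted.
 */
size_t tx_queue(struct pending_tx *tx, const void *data, size_t n)
{
    size_t room;

    assert(tx != NULL && data != NULL);
    room = PENDING_MAX - tx->len;
    if (n > room)
        n = room;
    memcpy(tx->buf + tx->len, data, n);
    tx->len += n;
    return n;
}

/*
 * Hypothetical notification from the IP layer that the link-layer address
 * is now known. Flushes the pending data as one packet and returns its
 * length (0 if nothing was queued).
 */
size_t tx_neigh_resolved(struct pending_tx *tx)
{
    size_t sent;

    assert(tx != NULL);
    sent = tx->len;
    tx->resolved = 1;
    if (sent)
        tx->packets_sent++;
    tx->len = 0;
    return sent;
}
```

Two small writes queued before the notification go out as a single 11-byte packet instead of two, which is the bundling effect described above.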
Transports could easily hold IP fragments for further processing as
well. Some of them (notably DCCP) can profitably make use of IP
datagrams with missing segments. Other transports could use the
information to make better determinations about congestion and packet
loss. In any case IP segmentation and reassembly at the transport layer
would be more efficient and would be a straightforward extension of
what is already present for anything more sophisticated than UDP.
> You don't know anything until the packet is examined by the filter,
> because it's impossible to know what rule would be matched until the
> packet is actually built, since the rule matching is on packet
> contents (such as the source and destination IP addresses, and source
> and destination ports, but more obscure matching is also possible, like
> matching by TOS or other IP header flags).
The flowi structure already contains all that information for routing
purposes. No reason why it could not be used to do early netfilter
reduction as well. Right?
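
As an illustration only, a userspace sketch of what evaluating rules against flow information could look like. It assumes every rule matches purely on fields available in the flow key (addresses, ports, protocol, TOS); rules that inspect payload or connection state could not be reduced this way. The struct layouts and the name flow_verdict() are hypothetical and are not the kernel's flowi or netfilter structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key, known before any packet is built. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto, tos;
};

enum verdict { V_ACCEPT, V_DROP };

/* Hypothetical rule that matches only on flow-level fields. */
struct rule {
    uint32_t daddr, dmask;      /* match if (flow.daddr & dmask) == daddr */
    uint16_t dport;             /* 0 means any port */
    uint8_t  proto;             /* 0 means any protocol */
    uint8_t  tos, tos_mask;     /* match if (flow.tos & tos_mask) == tos */
    enum verdict verdict;
};

static int rule_matches(const struct rule *r, const struct flow_key *f)
{
    if ((f->daddr & r->dmask) != r->daddr)
        return 0;
    if (r->proto && r->proto != f->proto)
        return 0;
    if (r->dport && r->dport != f->dport)
        return 0;
    if ((f->tos & r->tos_mask) != r->tos)
        return 0;
    return 1;
}

/*
 * Walk the chain once for the flow; first match wins, default is accept.
 * The result could then be stored alongside the cached route so that
 * per-packet processing does not repeat the walk.
 */
enum verdict flow_verdict(const struct rule *rules, size_t n,
                          const struct flow_key *f)
{
    size_t i;

    assert(f != NULL);
    assert(rules != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (rule_matches(&rules[i], f))
            return rules[i].verdict;
    return V_ACCEPT;
}
```

With a chain that drops TCP port 23 to 10.0.0.0/8 and accepts everything else to that prefix, a telnet flow to 10.0.0.5 gets V_DROP and a web flow to the same host gets V_ACCEPT, both decided from the flow key alone.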
- Mark B.