From: Raivis Bucis <[EMAIL PROTECTED]> Date: Thu, 4 Jan 2007 17:47:46 +0200
> I believe I have found a bug in PF_PACKET socket filtering > (introduced in linux-2.6.19). If BPF returns values larger than > 0x80000000u, run_filter in af_packet.c considers that as error > instead of simply accepting packet in its full length. sk_filter > does not have this problem. Indeed, this bug was introduced by this change: commit fda9ef5d679b07c9d9097aaf6ef7f069d794a8f9 Author: Dmitry Mishin <[EMAIL PROTECTED]> Date: Thu Aug 31 15:28:39 2006 -0700 [NET]: Fix sk->sk_filter field access Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg needlock = 0, while socket is not locked at that moment. In order to avoid this and similar issues in the future, use rcu for sk->sk_filter field read protection. Signed-off-by: Dmitry Mishin <[EMAIL PROTECTED]> Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]> Signed-off-by: Kirill Korotaev <[EMAIL PROTECTED]> The expected semantics are that sk_run_filter() returns either "0" which means drop the packet, or a positive 32-bit integer which states how many bytes of the packet to retain. If the returned length is larger than the packet, the packet is left at it's full length, unchanged, and accepted. That last case is what was broken by the logic modifications done by Dmitry Mishin's RCU'ification of socket filters. He tries to combine 32-bit signed error code handling with interpretation of the 32-bit unsigned return value from sk_run_filter(). So this whole idea to make run_filter() return signed integers and fail on negative is entirely flawed, it simply cannot work and retain the expected semantics which have been there forever. Drop on zero works, and was what the code did originally, and there was no reason at all to modify the logic here, it was perfect. The callers to run_filter() told it what the expected return value should be if there was no filter hooked up to the socket, and that is the current length of the packet. It simulates the case of there actually being a filter and it saying that the whole packet should be accepted. This is yet another case of someone doing some logic cleanups to make the logic nicer to them, and adding a bug in the process. Here is how I am likely to fix this: diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index da73e8a..594c078 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -428,24 +428,18 @@ out_unlock: } #endif -static inline int run_filter(struct sk_buff *skb, struct sock *sk, - unsigned *snaplen) +static inline unsigned int run_filter(struct sk_buff *skb, struct sock *sk, + unsigned int res) { struct sk_filter *filter; - int err = 0; rcu_read_lock_bh(); filter = rcu_dereference(sk->sk_filter); - if (filter != NULL) { - err = sk_run_filter(skb, filter->insns, filter->len); - if (!err) - err = -EPERM; - else if (*snaplen > err) - *snaplen = err; - } + if (filter != NULL) + res = sk_run_filter(skb, filter->insns, filter->len); rcu_read_unlock_bh(); - return err; + return res; } /* @@ -467,7 +461,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, struct packet struct packet_sock *po; u8 * skb_head = skb->data; int skb_len = skb->len; - unsigned snaplen; + unsigned int snaplen, res; if (skb->pkt_type == PACKET_LOOPBACK) goto drop; @@ -495,8 +489,11 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, struct packet snaplen = skb->len; - if (run_filter(skb, sk, &snaplen) < 0) + res = run_filter(skb, sk, snaplen); + if (!res) goto drop_n_restore; + if (snaplen > res) + snaplen = res; if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >= (unsigned)sk->sk_rcvbuf) @@ -568,7 +565,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe struct tpacket_hdr *h; u8 * skb_head = skb->data; int skb_len = skb->len; - unsigned snaplen; + unsigned int snaplen, res; unsigned long status = TP_STATUS_LOSING|TP_STATUS_USER; unsigned short macoff, netoff; struct sk_buff *copy_skb = NULL; @@ -592,8 +589,11 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe snaplen = skb->len; - if (run_filter(skb, sk, &snaplen) < 0) + res = run_filter(skb, sk, snaplen); + if (!res) goto drop_n_restore; + if (snaplen > res) + snaplen = res; if (sk->sk_type == SOCK_DGRAM) { macoff = netoff = TPACKET_ALIGN(TPACKET_HDRLEN) + 16; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html