On Mon, Jun 29, 2020 at 4:34 PM Tom Herbert <t...@herbertland.com> wrote: > > On Mon, Jun 29, 2020 at 12:23 PM Willem de Bruijn > <willemdebruijn.ker...@gmail.com> wrote: > > > > From: Willem de Bruijn <will...@google.com> > > > > ICMP messages may include an extension structure after the original > > datagram. RFC 4884 standardized this behavior. > > > > It introduces an explicit original datagram length field in the ICMP > > header to delineate the original datagram from the extension struct. > > > > Return this field when reading an ICMP error from the error queue. > > > > ICMPv6 by default already returns the entire 32-bit part of the header > > that includes this field by default. For consistency, do the exact > > same for ICMP. So far it only returns mtu on ICMP_FRAG_NEEDED and gw > > on ICMP_PARAMETERPROB. > > > > For backwards compatibility, make this optional, set by setsockopt > > SOL_IP/IP_RECVERR_RFC4884. For API documentation and feature test, see > > https://github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp.c > > > > Alternative implementation to reading from the skb in ip_icmp_error > > is to pass the field from icmp_unreach, again analogous to ICMPv6. But > > this would require changes to every $proto_err() callback, which for > > ICMP_FRAG_NEEDED pass the u32 info arg to a pmtu update function. > > > > Signed-off-by: Willem de Bruijn <will...@google.com> > > --- > > include/net/inet_sock.h | 1 + > > include/uapi/linux/in.h | 1 + > > net/ipv4/ip_sockglue.c | 12 ++++++++++++ > > 3 files changed, 14 insertions(+) > > > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h > > index a7ce00af6c44..a3702d1d4875 100644 > > --- a/include/net/inet_sock.h > > +++ b/include/net/inet_sock.h > > @@ -225,6 +225,7 @@ struct inet_sock { > > mc_all:1, > > nodefrag:1; > > __u8 bind_address_no_port:1, > > + recverr_rfc4884:1, > > defer_connect:1; /* Indicates that > > fastopen_connect is set > > * and cookie exists so we > > defer connect > > * until first data frame > > is written > > diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h > > index 8533bf07450f..3d0d8231dc19 100644 > > --- a/include/uapi/linux/in.h > > +++ b/include/uapi/linux/in.h > > @@ -123,6 +123,7 @@ struct in_addr { > > #define IP_CHECKSUM 23 > > #define IP_BIND_ADDRESS_NO_PORT 24 > > #define IP_RECVFRAGSIZE 25 > > +#define IP_RECVERR_RFC4884 26 > > > > /* IP_MTU_DISCOVER values */ > > #define IP_PMTUDISC_DONT 0 /* Never send DF frames */ > > diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c > > index 84ec3703c909..525140e3947c 100644 > > --- a/net/ipv4/ip_sockglue.c > > +++ b/net/ipv4/ip_sockglue.c > > @@ -398,6 +398,9 @@ void ip_icmp_error(struct sock *sk, struct sk_buff > > *skb, int err, > > if (!skb) > > return; > > > > + if (inet_sk(sk)->recverr_rfc4884) > > + info = ntohl(icmp_hdr(skb)->un.gateway); > > + > Willem, > > Doesn't this assume that all received ICMP errors received on the > socket use the extended RFC4884 format once the option is set on the > socket? > > I think what we might need to do this properly is to switch on the > ICMP Type/Code to determine if the format is RFC4884. If it is then, > we can figure out where the appropriate info for the ICMP error is. > Good example of this is draft-ietf-6man-icmp-limits-08 currently on > the RFC editor queue. A code for destination unreachable is added for > "aggregate header limit exceeded". An RFC4884 format is used to > contain an ICMP extension that includes a pointer to the offending > byte (unlike parameter problem, destination unreachable doesn't have > Pointer in ICMP header). So in this case it makes sense that the > kernel returns the Pointer in extended ICMP as the info.
This implementation matches the existing behavior of ICMPv6. You're right that the RFC 4884 length field is relevant only for a selection of ICMP types. For other types most of the bits are reserved. I don't see any issue with exposing those. Anything more fine grained will have to be extended for each additional feature such as the one you mention. I prefer to just make this information available once and for all. It really helped that it was already available on v6, for instance. > Tom > > > > serr = SKB_EXT_ERR(skb); > > serr->ee.ee_errno = err; > > serr->ee.ee_origin = SO_EE_ORIGIN_ICMP; > > @@ -755,6 +758,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, > > case IP_RECVORIGDSTADDR: > > case IP_CHECKSUM: > > case IP_RECVFRAGSIZE: > > + case IP_RECVERR_RFC4884: > > if (optlen >= sizeof(int)) { > > if (get_user(val, (int __user *) optval)) > > return -EFAULT; > > @@ -914,6 +918,11 @@ static int do_ip_setsockopt(struct sock *sk, int level, > > if (!val) > > skb_queue_purge(&sk->sk_error_queue); > > break; > > + case IP_RECVERR_RFC4884: > > + if (val != 0 && val != 1) > > + goto e_inval; > > + inet->recverr_rfc4884 = val; > > + break; > > case IP_MULTICAST_TTL: > > if (sk->sk_type == SOCK_STREAM) > > goto e_inval; > > @@ -1588,6 +1597,9 @@ static int do_ip_getsockopt(struct sock *sk, int > > level, int optname, > > case IP_RECVERR: > > val = inet->recverr; > > break; > > + case IP_RECVERR_RFC4884: > > + val = inet->recverr_rfc4884; > > + break; > > case IP_MULTICAST_TTL: > > val = inet->mc_ttl; > > break; > > -- > > 2.27.0.212.ge8ba1cc988-goog > >