(Cc Thomas and David) On Tue, Dec 06, 2016 at 03:47:20PM +0800, Brad Campbell wrote: > On 06/12/16 01:53, Guillaume Nault wrote: > > > > > Probably not a mistake on your side. I've started looking at netcf' > > source code, but haven't found anything that could explain your issue. > > It'd really help if you could provide steps to reproduce the bug. > > Further to my message this morning, I started with a clean linux.git > 4.9.0-rc7-00198-g0cb65c8 and did two runs. One untouched and one with the > identified patch reverted. I logged both of these with NLCB=debug, then > split out the ppp section and diffed them. > > It appears the only difference of note is the new ATTR 18. I did a diff of > the entire dump for both and nothing else popped out. > Thanks for the detailed report. Things are getting clear now.
> > brad@test:~$ diff -u ppp-ok ppp-fail > --- ppp-ok 2016-12-06 13:32:04.358393578 +0800 > +++ ppp-fail 2016-12-06 13:32:18.577864406 +0800 > @@ -1,10 +1,10 @@ > -------------------------- BEGIN NETLINK MESSAGE > --------------------------- > [HEADER] 16 octets > - .nlmsg_len = 628 > + .nlmsg_len = 644 > .nlmsg_type = 16 <route/link::new> > .nlmsg_flags = 2 <MULTI> > - .nlmsg_seq = 1481001940 > - .nlmsg_pid = 7462 > + .nlmsg_seq = 1481002252 > + .nlmsg_pid = 7376 > [PAYLOAD] 16 octets > 00 00 00 02 0a 00 00 00 d1 10 01 00 00 00 00 00 ................ > [ATTR 03] 5 octets > @@ -71,6 +71,8 @@ > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > .................. > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > .................. > 00 00 00 00 00 00 ...... > + [ATTR 18] 12 octets > + 08 00 01 00 70 70 70 00 04 00 02 00 ....ppp..... > [ATTR 26] 132 octets > 84 00 02 00 80 00 01 00 01 00 00 00 00 00 00 00 00 00 > .................. > 00 00 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00 > .................. > @@ -81,3 +83,4 @@ > 00 00 00 00 10 27 00 00 e8 03 00 00 00 00 00 00 00 00 > .....'............ > 00 00 00 00 00 00 ...... > --------------------------- END NETLINK MESSAGE > --------------------------- > 'ATTR 18' is the IFLA_LINKINFO attribute. It contains two sub-attributes: * IFLA_INFO_KIND ('08 00 01 00 70 70 70 00'), containing the "ppp" string, * and IFLA_INFO_DATA ('04 00 02 00') which has no payload because, currently, ppp has no device specific data to return. > Running with NLDBG=4 seems to generate this : > DBG<2>: While picking up for 0x26d2e00 <route/link>, recvmsgs() returned > -34: (errno = Numerical result out of range)DBG<1>: Clearing cache > 0x26d2e00 <route/link>... > libnl1 rejects the IFLA_INFO_DATA attribute because it expects it to contain a sub-attribute. Since the payload size is zero it doesn't match the policy and parsing fails. There's no problem with libnl3 because its policy accepts empty payloads for NLA_NESTED attributes (see libnl3 commit 4be02ace4826 "Be liberal when receiving an empty nested attribute"). I think empty nested attributes make perfect sense. At least we accept them from user space since commit ea5693ccc553 ("netlink: allow empty nested attributes"), so it should be fine to generate some from the kernel. OTOH, since some user space programs broke because of this, it might be better to always add attributes in the .fill_info() callbacks, to work around libnl1's policy. David, Thomas, do you have any opinion on this?