Tue, Aug 06, 2019 at 07:34:59PM CEST, dsah...@gmail.com wrote: >On 8/5/19 9:20 AM, Jiri Pirko wrote: >> Mon, Aug 05, 2019 at 04:51:22PM CEST, dsah...@gmail.com wrote: >>> On 8/5/19 8:49 AM, Jiri Pirko wrote: >>>>> Your commit 5fc494225c1eb81309cc4c91f183cd30e4edb674 changed that from a >>>>> per-namepace accounting to all namespaces managed by a single devlink >>>>> instance in init_net - which is completely wrong. >>>> No. Not "all namespaces". Only the one where the devlink is. And that is >>>> always init_net, until this patchset. >>>> >>>> >>> >>> Jiri: your change to fib.c does not take into account namespace when >>> doing rules and routes accounting. you broke it. fix it. >> >> What do you mean by "account namespace"? It's a device resource, why to >> tight it with namespace? What if you have 2 netdevsim-devlink instances >> in one namespace? Why the setting should be per-namespace? >> > >Jiri: > >Here's an example of how your 5.2 change to netdevsim broke the resource >controller: > >Create a netdevsim device: >$ modprobe netdevsim >$ echo "0 1" > /sys/bus/netdevsim/new_device > >Get the current number of IPv4 routes: >$ n=$(ip -4 ro ls table all | wc -l) >$ echo $n >13 > >Prevent any more from being added. This limit should apply solely to >this namespace, init_net: > >$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size $n >$ devlink dev reload netdevsim/netdevsim0 >Error: netdevsim: New size is less than current occupancy. >devlink answers: Invalid argument > >So that is the first breakage: accounting is off - maybe. Note there are >no other visible namespaces, but who knows what systemd or other >processes are doing with namespaces. Perhaps this accounting is another >example of your changes not properly handling namespaces: > >$ devlink resource show netdevsim/netdevsim0 >netdevsim/netdevsim0: > name IPv4 size unlimited unit entry size_min 0 size_max unlimited >size_gran 1 dpipe_tables none > resources: > name fib size 13 occ 17 unit entry size_min 0 size_max unlimited >size_gran 1 dpipe_tables none > name fib-rules size unlimited occ 6 unit entry size_min 0 size_max >unlimited size_gran 1 dpipe_tables none > name IPv6 size unlimited unit entry size_min 0 size_max unlimited >size_gran 1 dpipe_tables none > resources: > name fib size unlimited occ 10 unit entry size_min 0 size_max >unlimited size_gran 1 dpipe_tables none > name fib-rules size unlimited occ 4 unit entry size_min 0 size_max >unlimited size_gran 1 dpipe_tables none > >So the occupancy does not match the tables for init_net. > >Reset the max to 17, the current occupancy: >$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 17 >$ devlink dev reload netdevsim/netdevsim0 >$ devlink resource show netdevsim/netdevsim0 >netdevsim/netdevsim0: > name IPv4 size unlimited unit entry size_min 0 size_max unlimited >size_gran 1 dpipe_tables none > resources: > name fib size 17 occ 17 unit entry size_min 0 size_max unlimited >size_gran 1 dpipe_tables none > name fib-rules size unlimited occ 6 unit entry size_min 0 size_max >unlimited size_gran 1 dpipe_tables none > name IPv6 size unlimited unit entry size_min 0 size_max unlimited >size_gran 1 dpipe_tables none > resources: > name fib size unlimited occ 10 unit entry size_min 0 size_max >unlimited size_gran 1 dpipe_tables none > name fib-rules size unlimited occ 4 unit entry size_min 0 size_max >unlimited size_gran 1 dpipe_tables none > >Create a new namespace, bring up lo which attempts to add more route >entries: >$ unshare -n >$ ip li set lo up > >If you list routes you see the lo routes failed to installed because of >the limits, but it is a silent failure. Try to add a new route and you >see the cross namespace accounting now: >$ ip ro add 192.168.1.0/24 dev lo >Error: netdevsim: Exceeded number of supported fib entries. > > >Contrast that behavior with 5.1 and you see the new namespaces have no >bearing on accounting in init_net and limits in init_net do not affect >other namespaces. > >That behavior needs to be restored in 5.2 and 5.3.
Let's figure out the devlink-controlling-kernel-resources thread first. What you describe here is exactly that.