On Fri, Oct 16, 2020 at 00:27, Vladimir Oltean <vladimir.olt...@nxp.com> wrote: > Currently any DSA switch that implements the multicast ops (properly, > that is) gets these errors after just sitting for a while, with at least > 2 ports bridged: > > [ 286.013814] mscc_felix 0000:00:00.5 swp3: failed (err=-2) to del object > (id=3) > > The reason has to do with this piece of code: > > netdev_for_each_lower_dev(dev, lower_dev, iter) > br_mdb_switchdev_host_port(dev, lower_dev, mp, type); > > called from: > > br_multicast_group_expired > -> br_multicast_host_leave > -> br_mdb_notify > -> br_mdb_switchdev_host > > Basically, that code is correct. It tells each switchdev port that the > host can leave that multicast group. But in the case of DSA, all user > ports are connected to the host through the same pipe. So, because DSA > translates a host MDB to a normal MDB on the CPU port, this means that > when all user ports leave a multicast group, DSA tries to remove it N > times from the CPU port. > > We should be reference-counting these addresses.
That sounds like a good idea. We have run into another issue with the MDB that maybe could be worked into this changeset. This is what we have observed on 4.19, but from looking at the source it does not look like anything has changed with respect to this issue. The DSA driver handles the addition/removal of router ports by enabling/disabling multicast flooding to the port in question. On mv88e6xxx at least, this is only part of the solution. It only takes care of the unregistered multicast. You also have to iterate through all _registered_ groups and add the port to the destination vector. If you do that the naïve way, you run in to a secondary problem. Without any caching middle layer, you now have a single bit in the vector that can be set either because the port is a router port, or because someone has joined the group, _OR_ both (if the querier moves due to a topology change for example). So you need a cache of the joined ports for each group, and the router port vector. That way you can make sure to only clear the bit if neither the cached group entry nor the router port vector has the bit set. If you think it could be useful I could try to rebase our internal solution on net-next and post it as an RFC. Maybe there are at least parts that can be used from it.