On Sat, Jun 02, 2018 at 09:10:08AM -0700, Florian Fainelli wrote: > On June 2, 2018 3:34:32 AM MST, Ilias Apalodimas > <ilias.apalodi...@linaro.org> wrote: > >Hi Florian, > > > >Thanks for taking time to look into this > > > >On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote: > >> > >> > >> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote: > >> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote: > >> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote: > >> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote: > >> >>>> Device tree is supposed to describe the hardware. Using that > >hardware > >> >>>> in different ways is not something you should describe in DT. > >> >>>> > >> >>> The new switchdev mode is applied with a .config option in the > >kernel. What you > >> >>> see is pre-existing code, so i am not sure if i should change it > >in this > >> >>> patchset. > >> >> > >> >> If you break the code up into a library and two drivers, it > >becomes a > >> >> moot point. > >> > Agree > >> > > >> >> > >> >> But what i don't like here is that the device tree says to do dual > >> >> mac. But you ignore that and do sometime else. I would prefer that > >if > >> >> DT says dual mac, and switchdev is compiled in, the probe fails > >with > >> >> EINVAL. Rather than ignore something, make it clear it is invalid. > >> > The switch has 3 modes of operation as is. > >> > 1. switch mode, to enable that you don't need to add anything on > >> > the DTS and linux registers a single netdev interface. > >> > 2. dual mac mode, this is when you need to add dual_emac; on the > >DTS. > >> > 3. switchdev mode which is controlled by a .config option, since as > >you > >> > pointed out DTS was not made for controlling config options. > >> > > >> > I agree that this is far from beautiful. If the driver remains as > >in though, > >> > i'd prefer either keeping what's there or making "switchdev" a DTS > >option, > >> > following the pre-existing erroneous usage rather than making the > >device > >> > unusable. If we end up returning some error and refuse to > >initialize, users > >> > that remote upgrade their equipment, without taking a good look at > >changelog, > >> > will loose access to their devices with no means of remotely fixing > >that. > >> > >> It seems to me that the mistake here is seeing multiple modes of > >> operations for the cpsw. There are not actually many, there is one > >> usage, and then there is what you can and cannot offload. > >CPSW has in fact 2 modes of operation, different FIFO usage/lookup > >entry(it's > >called ALE in the current driver) by-pass(which is used in dual emac > >for > >example) and other features. Again Grygorii is better suited to answer > >the > >exact differences. > >> The basic > >> premise with switchdev and DSA (which uses switchdev) is that each > >> user-facing port of your switch needs to work as if it were a normal > >> Ethernet NIC, that is what you call dual-MAC I believe. Then, when > >you > >> create a bridge and you enslave those ports into the bridge, you need > >to > >> have forwarding done in hardware between these two ports when the > >> SMAC/DMAC are not for the host/CPU/management interface and you must > >> simultaneously still have the host have the ability to send/receive > >> traffic through the bridge device. > >Yes dual emac does that. But dual emac configures the port facing VLAN > >to the > >CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is > >configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU > >port > >That's exactly what the current RFC does as well, with the addition of > >registering a sw0p0 (i'll explain why later on this mail) > >A little more detail on the issue we are having. On my description > >sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the > >ports > >that have PHYs attached. > > > >When we start in the new switchdev mode all interfaces are added to > >VLAN 0 > >so CPU port + port1 + port2 are all in the same VLAN group. In that > >case sw0p1 > >and sw0p2 are working as you describe. So those 2 interfaces can > >send/receive > >traffic normally which matches the switchdev case. > > > >When we add them on a bridge(let's say br0), VLAN1(or any default > >bridge VLAN) > >is now configured on sw0p1 and sw0p2 but *not* on the CPU port. > >From this point on the whole fuunctionality just collapses. The switch > >will > >work and offload traffic between sw0p1/sw0p2 but the bridge won't be > >able to > >get an ip address (since VLAN1 is not a member of the CPU port and the > >packet > >gets dropped). > >IGMPv2/V3 messages will never reach the br_multicast.c code to trigger > >switchdev and configure the MDBs on the ports. i am prety sure there > >are other > >fail scenarios which i haven't discovered already, but those 2 are the > >most > >basic ones. If we add VLAN1 on the CPU port, everything works as > >intended of > >course. > > > >That's the reason we registered sw0p0 as the CPU port. It can't do any > >"real" > >traffic, but you can configure the CPU port independantly and not be > >forced to > >do an OR on every VLAN add/delete grouping the CPU port with your port > >command. > >The TL;DR version of this is that the switch is working exactly as > >switchdev is > >expecting offloading traffic to the hardware when possible as long as > >the CPU > >port is member of the proper VLANs > > > >Petr's patch solves this for us > >(9c86ce2c1ae337fc10568a12aea812ed03de8319). > >We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and > >decide > >when to add the CPU port or not. > > > >There are still a couple of cases that are not covered though, if we > >don't > >register the CPU port. We cant decide when to forward multicast > >traffic on the CPU port if a join hasn't been sent from br0. > >So let's say you got 2 hosts doing multicast and for whatever reason > >the host > >wants to see that traffic. > >With the CPU port present you can do a > >"bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will > >offload > >the traffic to the CPU port and thus the host. If this goes away we are > >forced > >to send a join. > > Thanks for the detailed explanation. Somehow I was under the impression that > cpsw had the ability, through specific DMA descriptor bits to direct traffic > towards one external port or another and conversely, have that information > from the HW when receiving packets. That's one mode of operation when by-passing the ALE if my understanding of the hardware is correct. You can choose not to do that. I am still earning all the details of the ahrdware myself. On Rx though you still need the CPU to participate to receive the packet(and yes the descriptor indicates the port)
> What you describe is exactly the same problem we have in DSA when the switch > advertises DSA_TAG_PROTO_NONE where only VLAN tags could help differentiate > traffic from external ports. At some point there was a discuss of making > DSA_TAG_PROTO_NONE automatically create one VLAN per port but this is a good > source for other problems... I am pretty sure i am not the first one to encounter this kind of problems > > Looking forward to your follow-up patch series! Will do! > > -- > Florian Thanks Ilias