On Tue, Nov 14, 2017 at 8:02 PM, Jakub Kicinski <jakub.kicin...@netronome.com> wrote:
> On Tue, 14 Nov 2017 19:04:36 -0800, Alexander Duyck wrote:
>> On Tue, Nov 14, 2017 at 3:36 PM, Jakub Kicinski
>> <jakub.kicin...@netronome.com> wrote:
>> > On Tue, 14 Nov 2017 15:05:08 -0800, Alexander Duyck wrote:
>> >> >> We basically need to do some feasibility research to see if we can
>> >> >> actually meet all the requirements for switchdev on i40e. We have been
>> >> >> getting mixed messages where we are given a great many "yes, but" type
>> >> >> answers. For i40e we are looking into it but I don't have high
>> >> >> confidence in our ability to actually support it in hardware/firmware.
>> >> >> If it were as easy as you have been led to believe, we would have done
>> >> >> it months ago when we were researching the requirements to support
>> >> >> switchdev
>> >> >
>> >> > wait, Sridhar made seven rounds of his submission (this is the v7
>> >> > pointer [1]) and you
>> >> > still don't know if what you were attempting to push upstream can
>> >> > work, something is
>> >> > weird here, can you clarify? Jeff?
>> >>
>> >> Not weird so much as stubborn. The patches were being pushed based on
>> >> the assumption that the community would accept a NIC generating port
>> >> representors that didn't necessarily pass traffic, and then even when
>> >> we had them passing traffic the PF still wasn't configured to handle
>> >> being the default destination for traffic without any rules
>> >> associated, instead VFs would directly send to the outside world.
>> >
>> > Perhaps the way forward is to lift the requirement on passing traffic,
>> > as long as the limitation is clearly expressed to the users.
>>
>> No, I am not arguing for that because then SwitchDev will fall into
>> disarray. If we want to have a strict definition for what is SwitchDev
>> and what isn't I am okay with that.
>> It gives us a definition of what
>> our hardware needs to do in order to support it and without that we
>> are going to get hardware that just bends the rules to claim support
>> for it.
>
> Let me make sure we understand each other. The switchdev SR-IOV mode is
> what happens when user requests DEVLINK_ESWITCH_MODE_SWITCHDEV. Are you
> saying you are opposed to adding DEVLINK_ESWITCH_MODE_VEPA?

I wouldn't say I am opposed to that idea. We just need to clearly define
what MODE_VEPA is. I would say that even in MODE_VEPA we would be
passing traffic. The limitation though is that we wouldn't have the same
mechanisms in place to route the traffic.

The big issue with VEPA is that the traffic is routed to an external
entity before it makes a hairpin turn and comes back. As such we don't
have the actual origin of the packet to work with other than MAC and
VLAN. As far as directing a packet to a specific port the only way we
really have of doing that is to direct it to the MAC/VLAN pair for the
VF.

This is one of the reasons why I am thinking source mode macvlan is the
solution to go with for something like this. Basically the source mode
macvlan can get pretty close to identifying the origin of any packet
that came from the VF assuming it is programmed with all the MAC entries
belonging to the VF. The only case where this doesn't work is the
"trusted" legacy mode VF that is running in promiscuous with anti-spoof
disabled.

>> All I am asking for is for us to not close the door to the possibility
>> of adding features to legacy SR-IOV. I am hoping to use a source
>> macvlan based approach to make it so that we can support "port
>> representors" for devices that can't support full SwitchDev. The idea
>> would be to use them to get as close to SwitchDev level support on
>> legacy devices as possible without using full SwitchDev. That should
>> solve a good part of the issue, but I am pretty certain I need to be
>> able to extend legacy SR-IOV in order to support it. I had talked with
>> Jiri at netdev 2.1 about it back when we had submitted the v7 patches,
>> and the decision was to look at doing "port representors" but don't
>> associate them with SwitchDev. I was out on Sabbatical for most of the
>> summer and I am just now starting on the macvlan work I had planned.
>> I hope to have it done before the next netdev and then we can discuss
>> it there if it needs more discussion than what we can have on the
>> mailing list.
>
> I don't know what you mean with the macvlan based approach. Could you
> perhaps describe it in more detail? Will it allow users to configure
> forwarding and queueing with existing, standard tools and APIs?

So there are a few issues with our devices doing SwitchDev mode that I
am trying to address. One of the issues is that we have no direct way to
figure out where the packets are coming from as I described above. So
instead of us implementing multiple approaches for the same thing my
thought was to look at using source mode macvlan which does filtering on
the source MAC address instead of the destination. It shouldn't take
much to extend it so that a PF could notify a source mode macvlan
interface of all the unicast addresses a VF can use as a source address
for transmitting. With that we would at least be able to tell where the
traffic came from.

Another issue is directing transmit packets to the VF for any specific
interface. My thought for our source mode based "port representor"
macvlan would be to limit the transmits so that we can only transmit
unicast packets that are guaranteed to be delivered to the proper
destination. Basically we would have to tag all broadcast and multicast
packets as being already forwarded and they would have to be dropped on
the "port representor" interfaces. Ideally there would be some sort of
uplink representor that would then be able to handle the
broadcast/multicast packets for the device since we end up replicating
the packets across all ports on the same VLAN currently.

The last issue is that by default all transmits that don't have a
matching filter in hardware are transmitted out the uplink port. That
was part of the issue that we don't think can be solved for ixgbe, and
even with a firmware change I am not certain how well i40e will work for
this.
With macvlan being used as the model we basically skirt the whole issue
since that is kind of the standard behavior for macvlan anyway.

In theory this all should work together to allow forwarding with the
existing tools. It would basically just mean we need to use FDB
programming on the port representor to control what MAC addresses are
handled for each interface. In addition we could probably handle the
ndo_setup_tc call in the port representors with some limited subset of
fields supported by flower to use that to route traffic.

It will be much easier to show all this once I have code. It will
probably take me a month or so to dig out the technical debt that is
currently present for macvlan offload, and to address the fact that i40e
currently doesn't support it. Once I get those two items addressed my
plan is to then start tackling the source mode macvlan based port
representors. I hope to have an RFC ready early next year.

Thanks.

- Alex
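
Illustration only, not code from this thread or from any driver: a minimal Python sketch of the source-mode idea described in the message above. It assumes the PF can supply the list of unicast source MACs each VF may use. The names `SourceModeDemux`, `add_source`, `classify_rx`, and `can_transmit`, and the representor name `vf0_rep`, are hypothetical and exist only for this sketch.

```python
"""Toy model of source-MAC based "port representors" (illustration only)."""


def is_multicast(mac: str) -> bool:
    """True for broadcast/multicast MACs (group bit set in the first octet)."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


class SourceModeDemux:
    """Hypothetical table mapping source MACs to per-VF representors."""

    def __init__(self):
        self._src_to_rep = {}  # source MAC -> representor name

    def add_source(self, rep: str, mac: str) -> None:
        # Models the PF notifying a representor of a unicast address
        # that a VF may use as its source address when transmitting.
        self._src_to_rep[mac.lower()] = rep

    def classify_rx(self, src_mac: str):
        # A known source MAC is attributed to that VF's representor.
        # Anything else returns None, i.e. the origin is unknown and the
        # packet is left to the lower device.
        return self._src_to_rep.get(src_mac.lower())

    @staticmethod
    def can_transmit(dst_mac: str) -> bool:
        # Per-VF representors only send unicast; broadcast and multicast
        # are dropped here and left to an uplink representor.
        return not is_multicast(dst_mac)


demux = SourceModeDemux()
demux.add_source("vf0_rep", "52:54:00:00:00:01")
print(demux.classify_rx("52:54:00:00:00:01"))
print(demux.classify_rx("52:54:00:00:00:99"))
print(demux.can_transmit("ff:ff:ff:ff:ff:ff"))
```

The sketch does not cover the case the message calls out as unsolvable by this scheme: a "trusted" VF with anti-spoof disabled can send from addresses that are not in the table, so its packets would be classified as unknown.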