On 07.07.2016 03:27, Daniele Di Proietto wrote:
> Hi Ilya,
>
> apologies for the delay
>
> On 20/06/2016 07:22, "Ilya Maximets" <i.maxim...@samsung.com> wrote:
>
>> On 11.06.2016 02:53, Daniele Di Proietto wrote:
>>> On 02/06/2016 06:55, "Ilya Maximets" <i.maxim...@samsung.com> wrote:
>>>
>>>> Hi, Daniele.
>>>> Thanks for the review.
>>>>
>>>> On 02.06.2016 04:33, Daniele Di Proietto wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> apologies for the delay.
>>>>>
>>>>> I didn't take an extremely detailed look at this series, but I have
>>>>> a few high-level comments.
>>>>>
>>>>> Thanks for adding a command to configure the rxq affinity. Have
>>>>> you thought about using the database instead? I think it will
>>>>> be easier to use because it survives restarts, and one can batch
>>>>> the affinity assignment for multiple ports without explicitly
>>>>> calling pmd-reconfigure. I'm not sure what the best interface
>>>>> would look like. Perhaps a string in Interface:other_config that
>>>>> maps rxqs to core ids?
>>>>>
>>>>> I'd prefer to avoid exporting an explicit command like
>>>>> dpif-netdev/pmd-reconfigure. If we use the database we don't have to,
>>>>> right?
>>>>
>>>> I thought about a solution based on the database. Actually, I can't
>>>> see a big difference between the database and appctl in this case.
>>>> For automatic usage both commands may be scripted, but for manual
>>>> pinning these approaches are equally uncomfortable.
>>>> IMHO, if it is the database, it shouldn't be a per-'Interface'
>>>> string with a mapping, because one mapping influences other ports
>>>> (core isolation). There is also an issue with synchronization with
>>>> 'pmd-cpu-mask' that has to be performed manually anyway.
>>>> The appctl command may be changed to receive a string with all the
>>>> mappings and trigger reconfiguration. In that case there will be no
>>>> need for an explicit 'dpif-netdev/pmd-reconfigure'.
>>>
>>> Do we really need to implement core isolation? I'd prefer an
>>> interface where, if an interface has an affinity, we enforce that (as
>>> far as we can with the current pmd-cpu-mask), and for other
>>> interfaces we keep the current model.
>>> Probably there are some limitations I'm not seeing with this model.
>>
>> Generally, core isolation prevents polling of other ports on a PMD
>> thread. This is useful to keep a constant polling rate on some
>> performance-critical port while other ports are added or deleted.
>> Without isolation we would need to pin every single port to achieve
>> the desired level of performance.
>>
>>> I'd prefer to keep the mapping in the database because it's more in
>>> line with the rest of the OVS configuration. The database survives
>>> crashes, restarts and reboots.
>>
>> Ok. How about something like this:
>>
>> * Per-port database entry for available core ids:
>>
>>   # ovs-vsctl set interface <iface> \
>>                   other_config:pmd-rxq-affinity=<rxq-affinity-list>
>>
>>   where:
>>     <rxq-affinity-list> ::= NULL | <non-empty-list>
>>     <non-empty-list>    ::= <affinity-pair> |
>>                             <affinity-pair> , <non-empty-list>
>>     <affinity-pair>     ::= <queue-id> : <core-id>
>>
>>   Example:
>>
>>   # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>>                     other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>>     Queue #0 pinned to core 3;
>>     Queue #1 pinned to core 7;
>>     Queue #2 not pinned.
>>     Queue #3 pinned to core 8;
>
> Unless someone has better ideas, this looks good.
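
Just to make the accepted format concrete, below is a minimal, self-contained
sketch of how such an <rxq-affinity-list> string could be parsed into a
per-queue core map. It is only an illustration: parse_rxq_affinity(),
MAX_RXQS and CORE_UNPINNED are invented names for this example and are not
taken from the actual patch.

/*
 * Minimal sketch (not the actual patch code) of parsing an
 * <rxq-affinity-list> such as "0:3,1:7,3:8".
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RXQS 64
#define CORE_UNPINNED (-1)

/* Fills core_ids[queue] for every "queue:core" pair in 'list'.
 * Returns 0 on success, -1 on malformed input. */
static int
parse_rxq_affinity(const char *list, int core_ids[MAX_RXQS])
{
    char *copy, *pair, *saveptr = NULL;
    int i;

    for (i = 0; i < MAX_RXQS; i++) {
        core_ids[i] = CORE_UNPINNED;   /* Unpinned by default. */
    }
    if (!list || !list[0]) {
        return 0;                      /* Empty list: nothing pinned. */
    }

    copy = strdup(list);
    for (pair = strtok_r(copy, ",", &saveptr); pair;
         pair = strtok_r(NULL, ",", &saveptr)) {
        int queue, core;

        if (sscanf(pair, "%d:%d", &queue, &core) != 2
            || queue < 0 || queue >= MAX_RXQS || core < 0) {
            free(copy);
            return -1;                 /* Malformed <affinity-pair>. */
        }
        core_ids[queue] = core;
    }
    free(copy);
    return 0;
}

int
main(void)
{
    int core_ids[MAX_RXQS];

    if (!parse_rxq_affinity("0:3,1:7,3:8", core_ids)) {
        for (int q = 0; q < 4; q++) {
            if (core_ids[q] == CORE_UNPINNED) {
                printf("Queue #%d not pinned.\n", q);
            } else {
                printf("Queue #%d pinned to core %d;\n", q, core_ids[q]);
            }
        }
    }
    return 0;
}

For the example string "0:3,1:7,3:8" this prints exactly the pinning summary
shown above, with queue #2 left unpinned.
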
Ok. I'll implement this in the third (last) part of this patch set.

>>
>> * Configurable mask of isolated PMD threads:
>>
>>   # ./bin/ovs-vsctl set Open_vSwitch . \
>>                     other_config:pmd-isol-cpus=<core-list>
>>   Empty means "none".
>
> I still think this looks kind of complicated. These are the options:
>
> 1) Do not deal with isolation. If some "isolation" is required, the user
>    has to assign the rxqs of every port.
>
> 2) Automatically isolate cores that have rxqs explicitly assigned to them.
>
> 3) Add a pmd-isol-cpus parameter.
>
> 4) Add a rxq-default-affinity parameter (the opposite of pmd-isol-cpus).
>
> 1) and 2) only add a single configuration parameter, but
> 1) puts the burden on the controller (as you point out), and 2) is
> not too flexible.
>
> 3) and 4) are equivalent; they allow more flexibility, but add two
> configuration parameters.
>
> If you think more flexibility is required, we can go for 3) or 4).
>
> Otherwise we can probably try implementing 2).
>
> Are there better ideas?

I think we can implement 2) for now. In the future we may abandon core
isolation after implementing some kind of automatic load balancer.

>> * A pinned RX queue can be polled only by the PMD thread on the core
>>   to which it is pinned.
>>
>> * Only pinned RX queues can be polled by an isolated PMD thread.
>>>>
>>>>> I'm not sure what's the best way to introduce XPS in OVS. First of all,
>>>>> for physical NICs I'd want to keep the current model where possible
>>>>> (one queue per core, no locking). Maybe we can introduce some mapping
>>>>> just for vhost-user ports in netdev-dpdk to implement packet steering?
>>>>
>>>> We can just set the default values for 'n_txq' more accurately:
>>>> 'n_cores() + 1' for phy ports and '1' for virtual ones. To avoid
>>>> locking on TX to phy ports we may just check that
>>>> 'n_txq' >= 'n_cores() + 1'. We can do that because reconfiguration is
>>>> required to change 'n_txq' and, in the current implementation of XPS,
>>>> one queue will not be used twice if we have unused queues.
>>>
>>> Ok, if it can be done without any overhead for phy ports, I'm fine with
>>> that.
>>>
>>>>
>>>>> Also, have you considered packet reordering issues? If a 5-tuple is
>>>>> processed by one core and we often change the tx queue, we end up
>>>>> with the same 5-tuple on two different tx queues.
>>>>
>>>> To avoid reordering issues there is a timeout mechanism inside XPS.
>>>> The current tx queue_id may be changed only if there were no packets
>>>> sent to this queue for a significant amount of time (XPS_CYCLES).
>>>
>>> Ok, no problem then.
>>>
>>>>
>>>>> Lastly, I'm not 100% sure we need a "n-txq" parameter. I think that
>>>>> for vhost-user ports, we know the value in new_device() (we should
>>>>> do that for rxqs too now that we have netdev_reconfigure). Physical
>>>>> NICs' txqs should match the cpu cores and avoid locking, where
>>>>> possible.
>>>>
>>>> Actually, we don't need this if the default values are set as
>>>> described above and a reconfiguration request is used inside
>>>> 'new_device()' (I guess this should be implemented separately).
>>>> But it's harmless and, actually, almost free to implement. Maybe
>>>> it can be used for some specific cases or devices.
>>>
>>> In general, we try to limit the knobs that a user can play with,
>>> unless they provide some value.
>>>
>>> In this case, if we need something for netdev-dummy, we can implement
>>> it explicitly for it and not expose it otherwise.
>>
>> Ok. I agree. We can introduce something like 'options:dummy_n_txq' for
>> dummy interfaces only.
>>
>> What do you think?
>
> Agreed
>
> Thanks,
>
> Daniele
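
P.S. To illustrate the XPS timeout mechanism mentioned above (the tx queue
used for a port is re-evaluated only after it has been idle for more than
XPS_CYCLES, so a single 5-tuple does not bounce between two tx queues), here
is a rough sketch. The structure, the function and the constant value are
invented for this example and do not correspond to the actual patch.

/*
 * Sketch of the XPS timeout idea: keep the current tx queue while it
 * is in use, and allow switching only after it has been idle long
 * enough to avoid packet reordering within a flow.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define XPS_CYCLES 1000000ULL          /* "Significant amount of time". */

struct tx_port {
    int qid;                           /* Currently used tx queue, or -1. */
    uint64_t last_used_cycles;         /* Last time we sent on 'qid'. */
};

/* Returns the tx queue to use for 'port' at time 'now_cycles',
 * switching to 'preferred_qid' only if the old queue went idle. */
static int
xps_get_tx_qid(struct tx_port *port, int preferred_qid, uint64_t now_cycles)
{
    bool expired = now_cycles - port->last_used_cycles > XPS_CYCLES;

    if (port->qid < 0 || expired) {
        port->qid = preferred_qid;     /* Safe to move: queue was idle. */
    }
    port->last_used_cycles = now_cycles;
    return port->qid;
}

int
main(void)
{
    struct tx_port port = { .qid = -1, .last_used_cycles = 0 };

    /* First send: no queue assigned yet, take the preferred one (2). */
    printf("qid = %d\n", xps_get_tx_qid(&port, 2, 100));
    /* Shortly after, a different preference is ignored (queue busy). */
    printf("qid = %d\n", xps_get_tx_qid(&port, 5, 200));
    /* After the queue has been idle for > XPS_CYCLES, it may change. */
    printf("qid = %d\n", xps_get_tx_qid(&port, 5, 200 + XPS_CYCLES + 1));
    return 0;
}

The three calls in main() show the intended behaviour: the queue is kept
while it is busy and may only change after it has been idle long enough.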