Hi Wenzhuo,

On 05/04/2016 11:10 PM, Wenzhuo Lu wrote:
> When the physical link is down and recover later,
> the VF link cannot recover until the user stop and
> start it manually.
> This patch implements the automatic recovery of VF
> port.
> The automatic recovery bases on the link up/down
> message received from PF. When VF receives the link
> up/down message, it will replace the RX/TX and
> operation functions with fake ones to stop RX/TX
> and any future operation. Then reset the VF port.
> After successfully resetting the port, recover the
> RX/TX and operation functions.
> 
> Signed-off-by: Wenzhuo Lu <wenzhuo.lu at intel.com>
> 
> [...]
> 
> +void
> +ixgbevf_dev_link_up_down_handler(struct rte_eth_dev *dev)
> +{
> +     struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +     struct ixgbe_adapter *adapter =
> +             (struct ixgbe_adapter *)dev->data->dev_private;
> +     int diag;
> +     uint32_t vteiam;
> +
> +     /* Only one working core need to performance VF reset */
> +     if (rte_spinlock_trylock(&adapter->vf_reset_lock)) {
> +             /**
> +              * When fake rec/xmit is replaced, working thread may is running
> +              * into real RX/TX func, so wait long enough to assume all
> +              * working thread exit. The assumption is it will spend less
> +              * than 100us for each execution of RX and TX func.
> +              */
> +             rte_delay_us(100);
> +
> +             do {
> +                     dev->data->dev_started = 0;
> +                     ixgbevf_dev_stop(dev);
> +                     rte_delay_us(1000000);

If I understand correctly, ixgbevf_dev_link_up_down_handler() is called
by ixgbevf_recv_pkts_fake() on a dataplane core. This means that the
core that acquires the lock will busy-wait for at least 100us + 1s.
If this core is also in charge of polling other queues of other
ports, or of running timers, many packets will be dropped (even the
100us delay alone is too long). I don't think it is acceptable to
busy-wait inside an RX function.
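To put a rough number on it, here is a back-of-the-envelope sketch. It assumes the stalled core's RX queue receives a full 10 GbE line rate of 64-byte frames (about 14.88 Mpps once the 20 bytes of preamble, SFD and inter-frame gap per frame are counted) and has a 512-descriptor ring. Both figures are assumptions for illustration, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Packets per second at line rate: each frame occupies frame_bytes on
 * the wire plus 20 bytes of preamble, SFD and inter-frame gap. */
static double line_rate_pps(double link_bps, unsigned frame_bytes)
{
    return link_bps / ((frame_bytes + 20) * 8.0);
}

/* Packets that arrive while the polling core is stalled and that do not
 * fit in the RX ring (assumed empty when the stall begins), i.e. the
 * packets the NIC has to drop. */
static uint64_t packets_dropped(double pps, double stall_s, unsigned ring_size)
{
    double arrived = pps * stall_s;
    return arrived > ring_size ? (uint64_t)(arrived - ring_size) : 0;
}
```

Under these assumptions, the 100us delay alone lets about 1488 packets arrive, roughly 976 more than the ring can hold. The full 100us + 1s stall overflows it by about 14.88 million packets, on every queue that core was supposed to poll.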

I think many of these issues would be avoided by delegating this work
to the application, for instance by notifying it that the port is in a
bad state and must be restarted. The application could then properly
stop polling the queues, and stop and restart the port from a separate
thread, without disturbing the dataplane cores.
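A minimal sketch of that split follows. It assumes a fixed set of dataplane cores, each running a polling loop, and a control thread that is woken up when the PMD reports the link event. The wake-up mechanism is left out. All names (port_ctx, port_begin_reset(), port_enter_iteration(), etc.) are hypothetical, not DPDK API. The dataplane cores only bump a counter and test a flag. All waiting happens in the control thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CORES 8 /* hypothetical limit; nb_cores must not exceed it */

enum port_state { PORT_RUNNING, PORT_RESET_PENDING };

/* Hypothetical per-port recovery state shared by the dataplane cores and
 * a single control thread. */
struct port_ctx {
    atomic_int state;
    size_t nb_cores;
    atomic_ulong loop_seq[MAX_CORES];  /* bumped by each core per loop */
    unsigned long snapshot[MAX_CORES]; /* owned by the control thread */
};

static void port_ctx_init(struct port_ctx *p, size_t nb_cores)
{
    atomic_init(&p->state, PORT_RUNNING);
    p->nb_cores = nb_cores;
    for (size_t i = 0; i < nb_cores; i++) {
        atomic_init(&p->loop_seq[i], 0);
        p->snapshot[i] = 0;
    }
}

/* Control thread, on link event: stop new polling of the port and record
 * how far each core's loop has progressed. Never sleeps. */
static void port_begin_reset(struct port_ctx *p)
{
    atomic_store(&p->state, PORT_RESET_PENDING);
    for (size_t i = 0; i < p->nb_cores; i++)
        p->snapshot[i] = atomic_load(&p->loop_seq[i]);
}

/* Dataplane core `core`, at the top of every loop iteration: mark a
 * quiescent point, then report whether the port may be polled now. */
static bool port_enter_iteration(struct port_ctx *p, size_t core)
{
    atomic_fetch_add(&p->loop_seq[core], 1);
    return atomic_load(&p->state) == PORT_RUNNING;
}

/* Control thread: true once every core has started a new iteration since
 * port_begin_reset(), so none can still be inside RX/TX on this port. */
static bool port_cores_quiesced(struct port_ctx *p)
{
    for (size_t i = 0; i < p->nb_cores; i++)
        if (atomic_load(&p->loop_seq[i]) == p->snapshot[i])
            return false;
    return true;
}

/* Control thread: call after the port has been stopped and restarted;
 * lets the dataplane cores poll it again. */
static void port_resume(struct port_ctx *p)
{
    atomic_store(&p->state, PORT_RUNNING);
}
```

A control thread would call port_begin_reset(), check port_cores_quiesced() while sleeping between checks, then restart the port (e.g. with rte_eth_dev_stop() and rte_eth_dev_start()), and finally call port_resume(). In this sketch, a core that stops looping altogether would block recovery, so a real application would need a timeout or an idle-core exemption.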


Regards,
Olivier
