Hi Gaetan,

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.ri...@6wind.com]
> Sent: Tuesday, September 5, 2017 4:23 PM
> To: Matan Azrad <ma...@mellanox.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] net/failsafe: stat support enhancement
> 
> Hi Matan,
> 
> On Tue, Sep 05, 2017 at 12:56:34PM +0300, Matan Azrad wrote:
> > The previous stats code returned only the current TX sub device stats.
> >
> > This enhancement extends it to return the sum of all sub devices stats
> > with history of removed sub-devices.
> >
> > Dedicated stats accumulator saves the stat history of all sub device
> > remove events.
> >
> > Each failsafe sub device contains the last stats asked by the user and
> > updates the accumulator in removal time.
> >
> > I would like to implement ultimate snapshot on removal time.
> > The stats_get API needs to be changed to return error in the case it
> > is too late to retrieve statistics.
> 
> You need to have this API evolution first. It is not assured to be accepted by
> the community.
> 

I already sent suggestion to mailing list, find "stats_get API: return value 
suggestion":
No objections for now.


> As it is, this version is incorrect, because the only available stats for a
> removed slave is the last snapshot.
> 

It is for sure better from the previous version, the accuracy can be better if 
my suggestion will be accepted. 
If it is, I will improve it.

> Thus it complicates the rules while still being incorrect. Even if you were 
> able
> to push for this API evolution, PMDs with hard counters would be the ones
> to return an error code on stat read while removed.
> You may be lucky at the moment because MLX drivers do not support hard
> counters, but this won't always be true (and it will be false soon enough).
> You will thus be back at square one, with a new useless API and still 
> incorrect
> stats in the fail-safe. On the other hand, the fail-safe should strive to be 
> as
> easy to use as possible with most drivers, and not cater specifically to soft-
> counters ones.
> 
> So, my take on this: I understand that aggregated stats would be interesting.
> Keep it simple & stupid: simple aggregation on stats_get.
> You will have a rollback at some point on device removal, but this is still 
> easier
> to detect / deal with than partially / sometimes incorrect stats.
> 

100% accuracy is impossible when removal event is in the game,
but we can do it better as we can.

As I wrote in commit log(if stat gets will return value):
When some drivers can give the stats after RMV interrupt and before dev_close,
We just can to improve accuracy and be closer to 100%.
The stats of the sub devices which return error will be taken from the last 
snapshot. 

It is just enhancement which is important for customers.

> > By this way, failsafe can get stats snapshot in removal interrupt
> > callback for each PMD which can give stats after removal event.
> >
> > Signed-off-by: Matan Azrad <ma...@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c   | 33
> +++++++++++++++++++++++++++++++++
> >  drivers/net/failsafe/failsafe_ops.c     | 16 ++++++++++++----
> >  drivers/net/failsafe/failsafe_private.h |  5 +++++
> >  3 files changed, 50 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index a3a8cce..133080d 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -399,6 +399,37 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev
> *dev)
> >     return ret;
> >  }
> >
> > +void
> > +fs_increment_stats(struct rte_eth_stats *from, struct rte_eth_stats
> > +*to) {
> > +   uint8_t i;
> > +
> > +   RTE_ASSERT(from != NULL && to != NULL);
> > +   to->ipackets += from->ipackets;
> > +   to->opackets += from->opackets;
> > +   to->ibytes += from->ibytes;
> > +   to->obytes += from->obytes;
> > +   to->imissed += from->imissed;
> > +   to->ierrors += from->ierrors;
> > +   to->oerrors += from->oerrors;
> > +   to->rx_nombuf += from->rx_nombuf;
> > +   for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS; i++) {
> > +           to->q_ipackets[i] += from->q_ipackets[i];
> > +           to->q_opackets[i] += from->q_opackets[i];
> > +           to->q_ibytes[i] += from->q_ibytes[i];
> > +           to->q_obytes[i] += from->q_obytes[i];
> > +           to->q_errors[i] += from->q_errors[i];
> > +   }
> > +}
> > +
> > +void
> > +fs_increment_stats_accumulator(struct sub_device *sdev) {
> > +   fs_increment_stats(&sdev->stats_snapshot,
> > +                   &PRIV(sdev->fs_dev)->stats_accumulator);
> > +   memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats)); }
> > +
> >  int
> >  failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
> >                             enum rte_eth_event_type event
> __rte_unused, @@ -410,6 +441,8 @@
> > failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
> >     fs_switch_dev(sdev->fs_dev, sdev);
> >     /* Use safe bursts in any case. */
> >     set_burst_fn(sdev->fs_dev, 1);
> > +   /* Increment the stats accumulator by the last stats snapshot. */
> > +   fs_increment_stats_accumulator(sdev);
> >     /*
> >      * Async removal, the sub-PMD will try to unregister
> >      * the callback at the source of the current thread context.
> > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > b/drivers/net/failsafe/failsafe_ops.c
> > index ff9ad15..e47cc85 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -586,9 +586,14 @@ static void
> >  fs_stats_get(struct rte_eth_dev *dev,
> >          struct rte_eth_stats *stats)
> >  {
> > -   if (TX_SUBDEV(dev) == NULL)
> > -           return;
> > -   rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> > +   struct sub_device *sdev;
> > +   uint8_t i;
> > +
> > +   memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
> > +   FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +           rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
> > +           fs_increment_stats(&sdev->stats_snapshot, stats);
> > +   }
> >  }
> >
> >  static void
> > @@ -597,8 +602,11 @@ fs_stats_reset(struct rte_eth_dev *dev)
> >     struct sub_device *sdev;
> >     uint8_t i;
> >
> > -   FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
> > +   FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> >             rte_eth_stats_reset(PORT_ID(sdev));
> > +           memset(&sdev->stats_snapshot, 0, sizeof(struct
> rte_eth_stats));
> > +   }
> > +   memset(&PRIV(dev)->stats_accumulator, 0, sizeof(struct
> > +rte_eth_stats));
> >  }
> >
> >  /**
> > diff --git a/drivers/net/failsafe/failsafe_private.h
> > b/drivers/net/failsafe/failsafe_private.h
> > index 0361cf4..267c749 100644
> > --- a/drivers/net/failsafe/failsafe_private.h
> > +++ b/drivers/net/failsafe/failsafe_private.h
> > @@ -102,6 +102,8 @@ struct sub_device {
> >     uint8_t sid;
> >     /* Device state machine */
> >     enum dev_state state;
> > +   /* Last stats snapshot passed to user */
> > +   struct rte_eth_stats stats_snapshot;
> >     /* Some device are defined as a command line */
> >     char *cmdline;
> >     /* fail-safe device backreference */ @@ -140,6 +142,7 @@ struct
> > fs_priv {
> >      * synchronized state.
> >      */
> >     enum dev_state state;
> > +   struct rte_eth_stats stats_accumulator;
> >     unsigned int pending_alarm:1; /* An alarm is pending */
> >     /* flow isolation state */
> >     int flow_isolated:1;
> > @@ -180,6 +183,8 @@ int failsafe_eal_uninit(struct rte_eth_dev *dev);
> >
> >  int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);  void
> > failsafe_dev_remove(struct rte_eth_dev *dev);
> > +void fs_increment_stats(struct rte_eth_stats *from, struct
> > +rte_eth_stats *to); void fs_increment_stats_accumulator(struct
> > +sub_device *sdev);
> >  int failsafe_eth_rmv_event_callback(uint8_t port_id,
> >                                 enum rte_eth_event_type type,
> >                                 void *arg, void *out);
> > --
> > 2.7.4
> >
> 
> --
> Gaëtan Rivet
> 6WIND

Reply via email to