On Wed, 2019-02-06 at 12:02 +0000, Tariq Toukan wrote:
> 
> On 2/6/2019 2:35 AM, Cong Wang wrote:
> > mlx5_eq_cq_get() is called in IRQ handler, the spinlock inside
> > gets a lot of contention when we test some heavy workload
> > with 60 RX queues and 80 CPUs, and it is clearly shown in the
> > flame graph.
> > 
Hi Cong,

The patch is ok to me, but I really doubt that you can hit contention
on the latest upstream driver: we already have a spinlock per EQ, which
means a spinlock per core, and each EQ (core) msix handler can only
access one spinlock (its own). So I am surprised that you got the
contention. Maybe you are not running the latest upstream driver? What
is the workload?

> > In fact, radix_tree_lookup() is perfectly fine with RCU read lock,
> > we don't have to take a spinlock on this hot path. It is pretty much
> > similar to commit 291c566a2891
> > ("net/mlx4_core: Fix racy CQ (Completion Queue) free"). Slow paths
> > are still serialized with the spinlock, and with synchronize_irq()
> > it should be safe to just move the fast path to RCU read lock.
> > 
> > This patch itself reduces the latency by about 50% with our
> > workload.
> > 
> > Cc: Saeed Mahameed <sae...@mellanox.com>
> > Cc: Tariq Toukan <tar...@mellanox.com>
> > Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > index ee04aab65a9f..7092457705a2 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > @@ -114,11 +114,11 @@ static struct mlx5_core_cq *mlx5_eq_cq_get(struct mlx5_eq *eq, u32 cqn)
> >  	struct mlx5_cq_table *table = &eq->cq_table;
> >  	struct mlx5_core_cq *cq = NULL;
> >  
> > -	spin_lock(&table->lock);
> > +	rcu_read_lock();
> >  	cq = radix_tree_lookup(&table->tree, cqn);
> >  	if (likely(cq))
> >  		mlx5_cq_hold(cq);
> > -	spin_unlock(&table->lock);
> > +	rcu_read_unlock();
> 
> Thanks for your patch.
> 
> I think we can improve it further, by taking the if statement out of
> the critical section.
> 

No, mlx5_cq_hold() must stay under the RCU read lock, otherwise the cq
might get freed before the irq handler gets a chance to increment its
refcount.

Another way to do it is not to do any refcounting in the irq handler at
all, and to fence cq removal via synchronize_irq(eq->irqn) in
mlx5_eq_del_cq(). But let's keep one approach (refcounting);
synchronize_irq()/RCU can be heavy sometimes, especially on RDMA
workloads with many create/destroy cq calls in loops.

> Other than that, patch LGTM.
> 
> Regards,
> Tariq
> 
> > 
> >  	return cq;
> >  }
> >  
> > @@ -371,9 +371,9 @@ int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
> >  	struct mlx5_cq_table *table = &eq->cq_table;
> >  	int err;
> >  
> > -	spin_lock_irq(&table->lock);
> > +	spin_lock(&table->lock);
> >  	err = radix_tree_insert(&table->tree, cq->cqn, cq);
> > -	spin_unlock_irq(&table->lock);
> > +	spin_unlock(&table->lock);
> >  
> >  	return err;
> >  }
> >  
> > @@ -383,9 +383,9 @@ int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
> >  	struct mlx5_cq_table *table = &eq->cq_table;
> >  	struct mlx5_core_cq *tmp;
> >  
> > -	spin_lock_irq(&table->lock);
> > +	spin_lock(&table->lock);
> >  	tmp = radix_tree_delete(&table->tree, cq->cqn);
> > -	spin_unlock_irq(&table->lock);
> > +	spin_unlock(&table->lock);
> >  
> >  	if (!tmp) {
> >  		mlx5_core_warn(eq->dev, "cq 0x%x not found in eq 0x%x tree\n", eq->eqn, cq->cqn);
> > 
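To make that alternative concrete, below is a rough sketch of the
no-refcount variant (illustrative only, not tested driver code: it
keeps the upstream function and field names, but the bodies are
simplified and the plain -ENOENT return stands in for the existing
warn path):

static struct mlx5_core_cq *mlx5_eq_cq_get(struct mlx5_eq *eq, u32 cqn)
{
	struct mlx5_cq_table *table = &eq->cq_table;
	struct mlx5_core_cq *cq;

	/* Called only from this EQ's msix handler.  The RCU read lock
	 * keeps the radix tree nodes alive during the lookup; the cq
	 * itself needs no refcount here because mlx5_eq_del_cq() below
	 * fences against this handler before the cq can be freed.
	 */
	rcu_read_lock();
	cq = radix_tree_lookup(&table->tree, cqn);
	rcu_read_unlock();

	return cq;
}

int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
{
	struct mlx5_cq_table *table = &eq->cq_table;
	struct mlx5_core_cq *tmp;

	spin_lock(&table->lock);
	tmp = radix_tree_delete(&table->tree, cq->cqn);
	spin_unlock(&table->lock);

	if (!tmp)
		return -ENOENT;

	/* Fence: wait until any in-flight interrupt handler on this EQ
	 * has finished, so it cannot still be using the cq we just
	 * removed from the tree.  Only after this may the caller free
	 * the cq.
	 */
	synchronize_irq(eq->irqn);

	return 0;
}

The trade-off is visible right there: synchronize_irq() turns every cq
removal into a potentially long wait, which is exactly what hurts the
RDMA workloads mentioned above, while the refcounting approach keeps
removal cheap and only pins the cq for the duration of one event.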