https://bugs.dpdk.org/show_bug.cgi?id=533
Bug ID: 533 Summary: stack corruption in mlx5_xstats_reset when number of stats changes Product: DPDK Version: 20.08 Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: ethdev Assignee: dev@dpdk.org Reporter: ralf.hoffm...@allegro-packets.com Target Milestone: --- Created attachment 120 --> https://bugs.dpdk.org/attachment.cgi?id=120&action=edit Patch to fix stack corruption I stumbled across a crash in the mellanox driver due to stack corruption. The function mlx5_xstats_reset() uses a dynamically sized array on the stack based on the last number of statistic values. Then it queries the actual number of stats. If the returned value is larger than the previous number, it will overwrite the stack frame including the return addresses usually resulting in a crash. In my case the initial value of xstats_ctrl->mlx5_stats_n was zero and the return value of mlx5_os_get_stats_n() was 24 so it overwrote 96 bytes on the stack. The problem became visible after the update to 20.08. Apparently some statistic code has been refactored which now triggers this problem in our code but the actual problem in the reset function exists for a long time. We are calling rte_eth_xstats_reset() before calling rte_eth_xstats_get() to get the available statistics from the driver. There are two problems in this function: 1. The array is created not based on the actual number of items, but the previously known number of items. 2. The function mlx5_os_read_dev_counters() just gets a pointer to the array without knowing its size making it unsafe to use. I have attached a patch which works for me by using alloca instead. Since this is not completely portable, another approach can be using a separate scope block for the dynamically sized array. If it matters, the card we used is MCX516A-CDAT -- You are receiving this mail because: You are the assignee for the bug.