> From: David Marchand [mailto:david.march...@redhat.com]
> Sent: Monday, 29 August 2022 13.58
>
> On Mon, Aug 29, 2022 at 1:38 PM lic121 <chengt...@qq.com> wrote:
> >
> > On Mon, Aug 29, 2022 at 01:18:36AM +0000, lic121 wrote:
> > > On Sat, Aug 27, 2022 at 05:56:54PM +0300, Dmitry Kozlyuk wrote:
> > > > 2022-08-27 13:31 (UTC+0000), lic121:
> > > > > On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote:
> > > > > > 2022-08-27 09:25 (UTC+0000), chengt...@qq.com:
> > > > > > > From: lic121 <lic...@chinatelecom.cn>
> > > > > > >
> > > > > > > When RTE_MALLOC_DEBUG is not configured, rte_zmalloc_socket() doesn't
> > > > > > > zero out allocated memory, because memory is zeroed out when freed
> > > > > > > in malloc_elem_free(). But it seems the initially allocated memory is
> > > > > > > not zeroed out as expected.
> > > > > > >
> > > > > > > This patch zeroes out initially allocated memory in
> > > > > > > malloc_heap_add_memory().
> > > > > > >
[...]
> > > > > > Hi,
> > > > > >
> > > > > > The kernel ensures that the newly mapped memory is zeroed,
> > > > > > and DPDK ensures that files in hugetlbfs are not re-mapped.

David, are you suggesting that this invariant - guaranteeing that DPDK memory is zeroed - was violated by SELinux in the SELinux/container issue you were tracking?

If so, the method to ensure the invariant is faulty for SELinux. Assuming DPDK supports SELinux, this bug should be fixed.

> > > > > > What makes you think that it is not zeroed?
> > > > > > Were you able to catch [start; start+len) containing non-zero bytes
> > > > > > at the start of this function?
> > > > > > If so, is it system memory (not an external heap)?
> > > > > > If so, what is the CPU, kernel, any custom settings?
> > > > > >
> > > > > > Can it be the PMD or the app that uses rte_malloc instead of rte_zmalloc?
> > > > > >
> > > > > > This patch cannot be accepted as-is anyway:
> > > > > > 1. It zeroes memory even if the code was called not via rte_zmalloc().
> > > > > > 2. It leads to zeroing on both alloc and free, which is suboptimal.
> > > > >
> > > > > Hi Dmitry, thanks for the review.
> > > > >
> > > > > In rte_eth_dev_pci_allocate(), immediately after rte_zmalloc_socket()[1]
> > > > > I printed the content in gdb. It's not zero.
> > > > >
> > > > > print ((struct qede_dev *)(eth_dev->data->dev_private))->edev->p_iov_info
> > > > >
> > > > > cpu: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
> > > > > kernel: 4.19.90-2102
> > > > >
> > > > > [1]
> > > > > https://github.com/DPDK/dpdk/blob/v20.11/lib/librte_ethdev/rte_ethdev_pci.h#L91-L93
> > > >
> > > > Sorry, it seems that something is wrong with your debug.
> > > > Your link is for 20.11.0.
> > > > In 20.11.5 (apparently always) struct qede_dev::edev is not a pointer [2].
> > > > Even if it was, in zeroed memory it would be a NULL pointer,
> > > > reading a member would give a random value at NULL + some offset.
> > > > I suggest printing the content of the allocated memory with rte_hexdump().
> > > >
> > >
> > > Sorry, I didn't describe my debugging clearly. At first I debugged with
> > > version 20.11.0 and found that the rte_zmalloc_socket() memory is dirty.
> > > Then I tried 20.11.5. I didn't debug on 20.11.5, but the behavior is the
> > > same (NIC failed to be probed). So in the commit msg I said v20.11.5 has
> > > the issue. But when I described my debugging I used the 20.11.0 URL.
> > >
> > > More debug info:
> > > 1. I reproduced the issue tens of times; every time the printed var
> > > has the same value.
> > > 2. After searching for malloc_heap_add_memory(), I found that there are
> > > 3 places that call this function to add memory: malloc_add_seg(),
> > > alloc_pages_on_heap() and malloc_heap_add_external_memory(). First, I
> > > zeroed out memory only for malloc_add_seg(); it didn't fix the issue.
> > > Then I zeroed out memory in malloc_heap_add_memory() to cover all 3
> > > cases; this time the NIC was probed successfully.
> > > 3. Once the NIC was probed, I rolled back my fix and tried to reproduce
> > > the issue, but I couldn't reproduce it anymore. So my guess is that the
> > > memory allocated when probing the qede NIC is at a fixed memory
> > > location, because every time in my debugging the printed var had the
> > > same value. After I zeroed out the allocated memory once, I couldn't
> > > reproduce the issue anymore.
> > >
> > > > [2]:
> > > > http://git.dpdk.org/dpdk-stable/tree/drivers/net/qede/qede_ethdev.h?h=v20.11.5#n223
> >
> > Today we probably hit the same issue with an Intel E810 NIC. The behavior
> > is that the E810 NIC can be probed on some hosts but not on others. On
> > the same host, one E810 may be probed while the other one can't be.
> > After I applied this patch, the issue no longer occurs.
>
> Are you perhaps running your DPDK application from inside a container?
> I remember tracking an issue which had to do with reusing a "dirty"
> hugepage file (because of SELinux forbidding to destroy those files).
>
>
> --
> David Marchand
>
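
For the check Dmitry asks about above (whether [start; start+len) contains non-zero bytes), a minimal sketch could look like the following. The helper name first_nonzero_offset() is hypothetical and not a DPDK API. It uses plain C only, and where to call it in the allocator or the driver is left open.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of DPDK): return the offset of the first
 * non-zero byte in [buf, buf + len), or -1 if every byte is zero.
 */
ptrdiff_t
first_nonzero_offset(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	/* A NULL buffer is only acceptable for an empty range. */
	assert(p != NULL || len == 0);

	for (i = 0; i < len; i++) {
		if (p[i] != 0)
			return (ptrdiff_t)i;
	}
	return -1;
}
```

Called on the pointer returned by rte_zmalloc_socket() with the requested size, a result other than -1 would point at the first dirty byte, which could then be dumped with rte_hexdump() as suggested earlier in the thread.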