Re: [PATCH] eal: zero out new added memory
On Tue, Aug 30, 2022 at 12:53:37PM +, lic121 wrote: > On Tue, Aug 30, 2022 at 01:59:16PM +0300, Dmitry Kozlyuk wrote: > > Thank you for the most detailed info! > > > > 1. If you run the poisoner program the second time, > >does it also see dirty memory immediately after mmap()? > > No, running the poisoner program again doesn't show dirty memory > immediately after mmap. > > I assume that there are some differences on how my poisoner program > using hugepage and how testpmd using hugepage. I notice that testpmd > leaves some files under /dev/hugepages/rtemap_xxx even after testpmd > process exits. But my program didn't create any file under > /dev/hugepages/. > > > > 2. Kernel 4.19.90-2102 patchlevel 2102 is very high, > >can there be any unusual patches applied? > >Your host has "compute" in its name, > >can it have patches that trade security for performance? > > I may need to talk to the kernel team. Indeed, I failed to reproduce the > issue on a Virtual Machine of kernel 4.18.0-305.12.1.el8_4.x86_64.(2M > page size insetad of 1G) Verified that this issue is caused by an down stream linux kernel issue. Thanks for your attention.
Re: [PATCH] eal: zero out new added memory
On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote: > 2022-08-27 09:25 (UTC+), chengt...@qq.com: > > From: lic121 > > > > When RTE_MALLOC_DEBUG not configured, rte_zmalloc_socket() doens't > > zero oute allocaed memory. Because memory are zeroed out when free > > in malloc_elem_free(). But seems the initial allocated memory is > > not zeroed out as expected. > > > > This patch zero out initial allocated memory in > > malloc_heap_add_memory(). > > > > With dpdk 20.11.5, "QLogic Corp. FastLinQ QL41000" probe triggers > > this problem. > > ``` > > Stack trace of thread 412780: > > #0 0x00e5fb99 ecore_int_igu_read_cam (dpdk-testpmd) > > #1 0x00e4df54 ecore_get_hw_info (dpdk-testpmd) > > #2 0x00e504aa ecore_hw_prepare (dpdk-testpmd) > > #3 0x00e8a7ca qed_probe (dpdk-testpmd) > > #4 0x00e83c59 qede_common_dev_init (dpdk-testpmd) > > #5 0x00e84c8e qede_eth_dev_init (dpdk-testpmd) > > #6 0x009dd5a7 rte_pci_probe_one_driver (dpdk-testpmd) > > #7 0x009734e3 rte_bus_probe (dpdk-testpmd) > > #8 0x009933bd rte_eal_init (dpdk-testpmd) > > #9 0x0041768f main (dpdk-testpmd) > > #10 0x7f41a7001b17 __libc_start_main (libc.so.6) > > #11 0x0067e34a _start (dpdk-testpmd) > > ``` > > > > Signed-off-by: lic121 > > --- > > lib/librte_eal/common/malloc_heap.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/lib/librte_eal/common/malloc_heap.c > > b/lib/librte_eal/common/malloc_heap.c > > index f4e20ea..1607401 100644 > > --- a/lib/librte_eal/common/malloc_heap.c > > +++ b/lib/librte_eal/common/malloc_heap.c > > @@ -96,11 +96,19 @@ > > void *start, size_t len) > > { > > struct malloc_elem *elem = start; > > + void *ptr; > > + size_t data_len > > + > > > > malloc_elem_init(elem, heap, msl, len, elem, len); > > > > malloc_elem_insert(elem); > > > > + /* Zero out new added memory. */ > > + *ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN); > > + data_len = elem->size - MALLOC_ELEM_OVERHEAD; > > + memset(ptr, 0, data_len); > > + > > elem = malloc_elem_join_adjacent_free(elem); > > > > malloc_elem_free_list_insert(elem); > > Hi, > > The kernel ensures that the newly mapped memory is zeroed, > and DPDK ensures that files in hugetlbfs are not re-mapped. > What makes you think that it is not zeroed? > Were you able to catch [start; start+len) contain non-zero bytes > at the start of this function? > If so, is it system memory (not an external heap)? > If so, what is the CPU, kernel, any custom settings? > > Can it be the PMD or the app that uses rte_malloc instead of rte_zmalloc? > > This patch cannot be accepted as-is anyway: > 1. It zeroes memory even if the code was called not via rte_zmalloc(). > 2. It leads to zeroing on both alloc and free, which is suboptimal. Hi Dmitry, thanks for the review. In rte_eth_dev_pci_allocate(), imediately after rte_zmalloc_socket()[1] I printed the content in gdb. It's not zero. print ((struct qede_dev *)(eth_dev->data->dev_private))->edev->p_iov_info cpu: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz kernel: 4.19.90-2102 [1] https://github.com/DPDK/dpdk/blob/v20.11/lib/librte_ethdev/rte_ethdev_pci.h#L91-L93
Re: [PATCH] eal: zero out new added memory
On Sat, Aug 27, 2022 at 05:56:54PM +0300, Dmitry Kozlyuk wrote: > 2022-08-27 13:31 (UTC+), lic121: > > On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote: > > > 2022-08-27 09:25 (UTC+), chengt...@qq.com: > > > > From: lic121 > > > > > > > > When RTE_MALLOC_DEBUG not configured, rte_zmalloc_socket() doens't > > > > zero oute allocaed memory. Because memory are zeroed out when free > > > > in malloc_elem_free(). But seems the initial allocated memory is > > > > not zeroed out as expected. > > > > > > > > This patch zero out initial allocated memory in > > > > malloc_heap_add_memory(). > > > > > > > > With dpdk 20.11.5, "QLogic Corp. FastLinQ QL41000" probe triggers > > > > this problem. > > > > ``` > > > > Stack trace of thread 412780: > > > > #0 0x00e5fb99 ecore_int_igu_read_cam (dpdk-testpmd) > > > > #1 0x00e4df54 ecore_get_hw_info (dpdk-testpmd) > > > > #2 0x00e504aa ecore_hw_prepare (dpdk-testpmd) > > > > #3 0x00e8a7ca qed_probe (dpdk-testpmd) > > > > #4 0x00e83c59 qede_common_dev_init (dpdk-testpmd) > > > > #5 0x00e84c8e qede_eth_dev_init (dpdk-testpmd) > > > > #6 0x009dd5a7 rte_pci_probe_one_driver (dpdk-testpmd) > > > > #7 0x009734e3 rte_bus_probe (dpdk-testpmd) > > > > #8 0x009933bd rte_eal_init (dpdk-testpmd) > > > > #9 0x0041768f main (dpdk-testpmd) > > > > #10 0x7f41a7001b17 __libc_start_main (libc.so.6) > > > > #11 0x0067e34a _start (dpdk-testpmd) > > > > ``` > > > > > > > > Signed-off-by: lic121 > > > > --- > > > > lib/librte_eal/common/malloc_heap.c | 8 > > > > 1 file changed, 8 insertions(+) > > > > > > > > diff --git a/lib/librte_eal/common/malloc_heap.c > > > > b/lib/librte_eal/common/malloc_heap.c > > > > index f4e20ea..1607401 100644 > > > > --- a/lib/librte_eal/common/malloc_heap.c > > > > +++ b/lib/librte_eal/common/malloc_heap.c > > > > @@ -96,11 +96,19 @@ > > > > void *start, size_t len) > > > > { > > > > struct malloc_elem *elem = start; > > > > + void *ptr; > > > > + size_t data_len > > > > + > > > > > > > > malloc_elem_init(elem, heap, msl, len, elem, len); > > > > > > > > malloc_elem_insert(elem); > > > > > > > > + /* Zero out new added memory. */ > > > > + *ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN); > > > > + data_len = elem->size - MALLOC_ELEM_OVERHEAD; > > > > + memset(ptr, 0, data_len); > > > > + > > > > elem = malloc_elem_join_adjacent_free(elem); > > > > > > > > malloc_elem_free_list_insert(elem); > > > > > > Hi, > > > > > > The kernel ensures that the newly mapped memory is zeroed, > > > and DPDK ensures that files in hugetlbfs are not re-mapped. > > > What makes you think that it is not zeroed? > > > Were you able to catch [start; start+len) contain non-zero bytes > > > at the start of this function? > > > If so, is it system memory (not an external heap)? > > > If so, what is the CPU, kernel, any custom settings? > > > > > > Can it be the PMD or the app that uses rte_malloc instead of rte_zmalloc? > > > > > > This patch cannot be accepted as-is anyway: > > > 1. It zeroes memory even if the code was called not via rte_zmalloc(). > > > 2. It leads to zeroing on both alloc and free, which is suboptimal. > > > > Hi Dmitry, thanks for the review. > > > > In rte_eth_dev_pci_allocate(), imediately after rte_zmalloc_socket()[1] > > I printed > > the content in gdb. It's not zero. > > > > print ((struct qede_dev *)(eth_dev->data->dev_private))->edev->p_iov_info > > > > cpu: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz > > kernel: 4.19.90-2102 > > > > [1] > > https://github.com/DPDK/dpdk/blob/v20.11/lib/librte_ethdev/rte_ethdev_pci.h#L91-L93 > > Sorry, it seems that something is wrong with your debug. > Your link is for 20.11.0. > In 20.11.5 (apparently always) struct qede_dev::edev is not a pointer [2]. > Even if it was, in z
Re: [PATCH] eal: zero out new added memory
On Mon, Aug 29, 2022 at 01:18:36AM +, lic121 wrote: > On Sat, Aug 27, 2022 at 05:56:54PM +0300, Dmitry Kozlyuk wrote: > > 2022-08-27 13:31 (UTC+), lic121: > > > On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote: > > > > 2022-08-27 09:25 (UTC+), chengt...@qq.com: > > > > > From: lic121 > > > > > > > > > > When RTE_MALLOC_DEBUG not configured, rte_zmalloc_socket() doens't > > > > > zero oute allocaed memory. Because memory are zeroed out when free > > > > > in malloc_elem_free(). But seems the initial allocated memory is > > > > > not zeroed out as expected. > > > > > > > > > > This patch zero out initial allocated memory in > > > > > malloc_heap_add_memory(). > > > > > > > > > > With dpdk 20.11.5, "QLogic Corp. FastLinQ QL41000" probe triggers > > > > > this problem. > > > > > ``` > > > > > Stack trace of thread 412780: > > > > > #0 0x00e5fb99 ecore_int_igu_read_cam (dpdk-testpmd) > > > > > #1 0x00e4df54 ecore_get_hw_info (dpdk-testpmd) > > > > > #2 0x00e504aa ecore_hw_prepare (dpdk-testpmd) > > > > > #3 0x00e8a7ca qed_probe (dpdk-testpmd) > > > > > #4 0x00e83c59 qede_common_dev_init (dpdk-testpmd) > > > > > #5 0x00e84c8e qede_eth_dev_init (dpdk-testpmd) > > > > > #6 0x009dd5a7 rte_pci_probe_one_driver (dpdk-testpmd) > > > > > #7 0x009734e3 rte_bus_probe (dpdk-testpmd) > > > > > #8 0x009933bd rte_eal_init (dpdk-testpmd) > > > > > #9 0x0041768f main (dpdk-testpmd) > > > > > #10 0x7f41a7001b17 __libc_start_main (libc.so.6) > > > > > #11 0x0067e34a _start (dpdk-testpmd) > > > > > ``` > > > > > > > > > > Signed-off-by: lic121 > > > > > --- > > > > > lib/librte_eal/common/malloc_heap.c | 8 > > > > > 1 file changed, 8 insertions(+) > > > > > > > > > > diff --git a/lib/librte_eal/common/malloc_heap.c > > > > > b/lib/librte_eal/common/malloc_heap.c > > > > > index f4e20ea..1607401 100644 > > > > > --- a/lib/librte_eal/common/malloc_heap.c > > > > > +++ b/lib/librte_eal/common/malloc_heap.c > > > > > @@ -96,11 +96,19 @@ > > > > > void *start, size_t len) > > > > > { > > > > > struct malloc_elem *elem = start; > > > > > + void *ptr; > > > > > + size_t data_len > > > > > + > > > > > > > > > > malloc_elem_init(elem, heap, msl, len, elem, len); > > > > > > > > > > malloc_elem_insert(elem); > > > > > > > > > > + /* Zero out new added memory. */ > > > > > + *ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN); > > > > > + data_len = elem->size - MALLOC_ELEM_OVERHEAD; > > > > > + memset(ptr, 0, data_len); > > > > > + > > > > > elem = malloc_elem_join_adjacent_free(elem); > > > > > > > > > > malloc_elem_free_list_insert(elem); > > > > > > > > Hi, > > > > > > > > The kernel ensures that the newly mapped memory is zeroed, > > > > and DPDK ensures that files in hugetlbfs are not re-mapped. > > > > What makes you think that it is not zeroed? > > > > Were you able to catch [start; start+len) contain non-zero bytes > > > > at the start of this function? > > > > If so, is it system memory (not an external heap)? > > > > If so, what is the CPU, kernel, any custom settings? > > > > > > > > Can it be the PMD or the app that uses rte_malloc instead of > > > > rte_zmalloc? > > > > > > > > This patch cannot be accepted as-is anyway: > > > > 1. It zeroes memory even if the code was called not via rte_zmalloc(). > > > > 2. It leads to zeroing on both alloc and free, which is suboptimal. > > > > > > Hi Dmitry, thanks for the review. > > > > > > In rte_eth_dev_pci_allocate(), imediately after rte_zmalloc_socket()[1] > > > I printed > > > the content in gdb. It's not zero. > > > > > > print ((struct qed
Re: [PATCH] eal: zero out new added memory
On Mon, Aug 29, 2022 at 03:49:25PM +0300, Dmitry Kozlyuk wrote: > 2022-08-29 14:37 (UTC+0200), Morten Brørup: > > > From: David Marchand [mailto:david.march...@redhat.com] > > > Sent: Monday, 29 August 2022 13.58 > > > > > > > > > > On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote: > > > > > > > > The kernel ensures that the newly mapped memory is zeroed, > > > > > > > > and DPDK ensures that files in hugetlbfs are not re-mapped. > > > > David, are you suggesting that this invariant - guaranteeing that DPDK > > memory is zeroed - was violated by SELinux in the SELinux/container issue > > you were tracking? > > > > If so, the method to ensure the invariant is faulty for SELinux. Assuming > > DPDK supports SELinux, this bug should be fixed. > > +1, I'd like to know more about that case. > > EAL checks the unlink() result, so if it fails, the allocation should fail > and the invariant should not be broken. > Code from 20.11.5: > > if (rte_eal_process_type() == RTE_PROC_PRIMARY && > unlink(path) == -1 && > errno != ENOENT) { > RTE_LOG(DEBUG, EAL, "%s(): could not remove '%s': %s\n", > __func__, path, strerror(errno)); > return -1; > } > > Can SELinux restriction result in errno == ENOENT? > I'd expect EPERM/EACCESS. Thanks for your info, the selinux is disabled on my server. Also I checked that the selinux fix is already in my dpdk. Could any other settings may cause dirty memory? If you can think of any thing related, I can have a try. BTW, this is my nic info: ``` Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02) driver: ice version: 1.9.3 firmware-version: 2.30 0x80005d22 1.2877.0 expansion-rom-version: bus-info: :3b:00.1 supports-statistics: yes supports-test: yes supports-eeprom-access: yes supports-register-dump: yes supports-priv-flags: yes ```
Re: [PATCH] eal: zero out new added memory
On Tue, Aug 30, 2022 at 01:11:25AM +, lic121 wrote: > On Mon, Aug 29, 2022 at 03:49:25PM +0300, Dmitry Kozlyuk wrote: > > 2022-08-29 14:37 (UTC+0200), Morten Brørup: > > > > From: David Marchand [mailto:david.march...@redhat.com] > > > > Sent: Monday, 29 August 2022 13.58 > > > > > > > > > > > > On Sat, Aug 27, 2022 at 12:57:50PM +0300, Dmitry Kozlyuk wrote: > > > > > > > > > > > > > > > > > The kernel ensures that the newly mapped memory is zeroed, > > > > > > > > > and DPDK ensures that files in hugetlbfs are not re-mapped. > > > > > > David, are you suggesting that this invariant - guaranteeing that DPDK > > > memory is zeroed - was violated by SELinux in the SELinux/container issue > > > you were tracking? > > > > > > If so, the method to ensure the invariant is faulty for SELinux. Assuming > > > DPDK supports SELinux, this bug should be fixed. > > > > +1, I'd like to know more about that case. > > > > EAL checks the unlink() result, so if it fails, the allocation should fail > > and the invariant should not be broken. > > Code from 20.11.5: > > > > if (rte_eal_process_type() == RTE_PROC_PRIMARY && > > unlink(path) == -1 && > > errno != ENOENT) { > > RTE_LOG(DEBUG, EAL, "%s(): could not remove '%s': %s\n", > > __func__, path, strerror(errno)); > > return -1; > > } > > > > Can SELinux restriction result in errno == ENOENT? > > I'd expect EPERM/EACCESS. > > Thanks for your info, the selinux is disabled on my server. Also I > checked that the selinux fix is already in my dpdk. Could any other > settings may cause dirty memory? If you can think of any thing related, > I can have a try. > > BTW, this is my nic info: > ``` > Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02) > > driver: ice > version: 1.9.3 > firmware-version: 2.30 0x80005d22 1.2877.0 > expansion-rom-version: > bus-info: :3b:00.1 > supports-statistics: yes > supports-test: yes > supports-eeprom-access: yes > supports-register-dump: yes > supports-priv-flags: yes > ``` update with more debugs: Preparation: 1. set hugepage size to 2 GB. ``` [root@gz15-compute-s3-55e247e16e22 huge]# grep -i huge /proc/meminfo AnonHugePages:124928 kB ShmemHugePages:0 kB HugePages_Total: 2 HugePages_Free:2 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize:1048576 kB Hugetlb: 2097152 kB ``` 2. make a simple programe to poison memory ```c #include #include #include static int memvcmp(void *memory, unsigned char val, size_t size) { unsigned char *mm = (unsigned char*)memory; return (*mm == val) && memcmp(mm, mm + 1, size - 1) == 0; } int main(int argc, char *argv[]){ size_t size = 2 * (1 << 30)-1; void *ptr2 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (! ptr2) { printf("failed to allocted mm"); return 0; } if (argc > 1) { memset(ptr2, 0xff, size); } unsigned char * ss = ptr2; printf("ss: %x\n", *ss); if (memvcmp(ptr2, 0, size)){ printf("all zero\n"); } else { printf("not all zero\n"); } } ``` 3. insert debug info to check if memory all zero ``` diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 5a09247a6..026560333 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -91,16 +91,32 @@ malloc_socket_to_heap_id(unsigned int socket_id) /* * Expand the heap with a memory area. */ +static int memvcmp(void *memory, unsigned char val, size_t size) +{ +unsigned char *mm = (unsigned char*)memory; +return (*mm == val) && memcmp(mm, mm + 1, size - 1) == 0; +} static struct malloc_elem * malloc_heap_add_memory(struct malloc_heap *heap, struct rte_memseg_list *msl, void *start, size_t len) { struct malloc_elem *elem = start; + void *ptr; + size_t data_len; + malloc_elem_init(elem, heap, msl, len, elem, len); malloc_elem_insert(elem); + ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN); + data_len = elem->size - MALLOC_ELEM_OVERHEAD; +if (memvcmp(ptr, 0, data_len)){ +RTE_LOG(ERR, EAL, "liiilog: all zero\n"); +} else { +RTE_LOG(ERR, EAL, "liiilog: not all zero\n"); +} +
Re: [PATCH] eal: zero out new added memory
On Tue, Aug 30, 2022 at 01:59:16PM +0300, Dmitry Kozlyuk wrote: > Thank you for the most detailed info! > > 1. If you run the poisoner program the second time, >does it also see dirty memory immediately after mmap()? If I run the poisoner program the second time, no dirty memory. There could be some difference on how my poisoner program and how testpmd using mmap. Because I notice that testpmd leaves the hugepage files under /dev/hugepage/rtemap_xxx even after testpmd exits. But my poisoner program didn't create any file under /dev/hugepage. > > 2. Kernel 4.19.90-2102 patchlevel 2102 is very high, >can there be any unusual patches applied? >Your host has "compute" in its name, >can it have patches that trade security for performance? I may need to talk with the kernel team on this. Indeed, I failed to reprouce the issue on a virtuam machine of kernel 4.18.0-305.12.1.el8_4.x86_64(2M page size instead of 1G).
Re: [PATCH] eal: zero out new added memory
On Tue, Aug 30, 2022 at 01:59:16PM +0300, Dmitry Kozlyuk wrote: > Thank you for the most detailed info! > > 1. If you run the poisoner program the second time, >does it also see dirty memory immediately after mmap()? No, running the poisoner program again doesn't show dirty memory immediately after mmap. I assume that there are some differences on how my poisoner program using hugepage and how testpmd using hugepage. I notice that testpmd leaves some files under /dev/hugepages/rtemap_xxx even after testpmd process exits. But my program didn't create any file under /dev/hugepages/. > > 2. Kernel 4.19.90-2102 patchlevel 2102 is very high, >can there be any unusual patches applied? >Your host has "compute" in its name, >can it have patches that trade security for performance? I may need to talk to the kernel team. Indeed, I failed to reproduce the issue on a Virtual Machine of kernel 4.18.0-305.12.1.el8_4.x86_64.(2M page size insetad of 1G)