On 21-Apr-22 2:18 PM, Burakov, Anatoly wrote:
On 21-Apr-22 10:37 AM, David Marchand wrote:

On Wed, Apr 20, 2022 at 4:47 PM Burakov, Anatoly <anatoly.bura...@intel.com> wrote:

On 15-Apr-22 6:31 PM, David Marchand wrote:

When releasing some memory, the allocator can choose to return some pages to the OS. At the same time, this memory was poisoned in the ASan shadow. Doing the latter made it impossible to remap this same page later. On the other hand, without this poison, the OS would pagefault in any case for this page.

Remove the poisoning for unmapped pages.

Bugzilla ID: 994
Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand <david.march...@redhat.com>
---
 lib/eal/common/malloc_elem.h |  4 ++++
 lib/eal/common/malloc_heap.c | 12 +++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/malloc_elem.h b/lib/eal/common/malloc_elem.h
index 228f178418..b859003722 100644
--- a/lib/eal/common/malloc_elem.h
+++ b/lib/eal/common/malloc_elem.h
@@ -272,6 +272,10 @@ old_malloc_size(struct malloc_elem *elem)
 
 #else /* !RTE_MALLOC_ASAN */
 
+static inline void
+asan_set_zone(void *ptr __rte_unused, size_t len __rte_unused,
+	uint32_t val __rte_unused) { }
+
 static inline void
 asan_set_freezone(void *ptr __rte_unused, size_t size __rte_unused) { }
 
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 6c572b6f2c..5913d9f862 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -860,6 +860,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	size_t len, aligned_len, page_sz;
 	struct rte_memseg_list *msl;
 	unsigned int i, n_segs, before_space, after_space;
+	bool unmapped_pages = false;
 	int ret;
 	const struct internal_config *internal_conf =
 		eal_get_internal_configuration();
@@ -999,6 +1000,13 @@ malloc_heap_free(struct malloc_elem *elem)
 
 		/* don't care if any of this fails */
 		malloc_heap_free_pages(aligned_start, aligned_len);
 
+		/*
+		 * Clear any poisoning in ASan for the associated pages so that
+		 * next time EAL maps those pages, the allocator can access
+		 * them.
+		 */
+		asan_set_zone(aligned_start, aligned_len, 0x00);
+		unmapped_pages = true;
 
 		request_sync();
 	} else {
@@ -1032,7 +1040,9 @@ malloc_heap_free(struct malloc_elem *elem)
 
 	rte_mcfg_mem_write_unlock();
 free_unlock:
-	asan_set_freezone(asan_ptr, asan_data_len);
+	/* Poison memory range if belonging to some still mapped pages. */
+	if (!unmapped_pages)
+		asan_set_freezone(asan_ptr, asan_data_len);
 	rte_spinlock_unlock(&(heap->lock));
 	return ret;

I suspect the patch should be a little more complicated than that. When we unmap pages, we don't necessarily unmap the entire malloc element; it could be that we have a freed allocation like so:

| malloc header | free space | unmapped space | free space | next malloc header |

So, I think the freezone should be set from asan_ptr till aligned_start, and then from (aligned_start + aligned_len) till (asan_ptr + asan_data_len). Does that make sense?

(btw, I get a bounce for Zhihong's mail address, is he not working at Intel anymore?)

To be honest, I don't understand if we can get to this situation :-) (especially the free space after the unmapped region). But I guess you mean something like this (on top of the current patch):

@@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem)
 
 	rte_mcfg_mem_write_unlock();
 free_unlock:
-	/* Poison memory range if belonging to some still mapped pages. */
-	if (!unmapped_pages)
+	if (!unmapped_pages) {
 		asan_set_freezone(asan_ptr, asan_data_len);
+	} else {
+		/*
+		 * We may be in a situation where we unmapped pages like this:
+		 * malloc header | free space | unmapped space | free space | malloc header
+		 */
+		void *free1_start = asan_ptr;
+		void *free1_end = aligned_start;
+		void *free2_start = RTE_PTR_ADD(aligned_start, aligned_len);
+		void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len);
+
+		if (free1_start < free1_end)
+			asan_set_freezone(free1_start,
+				RTE_PTR_DIFF(free1_end, free1_start));
+		if (free2_start < free2_end)
+			asan_set_freezone(free2_start,
+				RTE_PTR_DIFF(free2_end, free2_start));
+	}
 	rte_spinlock_unlock(&(heap->lock));
 	return ret;

Something like that, yes. I will have to think through this a bit more, especially in light of your func_reentrancy splat :)
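Before getting into the func_reentrancy issue, a minimal illustration of the failure mode the original patch addresses may help. This is a standalone sketch, not code from this thread; it assumes Linux mmap() and the public ASan poisoning macros from <sanitizer/asan_interface.h>. The point is that manual poison lives in the ASan shadow, so it survives munmap()/mmap() of the same virtual range, and the remapped page is reported on first touch unless the shadow is cleared first, which is what the asan_set_zone(aligned_start, aligned_len, 0x00) call above effectively does.

/* Sketch only; build with: cc -fsanitize=address demo.c */
#include <string.h>
#include <sys/mman.h>
#include <sanitizer/asan_interface.h>

int main(void)
{
	size_t len = 4096;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(p, 0, len);                 /* fine, fresh mapping */
	ASAN_POISON_MEMORY_REGION(p, len); /* roughly what asan_set_freezone() amounts to */
	munmap(p, len);                    /* page handed back to the OS */

	/* remap the same virtual range, as EAL may do later */
	p = mmap(p, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	/*
	 * Without clearing the shadow (ASAN_UNPOISON_MEMORY_REGION(p, len))
	 * around the remap, this access is flagged even though the page is
	 * brand new.
	 */
	memset(p, 0, len);
	return 0;
}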
So, the reason the splat happens in the func_reentrancy test is as follows: the above patch is sorta correct (I have a different one, but it does the same thing), yet incomplete. When we add new memory, we integrate it into our existing malloc heap, which triggers `malloc_elem_join_adjacent_free()`; that in turn writes into the old header space being merged, which may still be marked as "freed". So, again we are hit by our internal allocator messing with ASan.
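To make that concrete, one shape such a guard could take is sketched below. The helpers asan_set_zone()/asan_set_freezone() come from the patch above and RTE_PTR_ADD() is the usual EAL macro, but the function itself and its logic are illustrative only, not the actual malloc_elem_join_adjacent_free().

/*
 * Illustrative only: lift the poison on the neighbouring element's header
 * before the merge bookkeeping writes to it, then hand the merged payload
 * back to the freezone. The real join logic in malloc_elem.c is more
 * involved.
 */
#include <rte_common.h>
#include "malloc_elem.h"

static void
unpoison_then_write_header(struct malloc_elem *next, size_t next_data_len)
{
	/* 'next' may sit inside a region already marked as a freezone */
	asan_set_zone(next, sizeof(*next), 0x00);

	/* ... merge bookkeeping that currently trips ASan goes here ... */

	/* re-poison the free payload, header excluded */
	asan_set_freezone(RTE_PTR_ADD(next, sizeof(*next)), next_data_len);
}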
To properly fix this, we have to answer the following question: what is the goal of having ASan support in DPDK? Is it there to catch bugs *in the allocator*, or can we just trust that our allocator code is correct and only concern ourselves with user-allocated areas? Because it seems like the best way to address this issue would be to avoid triggering ASan checks for certain allocator-internal actions: that way, we don't need to care what the allocator itself does, just what user code does. As in, IIRC there was a compiler attribute that disables ASan checks for a specific function: perhaps we could just wrap certain accesses in that and be done with it?
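For reference, the attribute in question is most likely __attribute__((no_sanitize_address)), which both GCC and Clang accept (Clang also spells it __attribute__((no_sanitize("address")))). A rough sketch of how it could be applied follows; the wrapper macro name and the function body are made up for illustration, not existing DPDK code.

#include <stddef.h>

/* hypothetical wrapper macro, name is illustrative */
#if defined(__clang__) || defined(__GNUC__)
#define MALLOC_NO_ASAN __attribute__((no_sanitize_address))
#else
#define MALLOC_NO_ASAN
#endif

MALLOC_NO_ASAN
static void
allocator_internal_write(unsigned char *hdr, size_t len)
{
	size_t i;

	/*
	 * Stores in this function are not instrumented, so they are not
	 * reported even if 'hdr' lies in a manually poisoned freezone.
	 * Calls out to instrumented helpers or to intercepted libc
	 * functions (memset(), memcpy(), ...) would still be checked.
	 */
	for (i = 0; i < len; i++)
		hdr[i] = 0;
}

The caveat is that the attribute only covers accesses compiled into the marked function itself, so every allocator-internal path that touches poisoned bytes would need the annotation (or an unpoison/re-poison dance like the sketch above).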
What do you think?

-- 
Thanks,
Anatoly