----- On Apr 30, 2020, at 12:30 PM, rostedt rost...@goodmis.org wrote:

> On Thu, 30 Apr 2020 12:18:22 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote:
>
>> ----- On Apr 30, 2020, at 12:16 PM, rostedt rost...@goodmis.org wrote:
>>
>> > On Thu, 30 Apr 2020 11:20:15 -0400 (EDT)
>> > Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote:
>> >
>> >> > The right fix is to call vmalloc_sync_mappings() right after allocating
>> >> > tracing or perf buffers via v[zm]alloc().
>> >>
>> >> Either right after allocation, or right before making the vmalloc'd data
>> >> structure visible to the instrumentation. In the case of the pid filter,
>> >> that would be the rcu_assign_pointer() which publishes the new pid filter
>> >> table.
>> >>
>> >> As long as vmalloc_sync_mappings() is performed somewhere *between*
>> >> allocation and publishing the pointer for instrumentation, it's fine.
>> >>
>> >> I'll let Steven decide on which approach works best for him.
>> >
>> > As stated in the other email, I don't see it having anything to do with
>> > vmalloc, but with the per_cpu() allocation. I'll test this theory out by
>> > not even allocating the pid masks and touching the per cpu data at every
>> > event to see if it crashes.
>>
>> As pointed out in my other email, per-cpu allocation uses vmalloc when
>> size > PAGE_SIZE.
>
> And as I replied:
>
> 	buf->data = alloc_percpu(struct trace_array_cpu);
>
> struct trace_array_cpu {
> 	atomic_t	disabled;
> 	void		*buffer_page;	/* ring buffer spare */
>
> 	unsigned long	entries;
> 	unsigned long	saved_latency;
> 	unsigned long	critical_start;
> 	unsigned long	critical_end;
> 	unsigned long	critical_sequence;
> 	unsigned long	nice;
> 	unsigned long	policy;
> 	unsigned long	rt_priority;
> 	unsigned long	skipped_entries;
> 	u64		preempt_timestamp;
> 	pid_t		pid;
> 	kuid_t		uid;
> 	char		comm[TASK_COMM_LEN];
>
> 	bool		ignore_pid;
> #ifdef CONFIG_FUNCTION_TRACER
> 	bool		ftrace_ignore_pid;
> #endif
> };
>
> That doesn't look bigger than PAGE_SIZE to me.
Let me point you to pcpu_alloc(), which calls pcpu_create_chunk(), which in
turn calls the underlying pcpu_mem_zalloc(), which uses vmalloc. Because the
per-cpu allocator batches allocations into chunks backed that way, an
allocation can end up in vmalloc'd memory rather than kmalloc'd memory even
though its own size is smaller than 4kB.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
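[For reference, below is a minimal sketch of the ordering described earlier
in the thread. The struct and function names are hypothetical, not the
actual ftrace code; the only point it illustrates is that
vmalloc_sync_mappings() is called after the vmalloc-backed allocation and
before rcu_assign_pointer() publishes the pointer to instrumentation.]

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/rcupdate.h>

/* Hypothetical filter structure, for illustration only. */
struct pid_filter {
	unsigned long *mask;		/* vzalloc'd pid bitmap */
};

static struct pid_filter __rcu *active_filter;

static int install_pid_filter(size_t mask_size)
{
	struct pid_filter *f;

	f = kzalloc(sizeof(*f), GFP_KERNEL);
	if (!f)
		return -ENOMEM;

	f->mask = vzalloc(mask_size);	/* may live in vmalloc space */
	if (!f->mask) {
		kfree(f);
		return -ENOMEM;
	}

	/*
	 * Sync the new vmalloc mappings into every page table before
	 * the pointer is published; instrumentation running in a task
	 * whose page tables lack the mapping would otherwise fault.
	 */
	vmalloc_sync_mappings();

	rcu_assign_pointer(active_filter, f);
	return 0;
}

Doing the sync at the publish site rather than immediately after each
allocation keeps it on the slow path and still guarantees that no consumer
can observe the pointer before the mappings have been synced, which matches
the "somewhere between allocation and publishing" requirement quoted above.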