On Thu, Apr 18, 2024 at 11:43:46PM -0400, Steven Rostedt wrote: > On Thu, 18 Apr 2024 09:55:55 +0300 > Mike Rapoport <r...@kernel.org> wrote: > > Hi Mike, > > Thanks for doing this review! > > > > +/** > > > + * struct trace_buffer_meta - Ring-buffer Meta-page description > > > + * @meta_page_size: Size of this meta-page. > > > + * @meta_struct_len: Size of this structure. > > > + * @subbuf_size: Size of each sub-buffer. > > > + * @nr_subbufs: Number of subbfs in the ring-buffer, including > > > the reader. > > > + * @reader.lost_events: Number of events lost at the time of the reader > > > swap. > > > + * @reader.id: subbuf ID of the current reader. ID range [0 : > > > @nr_subbufs - 1] > > > + * @reader.read: Number of bytes read on the reader subbuf. > > > + * @flags: Placeholder for now, 0 until new features are > > > supported. > > > + * @entries: Number of entries in the ring-buffer. > > > + * @overrun: Number of entries lost in the ring-buffer. > > > + * @read: Number of entries that have been read. > > > + * @Reserved1: Reserved for future use. > > > + * @Reserved2: Reserved for future use. > > > + */ > > > +struct trace_buffer_meta { > > > + __u32 meta_page_size; > > > + __u32 meta_struct_len; > > > + > > > + __u32 subbuf_size; > > > + __u32 nr_subbufs; > > > + > > > + struct { > > > + __u64 lost_events; > > > + __u32 id; > > > + __u32 read; > > > + } reader; > > > + > > > + __u64 flags; > > > + > > > + __u64 entries; > > > + __u64 overrun; > > > + __u64 read; > > > + > > > + __u64 Reserved1; > > > + __u64 Reserved2; > > > > Why do you need reserved fields? This structure always resides in the > > beginning of a page and the rest of the page is essentially "reserved". > > So this code is also going to be used in arm's pkvm hypervisor code, > where it will be using these fields, but since we are looking at > keeping the same interface between the two, we don't want these used by > this interface. > > We probably should add a comment about that. > > > > > > +}; > > > + > > > +#endif /* _TRACE_MMAP_H_ */ > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > > > index cc9ebe593571..793ecc454039 100644 > > > --- a/kernel/trace/ring_buffer.c > > > +++ b/kernel/trace/ring_buffer.c > > > > ... > > > > > +static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu > > > *cpu_buffer, > > > + unsigned long *subbuf_ids) > > > +{ > > > + struct trace_buffer_meta *meta = cpu_buffer->meta_page; > > > + unsigned int nr_subbufs = cpu_buffer->nr_pages + 1; > > > + struct buffer_page *first_subbuf, *subbuf; > > > + int id = 0; > > > + > > > + subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page; > > > + cpu_buffer->reader_page->id = id++; > > > + > > > + first_subbuf = subbuf = rb_set_head_page(cpu_buffer); > > > + do { > > > + if (WARN_ON(id >= nr_subbufs)) > > > + break; > > > + > > > + subbuf_ids[id] = (unsigned long)subbuf->page; > > > + subbuf->id = id; > > > + > > > + rb_inc_page(&subbuf); > > > + id++; > > > + } while (subbuf != first_subbuf); > > > + > > > + /* install subbuf ID to kern VA translation */ > > > + cpu_buffer->subbuf_ids = subbuf_ids; > > > + > > > + /* __rb_map_vma() pads the meta-page to align it with the sub-buffers */ > > > + meta->meta_page_size = PAGE_SIZE << cpu_buffer->buffer->subbuf_order; > > > > Isn't this a single page? > > One thing we are doing is to make sure that the subbuffers are aligned > by their size. If a subbuffer is 3 pages, it should be aligned on 3 > page boundaries. This was something that Linus suggested. > > > > > > + meta->meta_struct_len = sizeof(*meta); > > > + meta->nr_subbufs = nr_subbufs; > > > + meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE; > > > + > > > + rb_update_meta_page(cpu_buffer); > > > +} > > > > ... > > > > > +#define subbuf_page(off, start) \ > > > + virt_to_page((void *)((start) + ((off) << PAGE_SHIFT))) > > > + > > > +#define foreach_subbuf_page(sub_order, start, page) \ > > > > Nit: usually iterators in kernel use for_each > > Ah, good catch. Yeah, that should be changed. But then ... > > > > > > + page = subbuf_page(0, (start)); \ > > > + for (int __off = 0; __off < (1 << (sub_order)); \ > > > + __off++, page = subbuf_page(__off, (start))) > > > > The pages are allocated with alloc_pages_node(.. subbuf_order) are > > physically contiguous and struct pages for them are also contiguous, so > > inside a subbuf_order allocation you can just do page++. > > > > I'm wondering if we should just nuke the macro. It was there because of > the previous implementation did things twice. But now it's just done > once here: > > + while (s < nr_subbufs && p < nr_pages) { > + struct page *page; > + > + foreach_subbuf_page(subbuf_order, cpu_buffer->subbuf_ids[s], > page) { > + if (p >= nr_pages) > + break; > + > + pages[p++] = page; > + } > + s++; > + } > > Perhaps that should just be: > > while (s < nr_subbufs && p < nr_pages) { > struct page *page; > int off; > > page = subbuf_page(0, cpu_buffer->subbuf_ids[s]); > for (off = 0; off < (1 << subbuf_order); off++, page++, p++) { > if (p >= nr_pages) > break; > > pages[p] = page; > } > s++; > } > > ?
Yeah, was hesitating to kill it with the last version. Happy to do it for the next one. > > -- Steve