On Sat, 7 Mar 2026 10:27:11 -0500 Steven Rostedt <[email protected]> wrote:
> On Sat, 7 Mar 2026 23:26:38 +0900
> "Masami Hiramatsu (Google)" <[email protected]> wrote:
>
> >  kernel/trace/ring_buffer.c | 63
> > +++++++++++++++++++++++--------------------
> >  1 file changed, 33 insertions(+), 30 deletions(-)
> >
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index b6f3ac99834f..8599de5cf59b 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -396,6 +396,12 @@ static __always_inline unsigned int
> > rb_page_commit(struct buffer_page *bpage)
> >  	return local_read(&bpage->page->commit);
> >  }
> >
> > +/* Size is determined by what has been committed */
> > +static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
> > +{
> > +	return rb_page_commit(bpage) & ~RB_MISSED_MASK;
> > +}
> > +
> >  static void free_buffer_page(struct buffer_page *bpage)
> >  {
> >  	/* Range pages are not to be freed */
> > @@ -1819,7 +1825,7 @@ static bool rb_cpu_meta_valid(struct
> > ring_buffer_cpu_meta *meta, int cpu,
> >
> >  	bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
> >
> > -	/* Is the meta buffers and the subbufs themselves have correct data? */
> > +	/* Is the meta buffers themselves have correct data? */
>
> I just realized that the origin didn't have correct grammar. But we
> still check the subbufs, why remove that comment?
>
> The original should have said:
>
> 	/* Do the meta buffers and subbufs have correct data? */

I just removed the data check from this loop, so I think this should
focus on checking metadata itself. The data is checked later.
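For reference, here is a minimal userspace sketch of why the raw commit
value and the masked size behave differently in a range check. The flag
values below are my assumption of the layout in
kernel/trace/ring_buffer.c, and page_size_from_commit() and
raw_commit_in_range() are hypothetical stand-ins, not kernel functions:

```c
/*
 * Assumed flag layout: the two top bits of the commit word are flags,
 * the remaining bits hold the committed size.
 */
#define RB_MISSED_EVENTS	(1U << 31)
#define RB_MISSED_STORED	(1U << 30)
#define RB_MISSED_MASK		(3U << 30)

/* Hypothetical stand-in for rb_page_size(): drop the flag bits. */
static inline unsigned int page_size_from_commit(unsigned int commit)
{
	return commit & ~RB_MISSED_MASK;
}

/*
 * Hypothetical range check on a commit value. If a flag bit is still
 * set in the value passed in, it is far larger than any sub-buffer
 * size, so the check fails even though the size bits may be valid.
 */
static inline int raw_commit_in_range(unsigned int commit,
				      unsigned int subbuf_size)
{
	return commit <= subbuf_size;
}
```

With a 4096-byte sub-buffer, a commit of (RB_MISSED_EVENTS | 100) fails
raw_commit_in_range() as-is, but passes once it has gone through
page_size_from_commit().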
>
> >  	for (i = 0; i < meta->nr_subbufs; i++) {
> >  		if (meta->buffers[i] < 0 ||
> >  		    meta->buffers[i] >= meta->nr_subbufs) {
> > @@ -1827,11 +1833,6 @@ static bool rb_cpu_meta_valid(struct
> > ring_buffer_cpu_meta *meta, int cpu,
> >  			return false;
> >  		}
> >
> > -		if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
> > -			pr_info("Ring buffer boot meta [%d] buffer invalid
> > commit\n", cpu);
> > -			return false;
> > -		}
>
> This should still be checked, although it doesn't need to fail the loop
> but instead continue to the next buffer.

We already have another check of the data in the loop in
rb_meta_validate_events(), so data corruption should be handled there.

>
> Also, I mentioned that if the commit == RB_MISSED_EVENTS, then we know
> the sub buffer was corrupted and should be skipped.

Yes, if the RB_MISSED_EVENTS bit is set, the commit field is out of range.
That is checked in rb_validate_buffer().

>
> And honestly, the commit should never be greater than the subbuf_size,
> even if corrupted. As we are only worried about corruption due to cache
> not writing out. That should not corrupt the commit size (now we can
> ignore the flags and use page size instead).

Hmm, but if the kernel crashes and reboots when it sets RB_MISSED_EVENTS,
we will see that the bit is set and the commit size is different.

Note, I think the reader_page RB_MISSED_EVENTS flag is not cleared after
it is read.

commit ca296d32ece3 ("tracing: ring_buffer: Rewind persistent ring buffer
on reboot") dropped the clearing of the commit field for rewinding the
buffer.

@@ -5342,7 +5440,6 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 		 */
 		local_set(&cpu_buffer->reader_page->write, 0);
 		local_set(&cpu_buffer->reader_page->entries, 0);
-		local_set(&cpu_buffer->reader_page->page->commit, 0);
 		cpu_buffer->reader_page->real_end = 0;

Should we clear the RB_MISSED_* bits here?

Thanks,

>
> So, perhaps we should invalidate the entire buffer if the commit part
> is corrupted, as that is a major corruption.
>
> -- Steve
>

-- 
Masami Hiramatsu (Google) <[email protected]>
