On Wed, Oct 02, 2019 at 12:18:54AM -0400, Bruce Ashfield wrote:
> On Tue, Oct 1, 2019 at 10:01 PM Roman Gushchin <g...@fb.com> wrote:
> >
> > On Tue, Oct 01, 2019 at 12:14:18PM -0400, Bruce Ashfield wrote:
> > > Hi all,
> > >
> >
> > Hi Bruce!
> >
> > > The Yocto project has an upcoming release this fall, and I've been trying
> > > to sort through some issues that are happening with kernel 5.2+ ..
> > > although there is a specific yocto kernel, I'm testing and seeing this
> > > with normal / vanilla mainline kernels as well.
> > >
> > > I'm running into an issue that is *very* similar to the one discussed in
> > > the [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
> > > thread from this past May: https://lkml.org/lkml/2019/5/12/272
> > >
> > > I can confirm that I have the proposed fix for the initial regression
> > > report in my build (05b2892637 [signal: unconditionally leave the frozen
> > > state in ptrace_stop()]), but yet I'm still seeing 3 or 4 minute runtimes
> > > on a test that used to take 3 or 4 seconds.
> >
> > So, the problem is that you're experiencing a severe performance regression
> > in some test, right?
> 
> Hi Roman,
> 
> Correct. In particular, running some of the tests that ship with strace
> itself. The performance change is so drastic, that it definitely makes you
> wonder "What have I done wrong? Since everyone must be seeing this" .. and I
> always blame myself first.
> 
> >
> > >
> > > This isn't my normal area of kernel hacking, so I've so far come up empty
> > > at either fixing it myself, or figuring out a viable workaround. (well, I
> > > can "fix" it by removing the cgroup_enter_frozen() call in ptrace_stop ...
> > > but obviously, that is just me trying to figure out what could be causing
> > > the issue).
> > >
> > > As part of the release, we run tests that come with various applications.
> > > The ptrace test that is causing us issues can be boiled down to this:
> > >
> > > $ cd /usr/lib/strace/ptest/tests
> > > $ time ../strace -o log -qq -esignal=none -e/clock ./printpath-umovestr>ttt
> > >
> > > (I can provide as many details as needed, but I wanted to keep this
> > > initial email relatively short).
> > >
> > > I'll continue to debug and attempt to fix this myself, but I grabbed the
> > > email list from the regression report in May to see if anyone has any
> > > ideas or angles that I haven't covered in my search for a fix.
> >
> > I'm definitely happy to help, but it's a bit hard to say anything from what
> > you've provided. I'm not aware of any open issues with the freezer except
> > some spurious cgroup frozen<->not frozen transitions which can happen in
> > some cases. If you describe how I can reproduce the issue, I'll try to take
> > a look asap.
> 
> That would be great.
> 
> I'll attempt to remove all of the build system specifics out of this
> (and Richard Purdie on the cc of this can probably help provide more
> details / setup info as well).
> 
> We are running the built-in tests of strace. So here's a cut and paste of
> what I did to get the tests available (ignore/skip what is common sense or
> isn't needed in your test rig).
> 
> % git clone https://github.com/strace/strace.git
> % cd strace
> % ./bootstrap
> # the --enable flag isn't strictly required, but may break on some build machines
> % ./configure --enable-mpers=no
> % make
> % make check-TESTS
> 
> That last step will not only build the tests, but run them all .. so
> ^c the run once it starts, since it is a lot of noise (we carry a patch to
> strace that allows us to build the tests without running them).
> 
> % cd tests
> % time strace -o log -qq -esignal=none -e/clock ./printpath-umovestr > fff
> real    0m2.566s
> user    0m0.284s
> sys     0m2.519
> 
> On pre-cgroup2 freezer kernels, you see a run time similar to what I have
> above. On the newer kernels we are testing, it is taking 3 or 4 minutes to
> run the test.
> 
> I hope that is simple enough to set up and try. Since I've been seeing this
> on both mainline kernels and the yocto reference kernels, I don't think it is
> something that I'm carrying in the distro/reference kernel that is causing
> this (but again, I always blame myself first). If you don't see that same run
> time, then that does point the finger back at what we are doing and I'll have
> to apologize for chewing up some of your time.
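
For comparing run times across kernels, the timing step quoted above could
be wrapped in a small helper. This is only a rough sketch: the function
names elapsed_seconds and within_budget and the 10 second budget are
hypothetical and not part of strace or of the steps above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the thread): run a command
# with its output discarded and print how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical check: succeed only if the command finished within the
# given number of whole seconds.
within_budget() {
    budget=$1
    shift
    [ "$(elapsed_seconds "$@")" -le "$budget" ]
}

# Example use from the strace tests directory (the budget of 10 seconds
# is an arbitrary assumption):
#   within_budget 10 strace -o log -qq -esignal=none -e/clock \
#       ./printpath-umovestr || echo "run exceeded the time budget"
```

The resolution is whole seconds, which should be enough to tell a run of a
few seconds from one of a few minutes.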

Thank you for the detailed description!
I'll try to reproduce the issue and will be back
by the end of the week.

Thank you!

Roman
