Re: [Discuss] [Proposal] [C++] Arrow multithreaded stress test suite

Weston Pace Wed, 19 May 2021 11:23:17 -0700

> I would recommend writing such tests in Python, such as is already done
> for the CSV reader.


Agreed, that is my current thinking as well.

> I'm not sure what you have in mind.  You're intending to run this test
> 40k minutes per day?

40k minutes per month.  24 hours * 60 minutes * 30 days = 43,200, so
roughly 40k minutes.

> Such an approach
> can be performed by introducing explicit breakpoints in the code and,
> once a breakpoint is reached, starting the other code that we know might
> cause problems when executed concurrently.

We have a number of tests doing this in async_generator_test.cc.  I
agree that it is a good idea in general.  Here, though, I'm after issues
that we don't know about, so I wouldn't know where to insert the
breakpoints.  Once an issue has been identified and fixed, a more
targeted regression unit test could be created using these techniques.
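
As a rough illustration of what such a targeted regression test could look
like in Python, here is a minimal sketch of the breakpoint idea.  Everything
in it is an assumption made for the example: Counter is a hypothetical class
with an unsynchronized read-modify-write (it is not Arrow code), and the
breakpoint is hand-rolled with threading.Event and mock.patch.object from the
standard library rather than the Breakpoint class from the diagnose package.

```python
import threading
from unittest import mock


class Counter:
    """Hypothetical class with an unsynchronized read-modify-write."""

    def __init__(self):
        self.value = 0

    def _read(self):
        return self.value

    def _write(self, new_value):
        self.value = new_value

    def increment(self):
        current = self._read()
        self._write(current + 1)


def run_forced_interleaving():
    """Force a second increment to run between one thread's read and write."""
    counter = Counter()
    original_write = Counter._write
    reached = threading.Event()  # set when the worker hits the breakpoint
    resume = threading.Event()   # set when the worker may continue

    def write_with_breakpoint(self, new_value):
        # Pause only the worker thread, after it has read the old value
        # but before it writes the new one.
        if threading.current_thread() is worker:
            reached.set()
            resume.wait(timeout=5)
        original_write(self, new_value)

    worker = threading.Thread(target=counter.increment)
    with mock.patch.object(Counter, "_write", write_with_breakpoint):
        worker.start()
        # Wait until the worker is parked at the breakpoint.
        assert reached.wait(timeout=5), "breakpoint was never reached"
        # Run the conflicting operation while the worker is paused.
        counter.increment()
        resume.set()
        worker.join(timeout=5)
    return counter.value


if __name__ == "__main__":
    # Two increments ran, but the worker overwrote the other update.
    print(run_forced_interleaving())
```

With the interleaving forced this way the lost update happens on every run
(two increments leave the value at 1), so a single pair of threads is enough
to reproduce it.  The catch, as noted above, is that you need to already know
which read/write pair to pause between.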

On Tue, May 18, 2021 at 10:23 PM Alessandro Molina
<[email protected]> wrote:
>
> Another approach that could reduce the amount of heavy tests that we have
> to write (if the tests are written in Python) might be to drive the code to
> interleave in the ways we feel might introduce problems. Such an approach
> can be performed by introducing explicit breakpoints in the code and,
> once a breakpoint is reached, starting the other code that we know might
> cause problems when executed concurrently.
>
> For example, imagine you want to simulate what happens when two threads
> write to a file concurrently: you put a breakpoint on file.write, wait for
> that breakpoint to be reached, and explicitly invoke another file.write
> when that happens.
>
> That way, instead of throwing tons of threads at the code in the hope of
> triggering race conditions randomly, you can reduce the amount of
> time/computation by explicitly searching for problems in the areas that
> are more likely to hide them, and raise the chances that they happen by
> forcing the two code blocks to always run interleaved in the way you
> expect might cause problems.
>
> Using mock.patch to wrap the function where you want to stop is usually
> the easy way to put those breakpoints in Python. At Crunch, for example,
> a Breakpoint class was written and used extensively for tests that
> simulate race conditions; it was released under the MIT license (
> https://github.com/Crunch-io/diagnose#breakpoints )
>
> On Wed, May 19, 2021 at 9:01 AM Antoine Pitrou <[email protected]> wrote:
>
> >
> > Le 19/05/2021 à 07:37, Weston Pace a écrit :
> > > I spoke a while ago about working on a multithreaded stress test
> > > suite.  I have put together some very early details[1].  I would
> > > appreciate any feedback.
> >
> > I would recommend writing such tests in Python, such as is already done
> > for the CSV reader.
> >
> > > One particular item I could use feedback on is how this gets deployed.
> > > In my mind this would be an ongoing test that is continuously running
> > > against the previous nightly build.  Such a test would quickly consume
> > > Apache's GHA minutes so I don't think GHA is an option.  Other free CI
> > > options probably wouldn't have enough minutes for a continuous daily
> > > test (e.g. ~40k minutes).
> >
> > I'm not sure what you have in mind.  You're intending to run this test
> > 40k minutes per day?
> >
> > Regards
> >
> > Antoine.
> >
