Re: IO IT Patterns: Simplifying data loading

Stephen Sisk Wed, 29 Mar 2017 07:50:58 -0700

Hey Cham,

Debugging is harder
================
I definitely agree. As I said (and I think you still generally agree), I
think the tradeoff is worth it. Looking at the data store in question can
quickly narrow a particular failure down to either the write or the read.


Eventually consistent data stores
==========================
I agree that this is a problem. However, I don't think it is a problem
created by doing a writeThenRead test, because we have exactly the same
problem for regular write tests (which are themselves writeThenRead, just
with a native reader). I agree it exacerbates the "debugging is harder"
question.

I think we're in general agreement - folks testing eventually consistent
data stores need to be careful, and consider what's best for them. This may
not be the correct solution for them. I added a note to the testing doc to
make sure to address this.
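
To make the shape of the proposal concrete, here is a rough sketch of a
write-then-read test with the "skipCleanUp" and "useExistingData" flags from
the original mail. It is only an illustration: FakeStore, expected_records
and run_write_then_read are hypothetical names, the store is an in-memory
stand-in rather than a real IO, and it is strongly consistent, so it does
not model the eventual consistency problem discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for the data store under test."""
    rows: dict = field(default_factory=dict)

    def write(self, records):
        self.rows.update(records)

    def read(self):
        return dict(self.rows)

    def clear(self):
        self.rows.clear()


def expected_records(n):
    # Deterministic test data: the read side can recompute what should be
    # in the store instead of relying on a separately loaded fixture.
    return {i: f"value-{i}" for i in range(n)}


def run_write_then_read(store, n, use_existing_data=False, skip_clean_up=False):
    """Write n records (unless reusing existing data), read them back, compare."""
    expected = expected_records(n)
    try:
        if not use_existing_data:
            store.write(expected)
        return store.read() == expected
    finally:
        # Leave the data in place when a later run wants to reuse it.
        if not skip_clean_up:
            store.clear()
```

A first run with skip_clean_up=True leaves the data in the store, and later
runs with use_existing_data=True then exercise only the read path. If the
data is missing, the read-only run simply fails the comparison.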

S

On Tue, Mar 28, 2017 at 10:27 PM Chamikara Jayalath <chamik...@apache.org>
wrote:

> On Tue, Mar 28, 2017 at 3:00 AM Etienne Chauchot <echauc...@gmail.com>
> wrote:
>
> > Hi Stephen,
> >
> > I have some comments below:
> >
> >
> > On 24/03/2017 at 00:26, Stephen Sisk wrote:
> > > hi!
> > >
> > > I just opened a jira ticket that I wanted to make sure the mailing
> > > list got a chance to see.
> > >
> > > The problem is that the current design pattern for doing data
> > > loading in IO ITs (either writing a small program or using an
> > > external tool) is complex, inefficient and requires extra steps like
> > > installing external tools/probably using a VM. It also really doesn't
> > > scale well to the larger data sizes we'd like to use for performance
> > > benchmarking.
> > >
> > > My proposal is that instead of trying to test read and write
> > > separately, the test should be a "write, then read back what you just
> > > wrote", all using the IO being tested.
> > Sure, joining the read and write tests will allow us to write less
> > often and thus be more efficient. Indeed, instead of writing once for
> > all the read test runs and again at each write test run, we will only
> > write at each read+write test run. We will also avoid needing a
> > separate location to write to.
> >
>
> I agree that this is beneficial from a test efficiency perspective but
> there is a downside.
>
> I think a failure of this kind of a write+read test could be quite hard to
> debug and it might even be hard to develop such a test to be non-flaky
> depending on the I/O. For example, for an eventually consistent file-system
> such as GCS, a failure of a write+read test could mean any one of the
> following.
>
> * write failed
> * read failed
> * read was executed prior to the write finishing and the file system
> reaching a consistent state.
>
> At first glance one might think that adding a barrier in the middle that
> waits for reads to be consistent would solve that problem, but that will
> not be the case if the data source serves requests using multiple replicas
> which may be in inconsistent states (which is the case for GCS).
>
> Separate read and write tests with fixed input are much easier to
> manage/debug.
>
> So I think we should be careful when converting I/O ITs to do read+write
> and probably should only make this a recommendation for I/O ITs that would
> not run into issues due to this.
>
> Just my 2 cents.
>
> Thanks,
> Cham
>
>
> > > To support scenarios like "I want to run my read test
> > > repeatedly without re-writing the data", tests would add flags for
> > > "skipCleanUp" and "useExistingData".
> > But this makes an assumption about the order of test runs: the write
> > test needs to have been run before the read test can happen. Isn't it
> > a little dangerous to make this assumption?
> > >
> > > I think we've all likely seen this type of solution when testing
> > > storage layers in the past, and I've previously shied away from it in
> > > this context, but I think now that I've seen some real ITs and thought
> > > about scaling them, in this case it's the right solution.
> > >
> > > Please take a look at the jira if you have questions - there's a lot
> > > more detail there.
> > >
> > > S
> > >
> > Etienne
> >
>
