Hey Cham,

Debugging is harder
===================
I definitely agree. As I said (and I think you still generally agree), I think the tradeoff is worth it. Looking at the data store in question can quickly narrow a particular failure down to either the write or the read.

Eventually consistent data stores
=================================
I agree that this is a problem; however, I don't think it is a problem created by doing a writeThenRead test, because we have exactly the same problem with regular write tests (which are themselves writeThenRead, just with a native reader). I agree it exacerbates the "debugging is harder" question.

I think we're in general agreement - folks testing eventually consistent data stores need to be careful and consider what's best for them. This may not be the correct solution for them. I added a note to the testing doc to make sure we address this.

S

On Tue, Mar 28, 2017 at 10:27 PM Chamikara Jayalath <chamik...@apache.org> wrote:

> On Tue, Mar 28, 2017 at 3:00 AM Etienne Chauchot <echauc...@gmail.com>
> wrote:
>
> > Hi Stephen,
> >
> > I have some comments below:
> >
> > On 24/03/2017 at 00:26, Stephen Sisk wrote:
> > > hi!
> > >
> > > I just opened a jira ticket that I wanted to make sure the mailing
> > > list got a chance to see.
> > >
> > > The problem is that the current design pattern for doing data loading
> > > in IO ITs (either writing a small program or using an external tool)
> > > is complex, inefficient and requires extra steps like installing
> > > external tools/probably using a VM. It also really doesn't scale well
> > > to the larger data sizes we'd like to use for performance
> > > benchmarking.
> > >
> > > My proposal is that instead of trying to test read and write
> > > separately, the test should be a "write, then read back what you just
> > > wrote", all using the IO being tested.
> > Sure, joining read and write tests will allow us to write less often
> > and thus be more efficient. Indeed, instead of writing once for all the
> > read test runs and writing at each write test run, we will only write
> > at each read+write test run. We will also avoid using another writing
> > place.
>
> I agree that this is beneficial from a test efficiency perspective, but
> there is a downside.
>
> I think a failure of this kind of write+read test could be quite hard to
> debug, and it might even be hard to develop such a test to be non-flaky,
> depending on the I/O. For example, for an eventually consistent file
> system such as GCS, a failure of a write+read test could mean any one of
> the following:
>
> * write failed
> * read failed
> * read was executed prior to write finishing and file system reaching a
>   consistent state.
>
> At first glance one might think that adding a barrier in the middle that
> waits for the read to be consistent would solve that problem, but that
> will not be the case if the data source serves requests using multiple
> replicas which may be in inconsistent states (which is the case for GCS).
>
> Separate read and write tests with fixed input are much easier to
> manage/debug.
>
> So I think we should be careful when converting I/O ITs to do read+write
> and probably should only make this a recommendation for I/O ITs that
> would not run into issues due to this.
>
> Just my 2 cents.
>
> Thanks,
> Cham
>
> > > To support scenarios like "I want to run my read test repeatedly
> > > without re-writing the data", tests would add flags for "skipCleanUp"
> > > and "useExistingData".
> > But this makes an assumption about the order of test runs: the write
> > test needs to have been run before the read test can happen. Maybe a
> > little dangerous to make this assumption, no?
> > >
> > > I think we've all likely seen this type of solution when testing
> > > storage layers in the past, and I've previously shied away from it in
> > > this context, but I think now that I've seen some real ITs and thought
> > > about scaling them, in this case it's the right solution.
> > >
> > > Please take a look at the jira if you have questions - there's a lot
> > > more detail there.
> > >
> > > S
> >
> > Etienne
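
To make the pattern in the thread concrete, here is a minimal sketch of a write-then-read test with the two flags discussed above. It is plain Python against a hypothetical in-memory store, not Beam code: `InMemoryStore`, `expected_records` and `write_then_read` are made-up names for illustration, and the `skip_clean_up` / `use_existing_data` parameters are only assumed to behave the way the proposed "skipCleanUp" and "useExistingData" flags would.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the data store under test."""
    rows: dict = field(default_factory=dict)

    def write(self, records):
        self.rows.update(records)

    def read(self):
        return dict(self.rows)

    def clear(self):
        self.rows.clear()


def expected_records(n):
    # Deterministic data, so a read-only run can still verify the contents.
    return {i: f"row-{i}" for i in range(n)}


def write_then_read(store, n, use_existing_data=False, skip_clean_up=False):
    """Write n records (unless reusing existing data), read back, compare."""
    expected = expected_records(n)
    try:
        if not use_existing_data:
            store.write(expected)
        actual = store.read()
        return actual == expected
    finally:
        # Leave the data in place when a later run wants to reuse it.
        if not skip_clean_up:
            store.clear()
```

A first run with `skip_clean_up=True` leaves the data in place, and a later run with `use_existing_data=True` then only reads and verifies it. Running with `use_existing_data=True` against an empty store fails the comparison, which is the ordering assumption Etienne raises. The sketch also assumes a strongly consistent store; it does nothing about the eventual-consistency case Cham describes.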