Hi Ewen,

Thanks for the feedback. Answers are inlined.

On Sun, Nov 6, 2016 at 8:46 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

> Yeah, I'm all for getting these to run more frequently and on lighter
> weight infrastructure. (By the way, I also saw the use of docker; I'd
> really like to get a "native" docker cluster type into ducktape at some
> point so all you have to do is bake the image and then spawn containers on
> demand.)

I completely agree; supporting docker integration in ducktape would be the
ideal solution to the problem.

> A few things. First, it'd be nice to know if we can chain these with normal
> PR builds or something like that. Even starting the system tests when we
> don't know the unit tests will pass seems like it'd be wasteful.

If we chain the builds, the turnaround time will suffer: it would take 1.5 hrs
to run the unit tests and then another 1.5 hrs to run the ducktape tests. Also,
don't developers run the relevant unit tests before they submit a patch?

> Second, agreed on getting things stable before turning this on across the
> board.

I have done some work on stabilizing the tests, but I need help from the Kafka
community to take this further. It would be great if someone could guide me on
how to do this. Should we start with a subset of tests that are stable and
enable the others as we make progress? Who are the people I can work with on
this problem?

> Confluent runs these tests nightly on full VMs in AWS and
> historically, besides buggy logic in tests, underprovisioned resources tend
> to be the biggest source of flakiness in tests.

Good to know that I am not the only one worrying about this problem :-)

> Finally, should we be checking w/ infra and/or Travis folks before enabling
> something this expensive? Are the Storm integration tests of comparable
> cost? There are some in-flight patches for parallelizing test runs of
> ducktape tests (which also results in better utilization).
> But even with those changes, the full test run is still quite a few
> VM-hours per PR and we only expect it to increase.

We can ask the infra people about this, but I think it will not be a problem.
For example, Flink <https://travis-ci.org/apache/flink/builds/173852382> is
using 11 hrs of computation time for each run. For Kafka we are going to start
with 6 hrs. Also, with the docker setup we can bring up the whole 12-node
cluster on a laptop and run ducktape tests against it, so test development
cycles will become faster.

With Regards,
Raghav.

> -Ewen
>
> On Thu, Nov 3, 2016 at 11:26 AM, Becket Qin <becket....@gmail.com> wrote:
>
> > Thanks for the explanation, Raghav.
> >
> > If the workload is not a concern then it is probably fine to run tests
> > for each PR update, although it may not be necessary :)
> >
> > On Thu, Nov 3, 2016 at 10:40 AM, Raghav Kumar Gautam <rag...@apache.org> wrote:
> >
> > > Hi Becket,
> > >
> > > The tests would be run each time a PR is created/updated; this will
> > > look similar to https://github.com/apache/storm/pulls. Ducktape tests
> > > take about 7-8 hours to run on my laptop. For travis-ci we can split
> > > them into groups and run them in parallel. This was done in the POC
> > > run, which took 1.5 hrs. It had 10 splits with 5 jobs running in
> > > parallel.
> > > https://travis-ci.org/raghavgautam/kafka/builds/171502069
> > > For apache projects the limit is 30 jobs in parallel across all
> > > projects, so I expect it to take less time, but it also depends on the
> > > workload at the time.
> > > https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci
> > >
> > > Thanks,
> > > Raghav.
> > >
> > > On Thu, Nov 3, 2016 at 9:41 AM, Becket Qin <becket....@gmail.com> wrote:
> > >
> > > > Thanks Raghav,
> > > >
> > > > +1 for the idea in general.
> > > >
> > > > One thing I am wondering is when the tests would be run?
> > > > Would it be run when we merge a PR, or would it be run every time a
> > > > PR is created/updated? I am not sure how long the tests in other
> > > > projects take. For Kafka it may take a few hours to run all the
> > > > ducktape tests; will that be an issue if we run the tests for each
> > > > update of the PR?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Thu, Nov 3, 2016 at 8:16 AM, Harsha Chintalapani <ka...@harsha.io> wrote:
> > > >
> > > > > Thanks, Raghav. I am +1 for having this in Kafka. It will help
> > > > > identify any potential issues, especially with big patches. Given
> > > > > that we have some tests failing due to timing issues, can we
> > > > > disable the failing tests for now so that we don't get any false
> > > > > negatives?
> > > > >
> > > > > Thanks,
> > > > > Harsha
> > > > >
> > > > > On Tue, Nov 1, 2016 at 11:47 AM Raghav Kumar Gautam <rag...@apache.org> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I want to start a discussion about running ducktape tests for
> > > > > > each pull request. I have been working on KAFKA-4345
> > > > > > <https://issues.apache.org/jira/browse/KAFKA-4345> to enable
> > > > > > this using docker on travis-ci.
> > > > > > Pull request: https://github.com/apache/kafka/pull/2064
> > > > > > Working POC: https://travis-ci.org/raghavgautam/kafka/builds/171502069
> > > > > >
> > > > > > In the POC I am able to run 124/149 tests, out of which 88 pass.
> > > > > > The failures are mostly timing issues. We can run the same
> > > > > > scripts on the laptop, with which I am able to run 138/149 tests
> > > > > > successfully.
> > > > > >
> > > > > > For this to work we need to enable travis-ci for Kafka. I can
> > > > > > open an infra bug to request travis-ci for this.
> > > > > > Travis-ci is already running tests for many apache projects like
> > > > > > Storm, Hive, Flume, Thrift etc. see:
> > > > > > https://travis-ci.org/apache/.
> > > > > >
> > > > > > Does this sound interesting? Please comment.
> > > > > >
> > > > > > Thanks,
> > > > > > Raghav.
>
> --
> Thanks,
> Ewen