I see there's support for this, but people are still pouring in commits. I
propose we have a quick vote on this to lock down commits until we get to
green. That way everyone knows we have drawn the line at a specific point;
any commits after that point would be reverted. There isn't a category in
the bylaws that fits this kind of vote, but I suggest lazy majority as the
most appropriate one (at least 3 votes, more +1s than -1s).
Alan.

On Mon, May 14, 2018 at 10:34 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:

> I worked on a few quick-fix optimizations in the Ptest infrastructure
> over the weekend which reduced the execution time from ~90 min to ~70 min
> per run. I had to restart Ptest multiple times. I was resubmitting the
> patches which were in the queue manually, but I may have missed a few. In
> case you have a patch which is pending pre-commit and you don't see it in
> the queue, please submit it manually, or let me know if you don't have
> access to the Jenkins job. I will continue to work on the sub-tasks in
> HIVE-19425 and will do some maintenance next weekend as well.
>
> On Mon, May 14, 2018 at 7:42 AM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>
> > Vineet has already been working on disabling those tests that were
> > timing out. I am working on disabling those that have been generating
> > different q files consistently for the last n ptest runs. I am keeping
> > track of all these tests in
> > https://issues.apache.org/jira/browse/HIVE-19509.
> >
> > -Jesús
> >
> > On 5/14/18, 2:25 AM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
> >
> > +1 on freezing commits until we get repeated green test runs. We should
> > probably disable the tests that are flaky (and track them in a JIRA so
> > we remember to re-enable them at a later point) to get repeated green
> > test runs.
> >
> > Thanks
> > Prasanth
> >
> > On Mon, May 14, 2018 at 2:15 AM -0700, "Rui Li" <lirui.fu...@gmail.com> wrote:
> >
> > +1 to freezing commits until we stabilize
> >
> > On Sat, May 12, 2018 at 6:10 AM, Vihang Karajgaonkar wrote:
> >
> > > In order to understand the end-to-end pre-commit flow I would like to
> > > get access to the PreCommit-HIVE-Build Jenkins script. Does anyone
> > > know how I can get that?
> > >
> > > On Fri, May 11, 2018 at 2:03 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
> > >
> > > > Bq. For the short-term green runs, I think we should @Ignore the
> > > > tests which have been failing for many runs. They aren't being
> > > > addressed anyway. If people think they are important to be run, we
> > > > should fix them and only then re-enable them.
> > > >
> > > > I think that is a good idea, as we would minimize the time that we
> > > > halt development. We can create a JIRA listing all the tests that
> > > > were failing and that we have disabled to get the clean run. From
> > > > that moment on, we will have zero tolerance towards committing with
> > > > failing tests. And we need to pick out those tests that should not
> > > > be ignored and bring them back, but passing. If there is no
> > > > disagreement, I can start working on that.
> > > >
> > > > Once I am done, I can try to help with the infra tickets too.
> > > >
> > > > -Jesús
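To make the @Ignore approach above concrete, a disabled flaky test could
look roughly like the sketch below. The class name, test name, and JIRA
number are placeholders, not a real patch; the only real pieces are the
JUnit 4 annotations:

    import org.junit.Ignore;
    import org.junit.Test;

    public class TestFlakyExample {

      // Disabled to get back to a green run; tracked in the umbrella JIRA
      // so it gets re-enabled once fixed. HIVE-XXXXX is a placeholder,
      // not a real ticket number.
      @Ignore("Flaky; see HIVE-XXXXX, re-enable once fixed")
      @Test
      public void testSomethingFlaky() {
        // test body unchanged; it simply never runs while @Ignore is here
      }
    }

The reason string in @Ignore shows up with the skipped test in the report,
which should make it easy to audit what is still disabled when we come back
to re-enable things.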
> > > > On 5/11/18, 1:57 PM, "Vineet Garg" wrote:
> > > >
> > > > +1. I strongly vote for freezing commits and getting our test
> > > > coverage into an acceptable state. We have been struggling to
> > > > stabilize branch-3 due to test failures, and releasing Hive 3.0 in
> > > > its current state would be unacceptable.
> > > >
> > > > Currently there are quite a few test suites which are not even
> > > > running and are timing out. We have been committing patches (to
> > > > both branch-3 and master) without test coverage for these tests. We
> > > > should immediately figure out what's going on before we proceed
> > > > with commits.
> > > >
> > > > For reference, the following test suites are timing out on master
> > > > (https://issues.apache.org/jira/browse/HIVE-19506):
> > > >
> > > > TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out)
> > > > TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out)
> > > > TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out)
> > > > TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out)
> > > > TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out)
> > > > TestTxnExIm - did not produce a TEST-*.xml file (likely timed out)
> > > >
> > > > Vineet
> > > >
> > > > On May 11, 2018, at 1:46 PM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
> > > >
> > > > +1 There are many problems with the test infrastructure, and in my
> > > > opinion it has now become the number one bottleneck for the
> > > > project. I was looking at the infrastructure yesterday and I think
> > > > the current infrastructure (even with its own set of problems) is
> > > > still under-utilized. To start with, I am planning to increase the
> > > > number of threads that process the parallel test batches. It needs
> > > > a restart on the server side. I can do it now, if folks are okay
> > > > with it; otherwise I can do it over the weekend when the queue is
> > > > small.
> > > >
> > > > I listed the improvements which I thought would be useful under
> > > > https://issues.apache.org/jira/browse/HIVE-19425, but frankly
> > > > speaking I am not able to devote as much time to it as I would
> > > > like. I would appreciate it if folks who have some more time could
> > > > help out.
> > > >
> > > > I think https://issues.apache.org/jira/browse/HIVE-19429 will help
> > > > a lot to start with. We need to pack more test runs in parallel,
> > > > and containers provide good isolation.
> > > >
> > > > For the short-term green runs, I think we should @Ignore the tests
> > > > which have been failing for many runs. They aren't being addressed
> > > > anyway. If people think they are important to be run, we should fix
> > > > them and only then re-enable them.
> > > >
> > > > Also, I feel we need a light-weight test run which we can execute
> > > > locally before submitting a patch for the full suite. That way
> > > > minor issues with the patch can be caught locally. Maybe we can
> > > > create a profile which runs a subset of important tests which are
> > > > consistent. We could apply some label indicating that the local
> > > > pre-checkin tests ran successfully, and only then submit for the
> > > > full suite.
> > > >
> > > > More thoughts are welcome. Thanks for starting this conversation.
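On the light-weight local profile idea above: one possible sketch (just an
illustration, nothing here exists in the codebase yet) is to tag the
stable, fast tests with a JUnit 4 category and let a dedicated Maven
profile select that category. FastTests and the test class below are
hypothetical names:

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Hypothetical marker interface for the fast, consistent subset.
    interface FastTests {
    }

    public class TestStableExample {

      // Tagged so a local pre-checkin profile can run just this subset.
      @Category(FastTests.class)
      @Test
      public void testQuickSanityCheck() {
        // a cheap, deterministic check that is safe to gate commits on
      }
    }

Surefire can then select the subset via its groups parameter (pointing at
the fully qualified category class) under a pre-checkin profile, so the
default local run stays unchanged.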
> > > > On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
> > > >
> > > > I believe we have reached a state (maybe we did reach it a while
> > > > ago) that is not sustainable anymore, as there are so many tests
> > > > failing / timing out that it is not possible to verify whether a
> > > > patch is breaking some critical parts of the system or not. It also
> > > > seems to me that due to the timeouts (maybe due to infra, maybe
> > > > not), ptest runs are taking even longer than usual, which in turn
> > > > creates an even longer queue of patches.
> > > >
> > > > There is an ongoing effort to improve ptest usability
> > > > (https://issues.apache.org/jira/browse/HIVE-19425), but apart from
> > > > that, we need to make an effort to stabilize the existing tests and
> > > > bring that failure count to zero.
> > > >
> > > > Hence, I am suggesting *we stop committing any patch before we get
> > > > a green run*. If someone thinks this proposal is too radical,
> > > > please come up with an alternative, because I do not think it is OK
> > > > to have the ptest runs in their current state. Other projects of a
> > > > certain size (e.g., Hadoop, Spark) are always green; we should be
> > > > able to do the same.
> > > >
> > > > Finally, once we get to zero failures, I suggest we be less
> > > > tolerant of committing without a clean ptest run. If there is a
> > > > failure, we need to fix it or revert the patch that caused it, and
> > > > then we continue developing.
> > > >
> > > > Please, let's all work together as a community to fix this issue;
> > > > that is the only way to get to zero quickly.
> > > >
> > > > Thanks,
> > > > Jesús
> > > >
> > > > PS. I assume the flaky tests will come into the discussion. Let's
> > > > see first how many of those we have, then we can work on finding a
> > > > fix.
> >
> > --
> > Best regards!
> > Rui Li