Wondering if we can add a state transition from “Patch Available” to “Ready To Commit” that can only be triggered by the ptest bot on a green test run.
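From the bot side, something along these lines could fire the transition once a run comes back with zero failures. This is only a rough sketch: the class name, credential handling, and transition id below are hypothetical, and the real "Ready To Commit" transition id would have to come from the HIVE JIRA workflow.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    // Rough sketch: move a JIRA issue to "Ready To Commit" after a green ptest run.
    public class ReadyToCommitTransition {

      private static final String JIRA_BASE = "https://issues.apache.org/jira";
      // Hypothetical id; the actual "Ready To Commit" transition id comes from the workflow.
      private static final String READY_TO_COMMIT_TRANSITION_ID = "731";

      public static void markReadyToCommit(String issueKey, String user, String password)
          throws Exception {
        URL url = new URL(JIRA_BASE + "/rest/api/2/issue/" + issueKey + "/transitions");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        String auth = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        String body = "{\"transition\":{\"id\":\"" + READY_TO_COMMIT_TRANSITION_ID + "\"}}";
        try (OutputStream os = conn.getOutputStream()) {
          os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        if (conn.getResponseCode() != HttpURLConnection.HTTP_NO_CONTENT) {
          throw new IllegalStateException("Transition failed: HTTP " + conn.getResponseCode());
        }
      }

      public static void main(String[] args) throws Exception {
        // The ptest bot would call this only for the JIRA whose run had zero failures.
        markReadyToCommit(args[0], args[1], args[2]);
      }
    }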
Thanks
Prasanth

On Mon, May 14, 2018 at 10:44 PM -0700, "Jesus Camacho Rodriguez" <jcama...@apache.org> wrote:

I have been working on fixing this situation while commits were still coming in. All the tests that have been disabled are in https://issues.apache.org/jira/browse/HIVE-19509. I have created new issues to re-enable each of them; they are linked to that issue. Maybe I was slightly aggressive disabling some of the tests, but that seemed to be the only way to bring the test failures with age count > 1 down to zero.

Instead of starting a vote to freeze the commits in another thread, I will start a vote to be stricter wrt committing to master, i.e., only commit if we get a clean QA run. We can discuss this issue further over there.

Thanks,
Jesús

On 5/14/18, 4:11 PM, "Sergey Shelukhin" wrote:

Can we please make this freeze conditional, i.e. we unfreeze automatically once ptest is clean (as evidenced by a clean HiveQA run on a given JIRA)?

On 18/5/14, 15:16, "Alan Gates" wrote:

We should do it in a separate thread so that people can see it with the [VOTE] subject. Some people use that as a filter in their email to know when to pay attention to things.

Alan.

On Mon, May 14, 2018 at 2:36 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

Will there be a separate voting thread, or is voting on this thread sufficient for the lock-down?

Thanks
Prasanth

On May 14, 2018, at 2:34 PM, Alan Gates wrote:

I see there's support for this, but people are still pouring in commits. I propose we have a quick vote on this to lock down the commits until we get to green. That way everyone knows we have drawn the line at a specific point. Any commits after that point would be reverted. There isn't a category in the bylaws that fits this kind of vote, but I suggest lazy majority as the most appropriate one (at least 3 votes, more +1s than -1s).

Alan.

On Mon, May 14, 2018 at 10:34 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:

I worked on a few quick-fix optimizations in the Ptest infrastructure over the weekend which reduced the execution time from ~90 min to ~70 min per run. I had to restart Ptest multiple times. I was resubmitting the patches which were in the queue manually, but I may have missed a few. In case you have a patch which is pending pre-commit and you don't see it in the queue, please submit it manually, or let me know if you don't have access to the Jenkins job. I will continue to work on the sub-tasks in HIVE-19425 and will do some maintenance next weekend as well.

On Mon, May 14, 2018 at 7:42 AM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:

Vineet has already been working on disabling those tests that were timing out. I am working on disabling those that have been generating different q files consistently over the last n ptest runs. I am keeping track of all these tests in https://issues.apache.org/jira/browse/HIVE-19509.

-Jesús

On 5/14/18, 2:25 AM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:

+1 on freezing commits until we get repetitive green tests. We should probably disable (and remember, in a JIRA, to re-enable them at a later point) tests that are flaky, in order to get repetitive green test runs.
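For the flaky ones, something along these lines would keep the suite green while leaving a pointer back to the re-enable JIRA. Just a sketch; the test class and the JIRA sub-task number below are placeholders:

    import org.junit.Ignore;
    import org.junit.Test;

    // Disabled because it is flaky; tracked under a re-enable sub-task linked from HIVE-19509.
    // (Class name and sub-task number are placeholders.)
    @Ignore("Flaky, see HIVE-XXXXX; re-enable once stabilized")
    public class TestSomeFlakyFeature {

      @Test
      public void testFlakyScenario() throws Exception {
        // test body unchanged; the class-level @Ignore skips the whole class
      }
    }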
Thanks
Prasanth

On Mon, May 14, 2018 at 2:15 AM -0700, "Rui Li" <lirui.fu...@gmail.com> wrote:

+1 to freezing commits until we stabilize.

--
Best regards!
Rui Li

On Sat, May 12, 2018 at 6:10 AM, Vihang Karajgaonkar wrote:

In order to understand the end-to-end pre-commit flow I would like to get access to the PreCommit-HIVE-Build Jenkins script. Does anyone know how I can get that?

On Fri, May 11, 2018 at 2:03 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:

Bq. "For the short-term green runs, I think we should @Ignore the tests which are known to be failing for many runs. They are anyway not being addressed as such. If people think they are important to be run we should fix them and only then re-enable them."

I think that is a good idea, as we would minimize the time that we halt development. We can create a JIRA where we list all the tests that were failing and that we have disabled to get the clean run. From that moment on, we will have zero tolerance towards committing with failing tests. And we need to pick up those tests that should not be ignored and bring them back, but passing. If there is no disagreement, I can start working on that.

Once I am done, I can try to help with the infra tickets too.

-Jesús

On 5/11/18, 1:57 PM, "Vineet Garg" wrote:

+1. I strongly vote for freezing commits and getting our test coverage into an acceptable state. We have been struggling to stabilize branch-3 due to test failures, and releasing Hive 3.0 in the current state would be unacceptable.

Currently there are quite a few test suites which are not even running and are being timed out. We have been committing patches (to both branch-3 and master) without test coverage for these tests. We should immediately figure out what is going on before we proceed with commits.

For reference, the following test suites are timing out on master (https://issues.apache.org/jira/browse/HIVE-19506):

TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out)
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out)
TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out)
TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out)
TestTxnExIm - did not produce a TEST-*.xml file (likely timed out)

Vineet

On May 11, 2018, at 1:46 PM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:

+1. There are many problems with the test infrastructure and in my opinion it has now become the number one bottleneck for the project.
I >> >> was >> >>>> looking >> >>>>> at >> >>>>> the infrastructure yesterday and I think the current >> >>> infrastructure >> >>>>> (even >> >>>>> its own set of problems) is still under-utilized. I am >> >>> planning to >> >>>>> increase >> >>>>> the number of threads to process the parallel test batches to >> >>> start >> >>>>> with. >> >>>>> It needs a restart on the server side. I can do it now, it >> >>> folks are >> >>>>> okay >> >>>>> with it. Else I can do it over weekend when the queue is >> >> small. >> >>>>> >> >>>>> I listed the improvements which I thought would be useful >> >> under >> >>>>> https://issues.apache.org/jira/browse/HIVE-19425 but frankly >> >>>> speaking >> >>>>> I am >> >>>>> not able to devote as much time as I would like to on it. I >> >>> would >> >>>>> appreciate if folks who have some more time if they can help >> >>> out. >> >>>>> >> >>>>> I think to start with https://issues.apache.org/ >> >>>> jira/browse/HIVE-19429 >> >>>>> will >> >>>>> help a lot. We need to pack more test runs in parallel and >> >>> containers >> >>>>> provide good isolation. >> >>>>> >> >>>>> For the short term green runs, I think we should @Ignore the >> >>> tests >> >>>>> which >> >>>>> are known to be failing since many runs. They are anyways not >> >>> being >> >>>>> addressed as such. If people think they are important to be >> >>> run we >> >>>>> should >> >>>>> fix them and only then re-enable them. >> >>>>> >> >>>>> Also, I feel we need light-weight test run which we can run >> >>> locally >> >>>>> before >> >>>>> submitting it for the full-suite. That way minor issues with >> >>> the >> >>>> patch >> >>>>> can >> >>>>> be handled locally. May be create a profile which runs a >> >>> subset of >> >>>>> important tests which are consistent. We can apply some label >> >>> that >> >>>>> pre-checkin-local tests are runs successful and only then we >> >>> submit >> >>>>> for the >> >>>>> full-suite. >> >>>>> >> >>>>> More thoughts are welcome. Thanks for starting this >> >>> conversation. >> >>>>> >> >>>>> On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez < >> >>>>> jcama...@apache.org> wrote: >> >>>>> >> >>>>> I believe we have reached a state (maybe we did reach it a >> >>> while ago) >> >>>>> that >> >>>>> is not sustainable anymore, as there are so many tests >> >> failing >> >>> / >> >>>>> timing out >> >>>>> that it is not possible to verify whether a patch is breaking >> >>> some >> >>>>> critical >> >>>>> parts of the system or not. It also seems to me that due to >> >> the >> >>>>> timeouts >> >>>>> (maybe due to infra, maybe not), ptest runs are taking even >> >>> longer >> >>>> than >> >>>>> usual, which in turn creates even longer queue of patches. >> >>>>> >> >>>>> There is an ongoing effort to improve ptests usability ( >> >>>>> https://issues.apache.org/jira/browse/HIVE-19425), but apart >> >>> from >> >>>>> that, >> >>>>> we need to make an effort to stabilize existing tests and >> >>> bring that >> >>>>> failure count to zero. >> >>>>> >> >>>>> Hence, I am suggesting *we stop committing any patch before >> >> we >> >>> get a >> >>>>> green >> >>>>> run*. If someone thinks this proposal is too radical, please >> >>> come up >> >>>>> with >> >>>>> an alternative, because I do not think it is OK to have the >> >>> ptest >> >>>> runs >> >>>>> in >> >>>>> their current state. Other projects of certain size (e.g., >> >>> Hadoop, >> >>>>> Spark) >> >>>>> are always green, we should be able to do the same. 
More thoughts are welcome. Thanks for starting this conversation.

On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:

I believe we have reached a state (maybe we reached it a while ago) that is not sustainable anymore, as there are so many tests failing / timing out that it is not possible to verify whether a patch is breaking some critical parts of the system or not. It also seems to me that, due to the timeouts (maybe due to infra, maybe not), ptest runs are taking even longer than usual, which in turn creates an even longer queue of patches.

There is an ongoing effort to improve ptest usability (https://issues.apache.org/jira/browse/HIVE-19425), but apart from that, we need to make an effort to stabilize the existing tests and bring that failure count to zero.

Hence, I am suggesting *we stop committing any patch before we get a green run*. If someone thinks this proposal is too radical, please come up with an alternative, because I do not think it is OK to have the ptest runs in their current state. Other projects of a certain size (e.g., Hadoop, Spark) are always green; we should be able to do the same.

Finally, once we get to zero failures, I suggest we be less tolerant of committing without a clean ptest run. If there is a failure, we need to fix it or revert the patch that caused it, and only then continue developing.

Please, let's all work together as a community to fix this issue; that is the only way to get to zero quickly.

Thanks,
Jesús

PS. I assume the flaky tests will come into the discussion. Let's see first how many of those we have, then we can work to find a fix.