In order to understand the end-to-end precommit flow, I would like to get access to the PreCommit-HIVE-Build Jenkins script. Does anyone know how I can get it?

On Fri, May 11, 2018 at 2:03 PM, Jesus Camacho Rodriguez <[email protected]> wrote:

> Bq. For the short term green runs, I think we should @Ignore the tests which are known to be failing since many runs. They are anyways not being addressed as such. If people think they are important to be run we should fix them and only then re-enable them.
>
> I think that is a good idea, as we would minimize the time that we halt development. We can create a JIRA where we list all tests that were failing and that we have disabled to get the clean run. From that moment, we will have zero tolerance towards committing with failing tests. And we need to pick up those tests that should not be ignored and bring them up again but passing. If there is no disagreement, I can start working on that.
>
> Once I am done, I can try to help with infra tickets too.
>
> -Jesús
>
> On 5/11/18, 1:57 PM, "Vineet Garg" <[email protected]> wrote:
>
> +1. I strongly vote for freezing commits and getting our testing coverage in acceptable state. We have been struggling to stabilize branch-3 due to test failures and releasing Hive 3.0 in current state would be unacceptable.
>
> Currently there are quite a few test suites which are not even running and are being timed out. We have been committing patches (to both branch-3 and master) without test coverage for these tests. We should immediately figure out what’s going on before we proceed with commits.
>
> For reference, the following test suites are timing out on master (https://issues.apache.org/jira/browse/HIVE-19506):
>
> TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out)
> TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out)
> TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out)
> TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out)
> TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out)
> TestTxnExIm - did not produce a TEST-*.xml file (likely timed out)
>
> Vineet
>
> On May 11, 2018, at 1:46 PM, Vihang Karajgaonkar <[email protected]> wrote:
>
> +1 There are many problems with the test infrastructure and in my opinion it has now become the number one bottleneck for the project. I was looking at the infrastructure yesterday and I think the current infrastructure (even with its own set of problems) is still under-utilized. I am planning to increase the number of threads to process the parallel test batches to start with. It needs a restart on the server side. I can do it now, if folks are okay with it. Else I can do it over the weekend when the queue is small.
>
> I listed the improvements which I thought would be useful under https://issues.apache.org/jira/browse/HIVE-19425 but frankly speaking I am not able to devote as much time as I would like to on it. I would appreciate it if folks who have some more time could help out.
>
> I think to start with https://issues.apache.org/jira/browse/HIVE-19429 will help a lot. We need to pack more test runs in parallel and containers provide good isolation.
>
> For the short term green runs, I think we should @Ignore the tests which are known to be failing since many runs. They are anyways not being addressed as such. If people think they are important to be run we should fix them and only then re-enable them.
>
> Also, I feel we need a light-weight test run which we can run locally before submitting a patch for the full suite. That way minor issues with the patch can be handled locally. Maybe create a profile which runs a subset of important tests which are consistent. We can apply some label indicating that the pre-checkin-local tests ran successfully, and only then submit for the full suite.
>
> More thoughts are welcome. Thanks for starting this conversation.
>
> On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <[email protected]> wrote:
>
> I believe we have reached a state (maybe we did reach it a while ago) that is not sustainable anymore, as there are so many tests failing / timing out that it is not possible to verify whether a patch is breaking some critical parts of the system or not. It also seems to me that due to the timeouts (maybe due to infra, maybe not), ptest runs are taking even longer than usual, which in turn creates an even longer queue of patches.
>
> There is an ongoing effort to improve ptests usability (https://issues.apache.org/jira/browse/HIVE-19425), but apart from that, we need to make an effort to stabilize existing tests and bring that failure count to zero.
>
> Hence, I am suggesting *we stop committing any patch before we get a green run*. If someone thinks this proposal is too radical, please come up with an alternative, because I do not think it is OK to have the ptest runs in their current state. Other projects of a certain size (e.g., Hadoop, Spark) are always green; we should be able to do the same.
>
> Finally, once we get to zero failures, I suggest we are less tolerant with committing without getting a clean ptests run. If there is a failure, we need to fix it or revert the patch that caused it, then we continue developing.
>
> Please, let’s all work together as a community to fix this issue, that is the only way to get to zero quickly.
>
> Thanks,
> Jesús
>
> PS. I assume the flaky tests will come into the discussion. Let’s see first how many of those we have, then we can work to find a fix.