Hi Vihang/community,

Thanks a lot, Vihang, for working on the major test failure, which was blocking more than 35 test cases. We are now down to the final 4 failures. I have analyzed some of them; details are below (link: http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4067/12/tests):

1. multi_in_clause - This test was committed in HIVE-21685 without validating the scenario. It fails because Hive is not able to parse:

   explain cbo select * from very_simple_table_for_in_test where name IN('g','r') AND name IN('a','b')

   I am able to make this work locally. We have 2 options:
   a. Revert HIVE-21685, since this scenario was not validated before the test was added.
   b. Cherry-pick the fix from https://issues.apache.org/jira/browse/HIVE-20718. To do that we also need to cherry-pick https://issues.apache.org/jira/browse/HIVE-17040, since HIVE-20718 has a lot of merge conflicts with HIVE-17040. After cherry-picking both, we still have other failures to fix.

2. current_date_timestamp.q - This breaking change was committed in HIVE-21388 without validation. The failure is again because Hive is not able to parse:

   explain cbo select current_timestamp() from alltypesorc

   The fix and revert options are the same as in point 1.

3. testBootstrapReplLoadRetryAfterFailureForPartitions() - I have not investigated this yet.

4. mm_all.q - I have not investigated this yet.

Thanks,
Aman.
________________________________
From: vihang karajgaonkar <vihan...@apache.org>
Sent: Friday, March 17, 2023 8:42 PM
To: dev@hive.apache.org <dev@hive.apache.org>
Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability

Just wanted to close the loop on the TestMiniSparkOnYarnCliDriver test failures. We will be able to re-enable most of them on branch-3.
The ones which were disabled are being tracked separately in a different ticket <https://issues.apache.org/jira/browse/HIVE-27146>, but they don't look like blockers.

Hi Aman,
Do you know how close we are to reopening branch-3?

Thanks,
Vihang

On Sat, Mar 4, 2023 at 7:23 PM Aman Raj <raja...@microsoft.com.invalid> wrote:

> Or you can cd into itests and run the command you are using. That is just
> another way I run it.
>
> Thanks,
> Aman.
> ________________________________
> From: Aman Raj <raja...@microsoft.com>
> Sent: Saturday, March 4, 2023 7:20:36 PM
> To: dev@hive.apache.org <dev@hive.apache.org>
> Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
>
> Hi Vihang,
>
> Thanks a lot for working on this. Can you try using -Pqsplits,itests?
> Also, I usually give a -o option after doing a clean install.
>
> Thanks,
> Aman.
>
> ________________________________
> From: vihang karajgaonkar <vihan...@apache.org>
> Sent: Saturday, 4 March, 2023, 11:35
> To: dev@hive.apache.org <dev@hive.apache.org>
> Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
>
> Just to update on the HoS test failures for TestMiniSparkOnYarnCliDriver, I
> think I was finally able to resolve them (at least locally). I had to
> revert HIVE-21044 because it was causing OOM for those tests. Also, in
> order for these tests to work we will have to downgrade netty from
> 4.1.69.Final to 4.1.51.Final. I understand that we had upgraded netty from
> 4.1.17.Final to 4.1.69.Final for CVEs, but the highest netty version that we
> can support without breaking HoS is 4.1.51.Final. Note that 4.1.51.Final
> includes fixes for many of the CVEs which affected 4.1.17.Final, so we are
> still in a better place than branch-3.1. Unfortunately, there is no good way
> to make HoS work with a higher netty version, so I think we should downgrade
> the netty version to 4.1.51.Final for now and look at more options to upgrade
> it to 4.1.69.Final in a separate ticket.
>
> I still need to understand why the tests which are working for me locally
> don't work on the PR job. I tried running the split test classes using the
> following command. Is that the right way to simulate builds from the PR
> job? Let me know if anyone has more ideas.
>
> mvn test -Dtest=org.apache.hadoop.hive.cli.split2.TestMiniSparkOnYarnCliDriver -Pqsplits
>
> Thanks,
> Vihang
>
>
> On Fri, Feb 17, 2023 at 4:01 AM Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
>
> > Hello,
> >
> > Thanks Aman for bringing this up and also for cleaning up after others (I
> > saw that you raised tickets and PRs for addressing the failures).
> >
> > Many thanks to Vihang as well for helping out. Regarding flaky tests, yes,
> > we should disable them as soon as we see them.
> > There have been some other discussions on how to approach flaky tests; the
> > most recent I could find is here [1].
> >
> > Best,
> > Stamatis
> >
> > [1] https://lists.apache.org/thread/lv3bhlfoq8fwd9dwyjf7g4nx32wtrygv
> >
> > On Fri, Feb 17, 2023 at 4:37 AM Aman Raj <raja...@microsoft.com.invalid>
> > wrote:
> >
> > > Hi team,
> > >
> > > Thanks Vihang for looking into this. I have commented on the JIRA you
> > > created.
> > >
> > > Just to bring this to everyone's notice: there have been a couple of
> > > pushes to branch-3, which have led to 5 new test failures. The test
> > > failures are in orc_merge1, orc_merge2, orc_merge3, orc_merge4 and
> > > orc_merge10. These tests did not fail before. I would sincerely urge
> > > the community to raise a PR against branch-3 so that the Jenkins
> > > pipeline can run, and only then merge things to branch-3. We had 2900+
> > > failures when we started 2 months back and have now brought that down
> > > to fewer than 15; these new failures have pushed us back in this effort.
> > >
> > > I would like to thank everyone who has participated in this effort and
> > > made it possible to reach this stage. Also, if the contributors can take
> > > ownership of these new test case failures and fix them, it will be of
> > > great help.
> > >
> > > Thanks,
> > > Aman.
> > > ________________________________
> > > From: vihang karajgaonkar <vihan...@apache.org>
> > > Sent: Friday, February 17, 2023 6:10 AM
> > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> > >
> > > Hi Aman,
> > >
> > > I created https://issues.apache.org/jira/browse/HIVE-27087 to look into
> > > TestMiniSparkOnYarnCliDriver failures. I have a working theory of what
> > > might be going on there. I am still investigating what the right way to
> > > fix it is, though.
> > >
> > > Thanks,
> > > Vihang
> > >
> > > On Fri, Feb 10, 2023 at 10:26 AM Aman Raj <raja...@microsoft.com.invalid>
> > > wrote:
> > >
> > > > Hi Vihang,
> > > >
> > > > Yes, the tests are failing locally as well with the same issue.
> > > >
> > > > Thanks,
> > > > Aman.
> > > >
> > > > ________________________________
> > > > From: Vihang Karajgaonkar <vihang.karajgaon...@databricks.com.INVALID>
> > > > Sent: Friday, February 10, 2023 11:22:15 PM
> > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> > > >
> > > > Thanks a lot Stamatis for starting this thread. I really appreciate all
> > > > the efforts to stabilize branch-3 to get it to a releasable state and I
> > > > agree that we should get it to a green state before opening it for PRs
> > > > not related to test failures. I can help with the effort as well.
> > > >
> > > > If we want to get the branch back to a green state soon, have we
> > > > considered disabling the tests which are clearly flaky (e.g. they pass
> > > > on some builds and fail on others with no new code changes)? If we
> > > > don't do that, we will keep playing whack-a-mole with those tests. I
> > > > propose that we disable such tests and create tickets to unflake them
> > > > separately. This will help us get back to a green state faster.
> > > >
> > > > Hi Aman,
> > > > For TestMiniSparkOnYarnCliDriver failures, you probably should also
> > > > look into the spark driver/application logs and see if there are
> > > > infrastructure errors (e.g. OOMs). Are these tests failing when you
> > > > run locally?
> > > >
> > > > Thanks,
> > > > Vihang
> > > >
> > > > On Tue, Feb 7, 2023 at 10:05 PM Aman Raj <raja...@microsoft.com.invalid>
> > > > wrote:
> > > >
> > > > > +1,
> > > > > Thanks Stamatis and Laszlo for helping with the test case fixes so far.
> > > > >
> > > > > Team,
> > > > > I need help in fixing the following tests in Hive. I have tried
> > > > > different approaches but have had no luck so far.
> > > > > I am facing some issues in fixing the following tests:
> > > > > org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
> > > > >
> > > > > Issue :
> > > > > PREHOOK: Input: default@src
> > > > > PREHOOK: Output: default@src
> > > > > Failed to monitor Job[-1] with exception 'java.lang.IllegalStateException(Connection to remote Spark driver was lost)' Last known state = SENT
> > > > > Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> > > > > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. RPC channel is closed.
> > > > >
> > > > > History :
> > > > > Initially the tests had failed with errors which I fixed in the
> > > > > following task:
> > > > >
> > > > > https://issues.apache.org/jira/browse/HIVE-26940
> > > > >
> > > > > Does anyone know what the issue is here? There are 6-7 failures
> > > > > because of this test case. Link to the failed test cases for the
> > > > > stacktrace:
> > > > >
> > > > > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3949/2/tests/
> > > > >
> > > > > Thanks,
> > > > > Aman.
> > > > >
> > > > > ________________________________
> > > > > From: László Bodor <bodorlaszlo0...@gmail.com>
> > > > > Sent: Tuesday, February 7, 2023 4:46 PM
> > > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > > Subject: [EXTERNAL] Re: Branch-3 backports and build stability
> > > > >
> > > > > +1
> > > > > also, if I merged something that I thought was for test stability (but
> > > > > instead it was a feature), excuse me :)
> > > > > for reference, the whole green test initiative is tracked under this
> > > > > umbrella:
> > > > >
> > > > > https://issues.apache.org/jira/browse/HIVE-26836
> > > > >
> > > > > On Tue, Feb 7, 2023 at 12:09, Stamatis Zampetakis <zabe...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > The build in branch-3 is not yet green; there are ~25 test failures.
> > > > > > It is a common practice that we shouldn't push changes on top of a
> > > > > > broken build unless they are addressing test failures.
> > > > > >
> > > > > > Some people (mainly Aman Raj, Chris Nauroth, and Laszlo Bodor) have
> > > > > > been working hard to stabilize the build for quite some time now. If
> > > > > > you want to help out, then start by reviewing, merging, and fixing
> > > > > > things around test failures.
> > > > > >
> > > > > > It's not yet the time to bring new features, upgrades, bugs, etc.,
> > > > > > into branch-3. I would encourage committers not to approve such
> > > > > > changes till we get back to a stable branch.
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
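
Vihang's proposal in the thread describes a flaky test as one that passes on some builds and fails on others with no new code changes. A minimal sketch of that check follows, assuming build results are available as (commit, test name, passed) records. The record layout and the function name are hypothetical, chosen for illustration; they are not part of the Hive CI tooling.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return sorted names of tests that both passed and failed on one commit.

    `results` is an iterable of (commit, test_name, passed) tuples. This
    layout is an assumption for illustration, not a Hive or Jenkins format.
    """
    # (commit, test_name) -> set of outcomes observed across builds
    outcomes = defaultdict(set)
    for commit, test_name, passed in results:
        outcomes[(commit, test_name)].add(bool(passed))
    # Same commit means no new code changes, so seeing both True and False
    # for one (commit, test) pair marks the test as flaky.
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})
```

A test that fails consistently on one commit and passes on a different commit is not reported, since a code change could explain the difference. Only tests with mixed outcomes on the same commit would be candidates for disabling and a separate unflake ticket.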