Sure, Vihang, I will look at the other ones. You can pick this one up.

Thanks,
Aman.

________________________________
From: vihang karajgaonkar <vihan...@apache.org>
Sent: Monday, March 20, 2023 7:58:48 AM
To: dev@hive.apache.org <dev@hive.apache.org>
Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability

I think we should revert the offending commits first to unblock the branch.
We can create follow-up tickets to determine whether these fixes are
blockers for the 3.2 release and, if so, merge them the right way, with a
green test run. Fixing forward always carries the risk of introducing new
test failures.
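
A minimal sketch of that revert-first flow, assuming git and a local branch-3. The helper name revert_on_pr_branch and the revert-<sha> branch naming are hypothetical, and the hash of each offending commit still has to be looked up (the thread does not list them). It cuts a fresh branch from branch-3 and reverts one commit there, so the revert itself can go through a PR and a green precommit run before it lands.

```bash
# Hypothetical helper (not part of any Hive tooling): cut a fresh branch from
# the given base and revert a single commit on it, so the revert can be
# pushed as a PR and validated by the precommit job before merging.
revert_on_pr_branch() {
  local base="$1"    # e.g. branch-3
  local commit="$2"  # full hash of the offending commit
  local branch="revert-${commit:0:10}"
  git checkout -q -b "$branch" "$base" || return
  # --no-edit keeps git's default "Revert ..." commit message.
  git revert --no-edit "$commit"
}
```

For example, `revert_on_pr_branch branch-3 <sha>` followed by `git push origin HEAD`, then a PR that targets branch-3.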

Thanks for all your efforts on this, Aman.

I can take a look at testBootstrapReplLoadRetryAfterFailureForPartitions if
you haven’t already started on it.

Thanks,
Vihang

On Sun, Mar 19, 2023 at 10:09 PM Aman Raj <raja...@microsoft.com.invalid>
wrote:

> Hi Vihang/community,
>
> Thanks a lot, Vihang, for working on the major test failure, which was
> blocking more than 35 test cases. We are now down to the final 4 failures.
> I have analyzed some of them; here they are (link:
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4067/12/tests):
>
>   1. multi_in_clause: This test was committed in HIVE-21685 without
>      validating the scenario. It fails because Hive is not able to parse
>
>        explain cbo
>        select * from very_simple_table_for_in_test where name IN('g','r') AND name IN('a','b')
>
>      If we want this to work (I was able to get it working locally), we
>      have 2 options:
>      a. Revert HIVE-21685, since this scenario was not validated before
>         the test was added.
>      b. Cherry-pick the fix from HIVE-20718
>         (https://issues.apache.org/jira/browse/HIVE-20718). To do that, we
>         also need to cherry-pick HIVE-17040
>         (https://issues.apache.org/jira/browse/HIVE-17040), since
>         HIVE-20718 has a lot of merge conflicts with HIVE-17040. Even after
>         cherry-picking both, we have other failures to fix.
>   2. current_date_timestamp.q: This breaking change was committed in
>      HIVE-21388 without validation. The failure is again because Hive is
>      not able to parse
>
>        explain cbo select current_timestamp() from alltypesorc
>
>      The solution (or revert option) is the same as in point 1.
>   3. testBootstrapReplLoadRetryAfterFailureForPartitions(): I have not
>      investigated this yet.
>   4. mm_all.q: I have not investigated this yet.
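
To iterate on one of these .q failures in isolation, here is a sketch that assumes the usual Hive qtest layout (run from the repository root, with the CLI drivers under itests/qtest) and the standard -Dqfile property. The thread does not say which CLI driver runs multi_in_clause.q or current_date_timestamp.q, so the driver is a parameter. The helper name run_qfile and the DRY_RUN switch are hypothetical.

```bash
# Hypothetical helper: run a single .q file through a given CLI driver.
# With DRY_RUN=1 it only prints the command it would run.
run_qfile() {
  local driver="$1"  # CLI driver class, as listed in the precommit report
  local qfile="$2"   # e.g. multi_in_clause.q
  local cmd=(mvn test -Dtest="$driver" -Dqfile="$qfile")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "cd itests/qtest && ${cmd[*]}"
  else
    # Subshell so the caller's working directory is left unchanged.
    (cd itests/qtest && "${cmd[@]}")
  fi
}
```

Adding -Dtest.output.overwrite=true regenerates the .q.out file. That is useful once a plan change has been reviewed, but it does not help with parse failures like the two above.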
>
> Thanks,
> Aman.
> ________________________________
> From: vihang karajgaonkar <vihan...@apache.org>
> Sent: Friday, March 17, 2023 8:42 PM
> To: dev@hive.apache.org <dev@hive.apache.org>
> Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
>
> Just wanted to close the loop on the TestMiniSparkOnYarnCliDriver test
> failures. We will be able to re-enable most of them on branch-3. The ones
> that were disabled are tracked separately in HIVE-27146
> (https://issues.apache.org/jira/browse/HIVE-27146), but they don't look
> like blockers.
>
> Hi Aman,
>
> Do you know how close we are to reopening branch-3?
>
> Thanks,
> Vihang
>
> On Sat, Mar 4, 2023 at 7:23 PM Aman Raj <raja...@microsoft.com.invalid>
> wrote:
>
> > Alternatively, you can cd into itests and run the command you are using.
> > That's another way I run them.
> >
> > Thanks,
> > Aman.
> >
> > ________________________________
> > From: Aman Raj <raja...@microsoft.com>
> > Sent: Saturday, March 4, 2023 7:20:36 PM
> > To: dev@hive.apache.org <dev@hive.apache.org>
> > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> >
> > Hi Vihang,
> >
> > Thanks a lot for working on this. Can you try using -Pqsplits,itests?
> > Also, I usually pass the -o (offline) option after doing a clean install.
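
A minimal sketch of that workflow, assuming a bash shell at the repository root. The profile names and the -o flag come from this thread; using -DskipTests and -Pitests on the install step is an assumption about how the artifacts get installed, and the helper names are hypothetical. DRY_RUN=1 prints the commands instead of running them.

```bash
# Hypothetical helper: echo the command instead of running it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: build once, then re-run a single split driver offline.
reproduce_split() {
  local test_class="$1"
  # Step 1: full build including the itests modules, tests skipped.
  run_or_print mvn clean install -DskipTests -Pitests || return
  # Step 2: run only the requested split class, offline (-o), reusing the
  # artifacts installed by step 1.
  run_or_print mvn test -o -Pqsplits,itests -Dtest="$test_class"
}
```

Running the same mvn test command from inside the itests directory, as also suggested in this thread, is the other variant.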
> >
> > Thanks,
> > Aman.
> >
> >
> >
> > ________________________________
> > From: vihang karajgaonkar <vihan...@apache.org>
> > Sent: Saturday, 4 March, 2023, 11:35
> > To: dev@hive.apache.org <dev@hive.apache.org>
> > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> >
> >
> > Just an update on the HoS test failures for TestMiniSparkOnYarnCliDriver:
> > I think I was finally able to resolve them (at least locally). I had to
> > revert HIVE-21044 because it was causing OOMs in those tests. Also, for
> > these tests to work we will have to downgrade netty from 4.1.69.Final to
> > 4.1.51.Final. I understand that we had upgraded netty from 4.1.17.Final
> > to 4.1.69.Final for CVEs, but the highest netty version we can support
> > without breaking HoS is 4.1.51.Final. Note that 4.1.51.Final fixes many
> > of the CVEs that affected 4.1.17.Final, so we are still in a better place
> > than branch-3.1. Unfortunately, there is no good way to make HoS work
> > with a higher netty version, so I think we should downgrade netty to
> > 4.1.51.Final for now and look at options for upgrading it to 4.1.69.Final
> > in a separate ticket.
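
A small guard that encodes the constraint above, assuming GNU sort with -V (version sort). The variable and function names are hypothetical. It succeeds only when a candidate netty version is at or below 4.1.51.Final, the highest version reported here to work with HoS.

```bash
# Highest netty version reported in this thread to work with HoS on branch-3.
HOS_NETTY_CEILING="4.1.51.Final"

# Hypothetical guard: succeed iff the candidate version is <= the ceiling.
netty_within_hos_ceiling() {
  local candidate="$1"
  local highest
  # sort -V orders dotted version strings numerically (4.1.51 < 4.1.69).
  highest="$(printf '%s\n%s\n' "$candidate" "$HOS_NETTY_CEILING" | sort -V | tail -n 1)"
  [ "$highest" = "$HOS_NETTY_CEILING" ]
}
```

For instance, assuming the root pom exposes the version as a netty.version property, `netty_within_hos_ceiling "$(mvn -q help:evaluate -Dexpression=netty.version -DforceStdout)"` would fail while the pom still pins 4.1.69.Final (-DforceStdout needs maven-help-plugin 3.1.0 or later).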
> >
> > I still need to understand why the tests that work for me locally don't
> > work in the PR job. I tried running the split test classes using the
> > following command. Is that the right way to simulate builds from the PR
> > job? Let me know if anyone has more ideas.
> >
> > mvn test -Dtest=org.apache.hadoop.hive.cli.split2.TestMiniSparkOnYarnCliDriver -Pqsplits
> >
> > Thanks,
> > Vihang
> >
> >
> > On Fri, Feb 17, 2023 at 4:01 AM Stamatis Zampetakis <zabe...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > Thanks, Aman, for bringing this up and also for cleaning up after others
> > > (I saw that you raised tickets and PRs to address the failures).
> > >
> > > Many thanks to Vihang as well for helping out. Regarding flaky tests,
> > > yes, we should disable them as soon as we see them.
> > > There have been other discussions on how to approach flaky tests; the
> > > most recent one I could find is here [1].
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://lists.apache.org/thread/lv3bhlfoq8fwd9dwyjf7g4nx32wtrygv
> > >
> > > On Fri, Feb 17, 2023 at 4:37 AM Aman Raj <raja...@microsoft.com.invalid>
> > > wrote:
> > >
> > > > Hi team,
> > > >
> > > > Thanks Vihang for looking into this. I have commented on the JIRA you
> > > > created.
> > > >
> > > > Just to bring to everyone's notice: there have been a couple of pushes
> > > > to branch-3 that have led to 5 new test failures, in orc_merge1,
> > > > orc_merge2, orc_merge3, orc_merge4 and orc_merge10. These tests did not
> > > > fail before. I would sincerely urge the community to raise a PR against
> > > > branch-3 so that the Jenkins pipeline can run, and only then merge
> > > > things into branch-3. We had 2900+ failures when we started 2 months
> > > > back and had brought that down to fewer than 15; these new failures
> > > > have pushed us back in this effort.
> > > >
> > > > I would like to thank everyone who has participated in this effort and
> > > > made it possible to reach this stage. Also, if the contributors could
> > > > take ownership of these new test failures and fix them, it would be of
> > > > great help.
> > > >
> > > > Thanks,
> > > > Aman.
> > > > ________________________________
> > > > From: vihang karajgaonkar <vihan...@apache.org>
> > > > Sent: Friday, February 17, 2023 6:10 AM
> > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> > > >
> > > >
> > > > Hi Aman,
> > > >
> > > > I created HIVE-27087 (https://issues.apache.org/jira/browse/HIVE-27087)
> > > > to look into the TestMiniSparkOnYarnCliDriver failures. I have a working
> > > > theory of what might be going on there, but I am still investigating
> > > > the right way to fix it.
> > > >
> > > > Thanks,
> > > > Vihang
> > > >
> > > > On Fri, Feb 10, 2023 at 10:26 AM Aman Raj <raja...@microsoft.com.invalid>
> > > > wrote:
> > > >
> > > > > Hi Vihang,
> > > > >
> > > > > Yes, the tests fail locally as well, with the same issue.
> > > > >
> > > > > Thanks,
> > > > > Aman.
> > > > >
> > > > >
> > > > > ________________________________
> > > > > From: Vihang Karajgaonkar <vihang.karajgaon...@databricks.com.INVALID>
> > > > > Sent: Friday, February 10, 2023 11:22:15 PM
> > > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> > > > >
> > > > >
> > > > > Thanks a lot, Stamatis, for starting this thread. I really appreciate
> > > > > all the efforts to stabilize branch-3 and get it to a releasable state,
> > > > > and I agree that we should get it to a green state before opening it
> > > > > for PRs not related to test failures. I can help with the effort as
> > > > > well.
> > > > >
> > > > > If we want to get the branch back to a green state soon, have we
> > > > > considered disabling the tests that are clearly flaky (e.g. they pass
> > > > > on some builds and fail on others with no new code changes)? If we
> > > > > don't, we will keep playing whack-a-mole with those tests. I propose
> > > > > that we disable such tests and create tickets to deflake them
> > > > > separately. This will help us get back to a green state faster.
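
One way to apply that definition mechanically, as a sketch: given per-build results as whitespace-separated `<commit> <test> <PASS|FAIL>` lines (a hypothetical format; the precommit job does not emit this directly), print every test that both passed and failed on the same commit. It assumes awk and sort; the function name is hypothetical.

```bash
# Hypothetical helper: read "<commit> <test> <PASS|FAIL>" lines on stdin and
# print each test that both passed and failed on the same commit, i.e. a test
# whose outcome changed with no code change in between.
find_flaky_tests() {
  awk '
    $3 == "PASS" { pass[$1 SUBSEP $2] = 1 }
    $3 == "FAIL" { fail[$1 SUBSEP $2] = 1 }
    END {
      for (key in pass) {
        if (key in fail) {
          split(key, parts, SUBSEP)
          flaky[parts[2]] = 1
        }
      }
      for (t in flaky) print t
    }
  ' | sort
}
```

A test that fails on every build of a commit is a consistent failure rather than a flaky one, and a test whose result differs only across different commits may simply have been broken by a change, so neither is reported.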
> > > > >
> > > > > Hi Aman,
> > > > > For the TestMiniSparkOnYarnCliDriver failures, you should probably also
> > > > > look into the Spark driver/application logs to see whether there are
> > > > > infrastructure errors (e.g. OOMs). Are these tests failing when you
> > > > > run them locally?
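
A quick way to do that check, as a sketch: search a directory of collected driver/application logs for JVM out-of-memory errors. The log location is an assumption (wherever the test run or YARN leaves them), and the function name is hypothetical.

```bash
# Hypothetical helper: list log files under a directory that mention a JVM
# OutOfMemoryError (this covers "Java heap space", "GC overhead limit
# exceeded", and the other OutOfMemoryError variants).
find_oom_logs() {
  local log_dir="$1"
  # -r: recurse, -l: print matching file names only, -F: fixed string.
  grep -rlF 'java.lang.OutOfMemoryError' "$log_dir" | sort
}
```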
> > > > >
> > > > > Thanks,
> > > > > Vihang
> > > > >
> > > > > On Tue, Feb 7, 2023 at 10:05 PM Aman Raj <raja...@microsoft.com.invalid>
> > > > > wrote:
> > > > >
> > > > > > +1. Thanks, Stamatis and Laszlo, for helping with the test fixes so far.
> > > > > >
> > > > > > Team,
> > > > > > I need help fixing the following tests in Hive. I have tried different
> > > > > > approaches but have had no luck so far:
> > > > > > org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
> > > > > >
> > > > > > Issue:
> > > > > > PREHOOK: Input: default@src
> > > > > > PREHOOK: Output: default@src
> > > > > > Failed to monitor Job[-1] with exception 'java.lang.IllegalStateException(Connection to remote Spark driver was lost)' Last known state = SENT
> > > > > > Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> > > > > > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. RPC channel is closed.
> > > > > >
> > > > > > History:
> > > > > > Initially the tests failed with errors that I fixed in HIVE-26940
> > > > > > (https://issues.apache.org/jira/browse/HIVE-26940).
> > > > > >
> > > > > > Does anyone know what the issue is here? There are 6-7 failures
> > > > > > because of this test class. Link to the failed tests, with stack traces:
> > > > > > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3949/2/tests/
> > > > > > Thanks,
> > > > > > Aman.
> > > > > >
> > > > > > ________________________________
> > > > > > From: László Bodor <bodorlaszlo0...@gmail.com>
> > > > > > Sent: Tuesday, February 7, 2023 4:46 PM
> > > > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > > > Subject: [EXTERNAL] Re: Branch-3 backports and build stability
> > > > > >
> > > > > > +1
> > > > > > Also, if I merged something that I thought was for test stability (but
> > > > > > was actually a feature), excuse me :)
> > > > > > For reference, the whole green-test initiative is tracked under this
> > > > > > umbrella: https://issues.apache.org/jira/browse/HIVE-26836
> > > > > >
> > > > > > On Tue, Feb 7, 2023 at 12:09 Stamatis Zampetakis <zabe...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > The build in branch-3 is not yet green; there are ~25 test failures.
> > > > > > > It is common practice not to push changes on top of a broken build
> > > > > > > unless they address test failures.
> > > > > > >
> > > > > > > Some people (mainly Aman Raj, Chris Nauroth, and Laszlo Bodor) have
> > > > > > > been working hard to stabilize the build for quite some time now. If
> > > > > > > you want to help out, start by reviewing, merging, and fixing things
> > > > > > > related to test failures.
> > > > > > >
> > > > > > > It's not yet time to bring new features, upgrades, bug fixes, etc.
> > > > > > > into branch-3. I would encourage committers not to approve such
> > > > > > > changes until we get back to a stable branch.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>
