Just wanted to close the loop on the TestMiniSparkOnYarnCliDriver test
failures. We will be able to re-enable most of them on branch-3. The
ones which were disabled are being tracked separately in a different ticket
<https://issues.apache.org/jira/browse/HIVE-27146> but they don't look like
a blocker.

Hi Aman,

Do you know how close we are to reopening branch-3?

Thanks,
Vihang

On Sat, Mar 4, 2023 at 7:23 PM Aman Raj <raja...@microsoft.com.invalid>
wrote:

> Or you can cd into itests and run the command you are using. That is just
> another way I run it.
>
> Thanks,
> Aman.
> ________________________________
> From: Aman Raj <raja...@microsoft.com>
> Sent: Saturday, March 4, 2023 7:20:36 PM
> To: dev@hive.apache.org <dev@hive.apache.org>
> Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
>
> Hi Vihang,
>
> Thanks a lot for working on this. Can you try using -Pqsplits,itests?
> Also, I usually pass the -o option after doing a clean install.
>
> Thanks,
> Aman.
>
> ________________________________
> From: vihang karajgaonkar <vihan...@apache.org>
> Sent: Saturday, 4 March, 2023, 11:35
> To: dev@hive.apache.org <dev@hive.apache.org>
> Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
>
> Just to update on the HoS test failures for TestMiniSparkOnYarnCliDriver, I
> think I was finally able to resolve them (at least locally). I had to
> revert HIVE-21044 because it was causing OOM for those tests. Also, in
> order for these tests to work we will have to downgrade netty from
> 4.1.69.Final to 4.1.51.Final. I understand that we had upgraded netty from
> 4.1.17.Final to 4.1.69.Final for CVEs, but the highest netty version that we
> can support without breaking HoS is 4.1.51.Final. Note that 4.1.51.Final
> includes fixes for many of the CVEs which affected 4.1.17.Final, so we are
> still in a better place than branch-3.1. Unfortunately, there is no good way
> to make HoS work with a higher netty version, so I think we should downgrade
> the netty version to 4.1.51.Final for now and look at more options to upgrade
> it to 4.1.69.Final in a separate ticket.
>
> I still need to understand why the tests which are working for me locally
> don't work on the PR job. I tried running the split test classes using the
> following command. Is that the right way to simulate builds from the PR
> job? Let me know if anyone has more ideas.
>
> mvn test -Dtest=org.apache.hadoop.hive.cli.split2.TestMiniSparkOnYarnCliDriver -Pqsplits
>
> Thanks,
> Vihang
>
>
> On Fri, Feb 17, 2023 at 4:01 AM Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
>
> > Hello,
> >
> > Thanks Aman for bringing this up and also for cleaning up after others (I
> > saw that you raised tickets and PRs for addressing the failures).
> >
> > Many thanks to Vihang as well for helping out. Regarding flaky tests, yes,
> > we should disable them as soon as we see them.
> > There have been some other discussions on how to approach flaky tests; the
> > most recent one I could find is here [1].
> >
> > Best,
> > Stamatis
> >
> > [1] https://lists.apache.org/thread/lv3bhlfoq8fwd9dwyjf7g4nx32wtrygv
> >
> > On Fri, Feb 17, 2023 at 4:37 AM Aman Raj <raja...@microsoft.com.invalid>
> > wrote:
> >
> > > Hi team,
> > >
> > > Thanks Vihang for looking into this. I have commented on the JIRA you
> > > created.
> > >
> > > Just to bring this to everyone's notice, I have seen that there have been
> > > a couple of pushes to branch-3, which have led to 5 new test failures.
> > > The failures are in orc_merge1, orc_merge2, orc_merge3, orc_merge4 and
> > > orc_merge10. These tests did not fail before. I would sincerely urge the
> > > community to raise a PR against branch-3 so that the Jenkins pipeline
> > > can run, and only then merge things to branch-3. We had 2900+ failures
> > > when we started 2 months back, and having brought that down to fewer
> > > than 15, these new failures have pushed us back in this effort.
> > >
> > > I would like to thank everyone who has participated in this effort and
> > > made it possible to reach this stage. Also, if the contributors can take
> > > ownership of these new test case failures and fix them, it will be of
> > > great help.
> > >
> > > Thanks,
> > > Aman.
> > > ________________________________
> > > From: vihang karajgaonkar <vihan...@apache.org>
> > > Sent: Friday, February 17, 2023 6:10 AM
> > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> > >
> > > [You don't often get email from vihan...@apache.org. Learn why this is
> > > important at https://aka.ms/LearnAboutSenderIdentification ]
> > >
> > > Hi Aman,
> > >
> > > I created https://issues.apache.org/jira/browse/HIVE-27087 to look into
> > > TestMiniSparkOnYarnCliDriver failures. I have a working theory of what
> > > might be going on there. I am still investigating the right way to fix
> > > it, though.
> > >
> > > Thanks,
> > > Vihang
> > >
> > > On Fri, Feb 10, 2023 at 10:26 AM Aman Raj <raja...@microsoft.com.invalid>
> > > wrote:
> > >
> > > > Hi Vihang,
> > > >
> > > > Yes, the tests are failing locally as well with the same issue.
> > > >
> > > > Thanks,
> > > > Aman.
> > > >
> > > > ________________________________
> > > > From: Vihang Karajgaonkar <vihang.karajgaon...@databricks.com.INVALID>
> > > > Sent: Friday, February 10, 2023 11:22:15 PM
> > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> > > >
> > > > Thanks a lot Stamatis for starting this thread. I really appreciate
> > > > all the efforts to stabilize branch-3 to get it to a releasable state
> > > > and I agree that we should get it to a green state before opening it
> > > > for PRs not related to test failures. I can help with the effort as
> > > > well.
> > > >
> > > > If we want to get the branch back to a green state soon, have we
> > > > considered disabling the tests which are clearly flaky (e.g. they pass
> > > > on some builds and fail on others with no new code changes)? If we
> > > > don't do that, we will keep playing whack-a-mole with those tests. I
> > > > propose that we disable such tests and create tickets to unflake them
> > > > separately. This will help us get back to a green state faster.
> > > >
> > > > Hi Aman,
> > > > For TestMiniSparkOnYarnCliDriver failures, you probably should also
> > > > look into the spark driver/application logs and see if there are
> > > > infrastructure errors (e.g. OOMs). Are these tests failing when you
> > > > run locally?
> > > >
> > > > Thanks,
> > > > Vihang
> > > >
> > > > On Tue, Feb 7, 2023 at 10:05 PM Aman Raj <raja...@microsoft.com.invalid>
> > > > wrote:
> > > >
> > > > > +1,
> > > > > Thanks Stamatis and Laszlo for helping with the test case fixes so
> > > > > far.
> > > > >
> > > > > Team,
> > > > > I need help fixing the following test in Hive. I have tried
> > > > > different approaches but have had no luck so far:
> > > > > org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
> > > > >
> > > > > Issue :
> > > > > PREHOOK: Input: default@src
> > > > > PREHOOK: Output: default@src
> > > > > Failed to monitor Job[-1] with exception
> > > > > 'java.lang.IllegalStateException(Connection to remote Spark driver
> > > > > was lost)' Last known state = SENT
> > > > > Failed to execute spark task, with exception
> > > > > 'java.lang.IllegalStateException(RPC channel is closed.)'
> > > > > FAILED: Execution Error, return code 1 from
> > > > > org.apache.hadoop.hive.ql.exec.spark.SparkTask. RPC channel is
> > > > > closed.
> > > > >
> > > > > History :
> > > > > Initially the tests had failed with errors which I fixed in the
> > > > > following task :
> > > > > https://issues.apache.org/jira/browse/HIVE-26940
> > > > >
> > > > > Does anyone know what the issue is here? There are 6-7 failures
> > > > > because of this test case. Link to the failed test cases for the
> > > > > stack trace:
> > > > > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3949/2/tests/
> > > > >
> > > > > Thanks,
> > > > > Aman.
> > > > >
> > > > > ________________________________
> > > > > From: László Bodor <bodorlaszlo0...@gmail.com>
> > > > > Sent: Tuesday, February 7, 2023 4:46 PM
> > > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > > Subject: [EXTERNAL] Re: Branch-3 backports and build stability
> > > > >
> > > > > +1
> > > > > also, if I merged something that I thought was for test stability
> > > > > (but instead it was a feature), excuse me :)
> > > > > for reference, the whole green test initiative is tracked under
> > > > > this umbrella:
> > > > > https://issues.apache.org/jira/browse/HIVE-26836
> > > > >
> > > > > On Tue, Feb 7, 2023 at 12:09, Stamatis Zampetakis <zabe...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > The build in branch-3 is not yet green; there are ~25 test
> > > > > > failures. It is a common practice that we shouldn't push changes
> > > > > > on top of a broken build unless they are addressing test failures.
> > > > > >
> > > > > > Some people (mainly Aman Raj, Chris Nauroth, and Laszlo Bodor)
> > > > > > have been working hard to stabilize the build for quite some time
> > > > > > now. If you want to help out then start by reviewing, merging, and
> > > > > > fixing things around test failures.
> > > > > >
> > > > > > It's not yet the time to bring new features, upgrades, bugs, etc.,
> > > > > > in branch-3. I would encourage committers to not approve such
> > > > > > changes till we get back to a stable branch.
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > > >
