Hi Vihang,

Thanks a lot for working on this. Can you try using -Pqsplits,itests? Also, I
usually pass the -o (offline) option after doing a clean install.
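
Combined with the command from the earlier message, the suggestion might look like the sketch below. It only assembles and prints the command line so it can be reviewed first. The combination of flags is an assumption based on this thread, not a confirmed copy of what the PR job runs, and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch only: assembles the Maven invocation discussed in this thread.
# Assumption: a clean install has already populated the local repository,
# which is what makes Maven's -o (offline) flag usable.
TEST_CLASS="org.apache.hadoop.hive.cli.split2.TestMiniSparkOnYarnCliDriver"
PROFILES="qsplits,itests"   # profiles suggested above

# -o     : work offline (no remote repository lookups)
# -Dtest : Surefire filter selecting the single split test class
# -P     : comma-separated list of profiles to activate
CMD="mvn test -o -Dtest=${TEST_CLASS} -P${PROFILES}"

# Print rather than execute, so the command can be checked before a long run.
echo "${CMD}"
```

To run it, execute the printed line from the module directory where these tests are normally run.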

Thanks,
Aman.

________________________________
From: vihang karajgaonkar <vihan...@apache.org>
Sent: Saturday, 4 March, 2023, 11:35
To: dev@hive.apache.org <dev@hive.apache.org>
Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability

Just to update on the HoS test failures for TestMiniSparkOnYarnCliDriver, I
think I was finally able to resolve them (at least locally). I had to
revert HIVE-21044 because it was causing OOMs in those tests. Also, for
these tests to work we will have to downgrade netty from 4.1.69.Final to
4.1.51.Final. I understand that we upgraded netty from 4.1.17.Final to
4.1.69.Final because of CVEs, but the highest netty version that we can
support without breaking HoS is 4.1.51.Final. Note that 4.1.51.Final
includes fixes for many of the CVEs that affected 4.1.17.Final, so we are
still in a better place than branch-3.1. Unfortunately, there is no good way
to make HoS work with a higher netty version, so I think we should downgrade
netty to 4.1.51.Final for now and look at options for upgrading it to
4.1.69.Final in a separate ticket.

I still need to understand why the tests which are working for me locally
don't work on the PR job. I tried running the split test classes using the
following command. Is that the right way to simulate builds from the PR
job? Let me know if anyone has more ideas.

mvn test -Dtest=org.apache.hadoop.hive.cli.split2.TestMiniSparkOnYarnCliDriver -Pqsplits

Thanks,
Vihang


On Fri, Feb 17, 2023 at 4:01 AM Stamatis Zampetakis <zabe...@gmail.com>
wrote:

> Hello,
>
> Thanks Aman for bringing this up and also for cleaning up after others (I
> saw that you raised tickets and PRs for addressing the failures).
>
> Many thanks to Vihang as well for helping out. Regarding flaky tests, yes
> we should disable them as soon as we see them.
> There have been some other discussions on how to approach flaky tests; the
> most recent one I could find is here [1].
>
> Best,
> Stamatis
>
> [1] https://lists.apache.org/thread/lv3bhlfoq8fwd9dwyjf7g4nx32wtrygv
>
> On Fri, Feb 17, 2023 at 4:37 AM Aman Raj <raja...@microsoft.com.invalid>
> wrote:
>
> > Hi team,
> >
> > Thanks Vihang for looking into this. I have commented on the JIRA you
> > created.
> >
> > Just to bring this to everyone's notice: there have been a couple of
> > pushes to branch-3 that have led to 5 new test failures. The failures are
> > in orc_merge1, orc_merge2, orc_merge3, orc_merge4 and orc_merge10. These
> > tests did not fail before. I would sincerely urge the community to raise
> > a PR against branch-3 so that the Jenkins pipeline can run, and only then
> > merge things to branch-3. We had 2900+ failures when we started 2 months
> > back, and having brought that down to fewer than 15, new failures push us
> > back in this effort.
> >
> > I would like to thank everyone who has participated in this effort and
> > brought it to this stage. Also, if the contributors can take ownership of
> > these new test failures and fix them, it will be of great help.
> >
> > Thanks,
> > Aman.
> > ________________________________
> > From: vihang karajgaonkar <vihan...@apache.org>
> > Sent: Friday, February 17, 2023 6:10 AM
> > To: dev@hive.apache.org <dev@hive.apache.org>
> > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> >
> > Hi Aman,
> >
> > I created https://issues.apache.org/jira/browse/HIVE-27087 to look into
> > the TestMiniSparkOnYarnCliDriver failures. I have a working theory of what
> > might be going on there. I am still investigating the right way to fix it,
> > though.
> >
> > Thanks,
> > Vihang
> >
> > On Fri, Feb 10, 2023 at 10:26 AM Aman Raj <raja...@microsoft.com.invalid>
> > wrote:
> >
> > > Hi Vihang,
> > >
> > > Yes, the tests are failing locally as well with the same issue.
> > >
> > > Thanks,
> > > Aman.
> > >
> > > ________________________________
> > > From: Vihang Karajgaonkar <vihang.karajgaon...@databricks.com.INVALID>
> > > Sent: Friday, February 10, 2023 11:22:15 PM
> > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> > >
> > > Thanks a lot Stamatis for starting this thread. I really appreciate all
> > > the efforts to stabilize branch-3 and get it to a releasable state, and I
> > > agree that we should get it to a green state before opening it for PRs
> > > not related to test failures. I can help with the effort as well.
> > >
> > > If we want to get the branch back to a green state soon, have we
> > > considered disabling the tests that are clearly flaky (e.g., tests that
> > > pass on some builds and fail on others with no new code changes)? If we
> > > don't do that, we will keep playing whack-a-mole with those tests. I
> > > propose that we disable such tests and create tickets to unflake them
> > > separately. This will help us get back to a green state faster.
> > >
> > > Hi Aman,
> > > For the TestMiniSparkOnYarnCliDriver failures, you should probably also
> > > look into the Spark driver/application logs and see if there are
> > > infrastructure errors (e.g., OOMs). Are these tests failing when you run
> > > them locally?
> > >
> > > Thanks,
> > > Vihang
> > >
> > > On Tue, Feb 7, 2023 at 10:05 PM Aman Raj <raja...@microsoft.com.invalid>
> > > wrote:
> > >
> > > > +1,
> > > > Thanks Stamatis and Laszlo for helping with the test fixes so far.
> > > >
> > > > Team,
> > > > I need help fixing the following test in Hive. I have tried different
> > > > approaches, but no luck so far:
> > > > org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
> > > >
> > > > Issue:
> > > > PREHOOK: Input: default@src
> > > > PREHOOK: Output: default@src
> > > > Failed to monitor Job[-1] with exception 'java.lang.IllegalStateException(Connection to remote Spark driver was lost)' Last known state = SENT
> > > > Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> > > > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. RPC channel is closed.
> > > >
> > > > History:
> > > > Initially the tests failed with errors that I fixed in the following
> > > > task: https://issues.apache.org/jira/browse/HIVE-26940
> > > >
> > > > Does anyone know what the issue is here? There are 6-7 failures
> > > > because of this test case. Link to the failed test cases for the
> > > > stack traces:
> > > > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3949/2/tests/
> > > >
> > > > Thanks,
> > > > Aman.
> > > >
> > > > ________________________________
> > > > From: László Bodor <bodorlaszlo0...@gmail.com>
> > > > Sent: Tuesday, February 7, 2023 4:46 PM
> > > > To: dev@hive.apache.org <dev@hive.apache.org>
> > > > Subject: [EXTERNAL] Re: Branch-3 backports and build stability
> > > >
> > > > +1
> > > > Also, if I merged something that I thought was for test stability (but
> > > > was actually a feature), my apologies :)
> > > > For reference, the whole green test initiative is tracked under this
> > > > umbrella: https://issues.apache.org/jira/browse/HIVE-26836
> > > >
> > > > Stamatis Zampetakis <zabe...@gmail.com> wrote (on Tue, Feb 7, 2023,
> > > > 12:09):
> > > >
> > > > > Hi all,
> > > > >
> > > > > The build in branch-3 is not yet green; there are ~25 test failures.
> > > > > It is common practice not to push changes on top of a broken build
> > > > > unless they address test failures.
> > > > >
> > > > > Some people (mainly Aman Raj, Chris Nauroth, and Laszlo Bodor) have
> > > > > been working hard to stabilize the build for quite some time now. If
> > > > > you want to help out, start by reviewing, merging, and fixing things
> > > > > around test failures.
> > > > >
> > > > > It's not yet the time to bring new features, upgrades, bugs, etc.
> > > > > into branch-3. I would encourage committers not to approve such
> > > > > changes until we get back to a stable branch.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >