Hi Sungwoo,
The last time I tried to run a TPC-DS-based benchmark, I stumbled upon a similar
situation. In the end I found that statistics had not been computed, so CBO was
not kicking in, and the automatic retry runs with CBO off, which was failing
for something like 10 queries (subqueries that cannot be decorrelated, but also
some runtime errors).

Making sure that (column) statistics were correctly computed fixed the
problem.
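
For reference, here is a minimal sketch of the statements involved, written as a small Python helper that prints HiveQL for a TPC-DS schema. The database name `tpcds_orc` is a hypothetical placeholder, and the table list is the standard set of 24 TPC-DS tables. Depending on the Hive version and on how the tables were loaded, partitioned tables (e.g. the fact tables) may need an explicit `PARTITION(...)` clause.

```python
# Minimal sketch: generate HiveQL that computes basic and column statistics
# for the TPC-DS tables, so that CBO has statistics to work with.
# The database name used below ("tpcds_orc") is a hypothetical placeholder.

TPCDS_TABLES = [
    "call_center", "catalog_page", "catalog_returns", "catalog_sales",
    "customer", "customer_address", "customer_demographics", "date_dim",
    "household_demographics", "income_band", "inventory", "item",
    "promotion", "reason", "ship_mode", "store", "store_returns",
    "store_sales", "time_dim", "warehouse", "web_page", "web_returns",
    "web_sales", "web_site",
]


def analyze_statements(database, tables):
    """Return two ANALYZE statements per table: basic stats, then column stats."""
    statements = []
    for table in tables:
        qualified = f"{database}.{table}"
        statements.append(f"ANALYZE TABLE {qualified} COMPUTE STATISTICS;")
        statements.append(
            f"ANALYZE TABLE {qualified} COMPUTE STATISTICS FOR COLUMNS;"
        )
    return statements


if __name__ == "__main__":
    # Print a script that could be saved to a file and run with beeline -f.
    for stmt in analyze_statements("tpcds_orc", TPCDS_TABLES):
        print(stmt)
```

Afterwards, running something like `DESCRIBE FORMATTED tpcds_orc.store_sales ss_item_sk;` is one way to check that column statistics are actually present.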

Can you check if this is the case for you?

HTH,
Alessandro

On Tue, 1 Mar 2022 at 15:28, POSTECH CT <c...@pl.postech.ac.kr> wrote:

> Hello Hive team,
>
> I wonder if anyone in the Hive team has tried the TPC-DS benchmark on
> the master branch recently. We occasionally run TPC-DS system tests
> on the master branch, and the tests do not pass completely. Here
> is how our TPC-DS tests proceed.
>
> 1. Compile and run Hive on Tez (not Hive-LLAP)
> 2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
> 3. Run 99 TPC-DS queries which were slightly modified to return
> a varying number of rows (rather than 100 rows)
> 4. Compare the results against the previous results
>
> The previous results were obtained and cross-checked by running Hive
> 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their
> correctness.
>
> For the latest commit in the master branch, step 2 fails. For earlier
> commits (for example, commits in February 2021), step 3 fails:
> several queries either fail or return wrong results.
>
> We can compile and report the test results on this mailing list, but
> would like to know if similar results have been reproduced by the Hive
> team, in order to make sure that we did not make errors in our tests.
>
> If it is okay to open a JIRA ticket that only reports failures in the
> TPC-DS test, we could also perform a git bisect to locate the commit
> that began to generate wrong results.
>
> --- Sungwoo Park
>
> On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
>
> > Hey,
> >
> > Great to hear that we are on the same side regarding these things :)
> >
> > For about a week now, we have had nightly builds for the master branch:
> > http://ci.hive.apache.org/job/hive-nightly/12/
> >
> > I think we have one blocker issue:
> > https://issues.apache.org/jira/browse/HIVE-25665
> >
> > I know about one more thing I would rather get fixed before we release it:
> > https://issues.apache.org/jira/browse/HIVE-25994
> > The best option would be to introduce smoke tests (HIVE-22302) to ensure
> > that something like this does not happen again, but we should probably
> > start moving forward.
> >
> > I think we could call the first iteration "4.0.0-alpha-1" :)
> >
> > I've added 4.0.0-alpha-1 as a version and added the above two tickets to it:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1
> >
> > Are there any other things you know of that would be needed?
> >
> > cheers,
> > Zoltan
> >
> >
> > On 2/22/22 12:18 PM, Peter Vary wrote:
> >> I would vote for 4.0.0-alpha-1 or similar for all of the components.
> >>
> >> When we have more stable releases, I would keep the 4.x.x scheme, since
> >> everyone is familiar with it, and I do not see a really good reason to
> >> change it.
> >>
> >> Thanks,
> >> Peter
> >>
> >>
> >>> On 2022. Feb 10., at 3:34, Szehon Ho <szehon.apa...@gmail.com> wrote:
> >>>
> >>> +1, it would be awesome to see Hive master released after so long.
> >>>
> >>> Either 4.0 or 4.0.0-alpha-1 makes sense to me; I'm not sure how we would
> >>> pick any 3.x or calendar-date version (which could tend to slip and be
> >>> more confusing?).
> >>>
> >>> Thanks in any case for getting the ball rolling.
> >>> Szehon
> >>>
> >>> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> >>>
> >>>> Hey,
> >>>>
> >>>> Thank you guys for chiming in; versioning is definitely something we
> >>>> should reach common ground on.
> >>>> It's a threefold problem right now; I think we have the following:
> >>>> * storage-api
> >>>> ** we have "2.7.3-SNAPSHOT" in the repo
> >>>> *** https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
> >>>> ** meanwhile, we already have 2.8.1 released to Maven Central
> >>>> *** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
> >>>> * standalone-metastore
> >>>> ** 4.0.0-SNAPSHOT in the repo
> >>>> ** last release is 3.1.2
> >>>> * hive
> >>>> ** 4.0.0-SNAPSHOT in the repo
> >>>> ** last release is 3.1.2
> >>>>
> >>>> Regarding the actual version number, I'm not entirely sure where we
> >>>> should start the numbering; that's why I was referring to it as Hive-X
> >>>> in my first email.
> >>>>
> >>>> I think the key point here is to start shipping releases regularly,
> >>>> not the actual version number we will use; I'm open to any
> >>>> versioning scheme that reflects that this is a newer release than 3.1.2.
> >>>>
> >>>> I could imagine the following options:
> >>>> (A) start with something less expected, but keep 3 in the prefix to
> >>>> reflect that this is not yet 4.0
> >>>>      I can imagine the following numbers:
> >>>>      3.900.0, 3.901.0, ...
> >>>>      3.9.0, 3.9.1, ...
> >>>> (B) start 4.0.0
> >>>>      4.0.0, 4.1.0, ...
> >>>> (C) jump to a calendar-based version number like 2022.2.9
> >>>>      trunk-based development has pros and cons... making a move like this
> >>>>      irreversibly commits us to trunk-based development and makes release
> >>>>      branches hard to introduce
> >>>> (X) somewhat orthogonally, (also) use suffixes
> >>>>      4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
> >>>>      this is probably the most tempting option, but a versioning scheme
> >>>>      with unchanging MINOR and PATCH numbers will also suggest that the
> >>>>      software is fully compatible and only bugs are being fixed, which
> >>>>      will not be true...
> >>>>
> >>>> I really like the idea of suffixing these releases with alpha or beta,
> >>>> which will communicate our level of commitment: these are not 100%
> >>>> production-ready artifacts.
> >>>>
> >>>> I think we could fix HIVE-25665 and probably experiment with
> >>>> 4.0.0-alpha1 as a start...
> >>>>
> >>>>> This also means there should *not* be a branch-4 after releasing Hive 4.0
> >>>>> and let that diverge (and become the next, super-ignored branch-3),
> >>>> correct; no need to keep a branch we don't maintain... but in any case I
> >>>> think we can postpone this decision until there is something to
> >>>> release... :)
> >>>>
> >>>> cheers,
> >>>> Zoltan
> >>>>
> >>>>
> >>>>
> >>>> On 2/9/22 10:23 AM, László Bodor wrote:
> >>>>> Hi All!
> >>>>>
> >>>>> A purely technical question: what will the SNAPSHOT version become after
> >>>>> releasing Hive 4.0.0? I think this is important, as it defines and
> >>>>> reflects the future release plans.
> >>>>>
> >>>>> Currently it's 4.0.0-SNAPSHOT, I guess since Hive 3.0 + branch-3.
> >>>>> Hive is an evolving and super-active project: if we want to make regular
> >>>>> releases, we should simply release Hive 4.0 and bump the pom to
> >>>>> 4.1.0-SNAPSHOT, which clearly says that we can release Hive 4.1 anytime
> >>>>> we want, without being frustrated about "whether we included enough cool
> >>>>> stuff to release 5.0".
> >>>>>
> >>>>> This also means there should *not* be a branch-4 after releasing Hive 4.0
> >>>>> and let that diverge (and become the next, super-ignored branch-3).
> >>>>> Only when we end up bringing in a minor backward-incompatible thing that
> >>>>> needs a 4.0.x, and only when that happens, will we create *branch-4.0* on
> >>>>> demand. For me, a branch called *branch-4.0* implies neither that I can
> >>>>> expect cool releases from it in the future nor that the branch is
> >>>>> maintained and tries to stay in sync with *master*.
> >>>>>
> >>>>> Regards,
> >>>>> Laszlo Bodor
> >>>>>
> >>>>> On Tue, Feb 8, 2022 at 16:42, Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
> >>>>>
> >>>>>> Hello everyone,
> >>>>>> thank you for starting this discussion.
> >>>>>>
> >>>>>> I agree that releasing the master branch regularly and sufficiently often
> >>>>>> is welcome and vital for the health of the community.
> >>>>>>
> >>>>>> It would be great to hear from others too, especially PMC members and
> >>>>>> committers, but also from simple contributors/followers like myself.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Alessandro
> >>>>>>
> >>>>>> On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> Thanks for starting the discussion Zoltan.
> >>>>>>>
> >>>>>>> I strongly believe that it is important to have regular and frequent
> >>>>>>> releases; otherwise people will create and maintain separate Hive forks.
> >>>>>>> The latter is not good for the project, and the community may lose
> >>>>>>> valuable members because of it.
> >>>>>>>
> >>>>>>> Going forward, I fully agree that there is no point in bringing up strong
> >>>>>>> blockers for the next release. There are certainly many
> >>>>>>> backward-incompatible changes and possibly unstable features, but unless
> >>>>>>> we get a release out, it will be difficult to determine what is broken
> >>>>>>> and what needs to be fixed.
> >>>>>>>
> >>>>>>> Due to the large number of changes that are going to appear in the next
> >>>>>>> version, I would suggest using the terms Hive X-alpha and Hive X-beta
> >>>>>>> for the first few releases. This will make it clear to end users that
> >>>>>>> they need to be careful when upgrading from an older version, and it
> >>>>>>> will give us a bit more time and freedom to address issues that users
> >>>>>>> will likely discover.
> >>>>>>>
> >>>>>>> The only real blocker that we may want to address is HIVE-25665 [1], but
> >>>>>>> we can continue the discussion under that ticket and re-evaluate if
> >>>>>>> necessary.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Stamatis
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-25665
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <k...@rxd.hu> wrote:
> >>>>>>>
> >>>>>>>> Hey All,
> >>>>>>>>
> >>>>>>>> We haven't made a release for a long time now (3.1.2 was released on
> >>>>>>>> 26 August 2019), and I think because we didn't make that many branch-3
> >>>>>>>> releases, not too many fixes were ported there, which made that release
> >>>>>>>> branch kind of erode away.
> >>>>>>>>
> >>>>>>>> We have a lot of new features/changes in the current master.
> >>>>>>>> I think instead of aiming for big feature-packed releases, we should
> >>>>>>>> aim to make a regular release every few months: releases that people
> >>>>>>>> can install and use.
> >>>>>>>> After all, releasing Hive after more than 2 years would be a big step
> >>>>>>>> forward in itself; we have so many improvements that I can't even
> >>>>>>>> count them...
> >>>>>>>>
> >>>>>>>> But I may not know every aspect of the project or the state of some
> >>>>>>>> internal features, so I would like to ask you:
> >>>>>>>> What would be the bare minimum requirements before we could release
> >>>>>>>> the current master as Hive X?
> >>>>>>>>
> >>>>>>>> There are many nice-to-haves, like:
> >>>>>>>> * Hadoop upgrade
> >>>>>>>> * JDK 11
> >>>>>>>> * removing HoS or MR
> >>>>>>>> * ?
> >>>>>>>> but I don't think these are blockers... we can do any of these in the
> >>>>>>>> next release once we start making releases...
> >>>>>>>>
> >>>>>>>> cheers,
> >>>>>>>> Zoltan
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >
>
