I tried the 10TB TPC-DS benchmark with Iceberg, but according to preliminary results,
the execution time increases by about 20% (total execution time from 4900s to
5800s, geo-mean from 19s to 21s). However, please note that the result is
not conclusive because 1) I used the build from last November, instead of
th
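The two figures above measure different things. The total is dominated by the longest-running queries, while the geometric mean weights every query equally, so the two can move by different amounts. The Python sketch below reproduces the quoted numbers. The helper names are illustrative and do not come from any benchmark tool. With these inputs the total rises by roughly 18% and the geo-mean by roughly 10.5%.

```python
import math

def geo_mean(times):
    """Geometric mean of per-query execution times (seconds)."""
    # Average the logarithms, then exponentiate; avoids overflow from a huge product.
    return math.exp(sum(math.log(t) for t in times) / len(times))

def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Figures quoted above: total time 4900s -> 5800s, geo-mean 19s -> 21s.
print(f"total:    {pct_change(4900, 5800):+.1f}%")  # +18.4%
print(f"geo-mean: {pct_change(19, 21):+.1f}%")      # +10.5%
```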
Hi,
In my opinion, another major issue to address before switching to Iceberg
as the default is Iceberg catalog support, e.g.:
HIVE-28658: Iceberg REST Catalog Support
HIVE-28879: Federated Catalog support
My guess is that potential new users would be quite surprised to find no
support for the I
ve release.
>
> If you think there is still a bug remaining in the latest releases (in
> either Hadoop or Hive), please let us know.
>
> Chris Nauroth
>
>
> On Tue, Feb 4, 2025 at 5:25 AM Ayush Saxena wrote:
>
>> Thanks Sungwoo Park for sharing the details. I
I reported a bug in ZStandardCodec in the Hadoop library. If you run Hive with
ZStandard compression for Tez intermediate data, you might be affected by this
bug.
https://issues.apache.org/jira/browse/HDFS-14099
The problem occurs when the input file is large (e.g., 25MB) and does not
compres
TEZ-4577.
Thanks,
--- Sungwoo Park
On Sun, Dec 15, 2024 at 6:11 PM Butao Zhang wrote:
> Hi dev,
>
> As we discussed in this thread[1], we are planning to release
> the Apache Hive 4.1.x version. IMO, many Tez commits are coupled with
> Hive development. Therefore, sh
Congratulations and huge thanks to Apache Hive team and contributors for
releasing Hive 4. We have been watching the development of Hive 4 since the
release of Hive 3.1, and it's truly satisfying to witness the resolution of
all the critical issues at last after 5 years. Hive 4 comes with a lot of
>
> Based on HIVE-26654, it looks like we have 3 PR pending review:
> 1. HIVE-26986 - Query 71
> 2. HIVE-27006 - Query 2
> 3. HIVE-27269 - Query 97 (is that ready to be reviewed?)
>
Yes, Seonggon just submitted a pull request for HIVE-27269. It is not the
simple fix that I originally proposed; it i
Hi everyone,
I would like to resume the discussion on the release of Hive 4 and
the result of the TPC-DS benchmark.
Currently, there are four unresolved JIRAs marked 'hive-4.0.0-must' that must be
resolved before the release of Hive 4 ([1], [2], [3], [4]). The most urgent one
is perhaps HIVE
In addition to the two main benefits summarized by Rory, I would like to
add another benefit of using remote shuffle service:
3. If you run large jobs in public clouds, sometimes the amount of local
storage attached to your instances can be a limiting factor. By using
remote shuffle service, you c
Hi, everyone.
I have not tested the master branch with Java 11/17 yet, but I would like
to share my experience with testing a fork of branch-3.1 with Java 11/17
(as part of developing Hive-MR3), in case it is useful for the
discussion. I merged the patches listed in [1] HIVE-22415 and upd
optimization can miss a chance. Of course, I know
> it can also positively work in some cases.
>
> Note that the version I used is a bit old, my memory could be wrong, and
> again I am not sure about the concrete background of HIVE-21189.
>
> Thanks,
> Okumin
>
>
> On
Hello,
In HIVE-21189 [1], the default value for hive.merge.nway.joins is set to
false. There is no record of why it was set to false, and I would like to
understand the background for the decision. Specifically, I wonder whether the
following situation is relevant to the decision.
Example)
MapJoinOp_1
I think such nightly builds will be useful for testing and debugging in the
future.
I also wonder if we can somehow create builds even from previous commits
(e.g., for the past few years). Such builds from previous commits don't
have to be daily builds, and I think weekly builds (or even monthly b
Sorry for the noise. My email address is: glap...@gmail.com
Thanks,
--- Sungwoo Park
On Fri, May 19, 2023 at 3:11 PM Sungwoo Park wrote:
> If non-committers can join the Slack channel, I would like to join, too.
> An invitation would be very much appreciated (glapa...@gma
If non-committers can join the Slack channel, I would like to join, too. An
invitation would be very much appreciated (glapa...@gmail.com).
Thanks,
--- Sungwoo Park
On Fri, May 19, 2023 at 2:49 PM Butao Zhang wrote:
> Hi, Hive dev
>
>
> I just saw this updated page:
> https://c
Hi,
HIVE-25170 fixes the same bug as your pull request.
Thanks,
--- Sungwoo
On Fri, May 12, 2023 at 4:04 PM Suprith Chandrashekharachar <
suprith.chandrashekharac...@treasure-data.com> wrote:
> Hi,
>
> I opened this ticket about 2 years ago hoping to get a review. I didn't
> hear any feedba
I would like to add another question to Laszlo's list.
4) When a specific DI framework is chosen, what new dependencies
will be introduced? (Would they conflict with existing dependencies of
Hive?)
Regards,
--- Sungwoo Park
On Thu, Apr 13, 2023 at 4:43 PM László Bodor
wrote
Hi Stamatis,
For the correctness issue, we wanted to solve the problem ourselves and
have made a few pull requests in [1] so far. (We would like to kindly
request Hive committers to review the pull requests.) For HIVE-27226, we
are working on a solution and will create a pull request when a solu
LLAP are the new execution engines,
these tests should be migrated as well.
Sungwoo Park
[1] https://issues.apache.org/jira/browse/HIVE-26654
[2] https://issues.apache.org/jira/browse/HIVE-27226
On Wed, Apr 12, 2023 at 10:12 PM Stamatis Zampetakis
wrote:
> Hey Laszlo,
>
> Dependency
I like the proposal very much. (Then, hopefully this mailing list will
be useful to outside contributors as well.)
--- Sungwoo Park
On Sat, 25 Mar 2023, Stamatis Zampetakis wrote:
Hi everyone,
In the last Hive board report someone mentioned that the volume of Jira
notification emails to the
Hello,
I would like to expand the list of blockers with HIVE-27138 [1], which fixes an NPE
in mapjoin_filter_on_outerjoin.q.
Currently, mapjoin_filter_on_outerjoin.q is tested with the MapReduce execution
engine and shows no problem. However, it shows a few problems when tested with
Tez execution eng
Sungwoo Park created HIVE-27134:
---
Summary: SharedWorkOptimizer merges TableScan operators that have
different DPP parents
Key: HIVE-27134
URL: https://issues.apache.org/jira/browse/HIVE-27134
Project
ey get
reviewed.
Best regards,
Alessandro
On Tue, 14 Feb 2023 at 15:06, Sungwoo Park wrote:
Seonggon created three JIRAs a while ago that affect the results of TPC-DS
queries, and I wonder if anyone would have time to review the pull requests.
HIVE-26968: SharedWorkOptimizer merges TableScan
Hive 4.0.0,
it does not seem like a good idea to release a Hive 4.0.0 that fails on some
TPC-DS queries.
Thanks!
Sungwoo Park
Sungwoo Park created HIVE-27082:
---
Summary: AggregateStatsCache.findBestMatch() in Metastore should
test the inclusion of default partition name
Key: HIVE-27082
URL: https://issues.apache.org/jira/browse/HIVE-27082
results for query
64.
Because of several bugs in shared work optimization (and parallel edge fixer),
it might make sense to set the default value of
hive.optimize.shared.work to false in HiveConf.java.
--- Sungwoo
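Until the default in HiveConf.java changes, you can disable the optimization per deployment by overriding the property in hive-site.xml. You can also disable it per session with `SET hive.optimize.shared.work=false;`. The Python sketch below builds such a hive-site.xml fragment using the standard Hadoop `<configuration>/<property>` layout. The helper `hive_site` is hypothetical and is shown only for illustration.

```python
import xml.etree.ElementTree as ET

def hive_site(props):
    """Hypothetical helper: build a minimal hive-site.xml body from a dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")

# Disable shared work optimization, as suggested above.
xml = hive_site({"hive.optimize.shared.work": "false"})
print(xml)
```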
On Fri, 18 Nov 2022, Sungwoo Park wrote:
Hello Stamatis,
We use a recent or
your findings; interesting observations.
If you can, please also share the project versions that you used for running
the experiments.
Best,
Stamatis
On Tue, Nov 15, 2022 at 12:46 PM Sungwoo Park wrote:
Hello,
I ran the TPC-DS benchmark using Metastore (in the traditional way) and Iceberg,
and would like to share the result for those interested in Hive using Iceberg.
The experiment used a 1TB TPC-DS dataset stored as ORC.
Here are a few findings.
1. Overall, Hive-Iceberg runs slightly faster
Sungwoo Park created HIVE-26732:
---
Summary: Iceberg uses "null" and does not use the configuration
key "hive.exec.default.partition.name" for default partitions.
Key: HIVE-26732
URL: https://is
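HIVE-26732 concerns how a NULL partition value is named. The sketch below shows the behavior the summary asks for: the name comes from `hive.exec.default.partition.name`, whose Hive default is `__HIVE_DEFAULT_PARTITION__`, rather than the literal string "null". It is written in Python. The helper `partition_dir` and the plain-dict `conf` are hypothetical, and the sketch ignores the escaping of special characters that Hive applies to real partition values.

```python
# Hive's built-in default for hive.exec.default.partition.name.
DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

def partition_dir(column, value, conf):
    """Hypothetical helper: render a partition directory name 'column=value'.

    A NULL (None) value maps to the configured default partition name
    instead of the literal string "null". No escaping is performed.
    """
    if value is None:
        value = conf.get("hive.exec.default.partition.name", DEFAULT_PARTITION_NAME)
    return f"{column}={value}"

print(partition_dir("d_date_sk", None, {}))
print(partition_dir("d_date_sk", None, {"hive.exec.default.partition.name": "NULLPART"}))
```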
Sungwoo Park created HIVE-26668:
---
Summary: Upgrade ORC version to 1.6.11
Key: HIVE-26668
URL: https://issues.apache.org/jira/browse/HIVE-26668
Project: Hive
Issue Type: Bug
Sungwoo Park created HIVE-26660:
---
Summary: TPC-DS query 71 returns wrong results
Key: HIVE-26660
URL: https://issues.apache.org/jira/browse/HIVE-26660
Project: Hive
Issue Type: Bug
Sungwoo Park created HIVE-26659:
---
Summary: TPC-DS query 16, 69, 94 return wrong results.
Key: HIVE-26659
URL: https://issues.apache.org/jira/browse/HIVE-26659
Project: Hive
Issue Type: Bug
Sungwoo Park created HIVE-26655:
---
Summary: TPC-DS query 17 returns wrong results
Key: HIVE-26655
URL: https://issues.apache.org/jira/browse/HIVE-26655
Project: Hive
Issue Type: Bug
Sungwoo Park created HIVE-26654:
---
Summary: Test with the TPC-DS benchmark
Key: HIVE-26654
URL: https://issues.apache.org/jira/browse/HIVE-26654
Project: Hive
Issue Type: Bug
Affects
milar results have been reproduced by the Hive
team, in order to make sure that we did not make errors in our tests.
If it is okay to open a JIRA ticket that only reports failures in the
TPC-DS test, we could also run git bisect to locate the commit
that began to generate wrong results.
--- Sungwoo
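`git bisect` (optionally automated with `git bisect run <script>`) finds the offending commit by binary search over the commit range. Each step costs one benchmark run, so a range of about 1000 commits needs only about 10 runs. The Python sketch below models that search, with a callback standing in for "rerun the failing query and compare results". It assumes the failure persists once introduced and that the endpoints are known good and known bad.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the failure persists once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # e.g. rerun the failing TPC-DS query here
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Toy history of 1000 commits in which commit 637 introduced the bug.
runs = []
def check(c):
    runs.append(c)
    return c >= 637

print(first_bad(list(range(1000)), check), len(runs))
```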
>
> > 1. With hive.optimize.shared.work.dppunion=true, query 2 and 59 fail.
> Please see the attachment for stack traces.
>
> Even though the exception seems to be a recurrence of the previous issue,
> the existing checks + HIVE-24360 should have restricted all incorrect cases.
> I built in some debug
have automated the entire experiment, so if you would like to see the
result of testing a new commit, I would be happy to rerun the experiment
and get back to you.)
Cheers,
--- Sungwoo
On Thu, Nov 12, 2020 at 10:49 PM Zoltan Haindrich wrote:
> Hey Sungwoo!
>
> On 11/12/20 10:23 AM, S
Hi Zoltan,
I used the same hive-site.xml for the previous test (which was okay) and
the new test (which failed), so my guess is that the failure is due to a
commit made after the previous test. Let me try later to identify the commit
that fails query 14, with the hope that identifying such a commit migh
Hi Stamatis, Mustafa, Zoltán,
This is the result of a new experiment. These are the changes that I made:
1. Reverted HIVE-24139. (It turns out that HIVE-24139 does not affect the
result of the TPC-DS benchmark.)
2. Set hive.optimize.shared.work.dppunion to false in hive-site.xml.
3. Set tez.runt
Hello,
I have tested a recent commit of the master branch using the TPC-DS
benchmark. I used Hive on Tez (not Hive-LLAP). My testing procedure was:
1) create a database consisting of external tables from a 100GB TPC-DS text
dataset
2) create a database consisting of ORC tables from the previous databa
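The email is cut off here, so the following is only a guess at what step 2 might look like. It is a Python sketch that emits one HiveQL `CREATE TABLE ... STORED AS ORC AS SELECT` (CTAS) statement per table. The database names `tpcds_text` and `tpcds_orc` and the three-table subset are hypothetical; the full TPC-DS schema has 24 tables. A real setup would likely also partition the large fact tables, which this sketch omits.

```python
# Hypothetical subset of TPC-DS tables; the full schema has 24 tables.
TABLES = ["store_sales", "item", "date_dim"]

def ctas_statements(src_db, dst_db, tables):
    """Return HiveQL that copies each external text table into an ORC table."""
    stmts = [f"CREATE DATABASE IF NOT EXISTS {dst_db};"]
    for t in tables:
        stmts.append(
            f"CREATE TABLE {dst_db}.{t} STORED AS ORC AS SELECT * FROM {src_db}.{t};"
        )
    return stmts

for s in ctas_statements("tpcds_text", "tpcds_orc", TABLES):
    print(s)
```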