For query 22, iceberg.mr.split.size affects the number of mappers. With the
default value of 128 MB, Hive creates far fewer mappers than it does on ORC
tables.
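As a sketch of the workaround, assuming the property is set per session in bytes (the 64 MB value below is only illustrative, not a recommendation from the thread):

```sql
-- Hedged sketch: lower Iceberg's MR split size for this session so that
-- mapper parallelism gets closer to what Hive produces on plain ORC tables.
-- Default discussed above is 128 MB (134217728 bytes); 64 MB = 67108864.
SET iceberg.mr.split.size=67108864;
```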
For query 64, the incorrect result is due to a bug in shared work optimization.
Setting hive.optimize.shared.work.extended to false produces correct results.
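The workaround described above can be applied per session like this (a sketch; it disables the extended shared work optimization rather than fixing the underlying bug):

```sql
-- Hedged sketch: disable extended shared work optimization as a session-level
-- workaround for the query 64 wrong-result bug mentioned above.
SET hive.optimize.shared.work.extended=false;
```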
Hello Stamatis,
We used a recent commit (close to the latest) from the master branch and ran
Hive on Tez 0.10.2.
For query 22, the slow execution seems to be related to the split size used in
IcebergInputFormat.getSplits(). We will try to create a JIRA when we make more
progress.
For query 64, the re
Hi Sungwoo,
Many thanks for sharing your findings; interesting observations.
If you can, please also share the project versions that you used for running
the experiments.
Best,
Stamatis
On Tue, Nov 15, 2022 at 12:46 PM Sungwoo Park wrote:
> Hello,
>
> I ran the TPC-DS benchmark using Metastore
Hello,
I ran the TPC-DS benchmark using Metastore (in the traditional way) and Iceberg,
and would like to share the result for those interested in Hive using Iceberg.
The experiment used 1TB TPC-DS dataset stored as ORC.
Here are a few findings.
1. Overall, Hive-Iceberg runs slightly faster