Re: Result of the TPC-DS benchmark using Iceberg,

2022-11-28 Thread Sungwoo Park
For query 22, iceberg.mr.split.size affects the number of mappers. With the default value of 128MB, Hive creates far fewer mappers than it does on ORC tables. For query 64, it is due to a bug in shared work optimization. Setting hive.optimize.shared.work.extended to false produces correct res
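
To illustrate why the split size matters for query 22, here is a rough back-of-the-envelope sketch in Python. It is not how Hive or Iceberg actually plans splits: it assumes one mapper per split and that every split is exactly the configured size, and the 100 GB table size is a made-up figure, not a number from the benchmark. The function name estimated_mappers is hypothetical.

```python
MB = 1024 * 1024


def estimated_mappers(total_bytes: int, split_size_bytes: int) -> int:
    """Estimate the mapper count as ceil(total_bytes / split_size_bytes).

    Assumption: one mapper per split and all splits are full-sized.
    Real split planning also depends on file and stripe boundaries.
    """
    if total_bytes < 0 or split_size_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and split_size_bytes > 0")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-total_bytes // split_size_bytes)


if __name__ == "__main__":
    table_bytes = 100 * 1024 * MB  # hypothetical 100 GB of scanned data
    for split_mb in (128, 64, 32, 16):
        mappers = estimated_mappers(table_bytes, split_mb * MB)
        print(f"split size {split_mb:>3} MB -> about {mappers} mappers")
```

Under these assumptions, halving the split size doubles the estimated number of mappers, which is the direction of the effect described above for iceberg.mr.split.size.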

Re: Result of the TPC-DS benchmark using Iceberg,

2022-11-18 Thread Sungwoo Park
Hello Stamatis, We use a recent or the latest commit in the master branch and run Hive on Tez 0.10.2. For query 22, the slow execution seems to be related to the split size used in IcebergInputFormat.getSplits(). We will try to create a JIRA when we make more progress. For query 64, the re

Re: Result of the TPC-DS benchmark using Iceberg,

2022-11-17 Thread Stamatis Zampetakis
Hi Sungwoo, Many thanks for sharing your findings; interesting observations. If you can, please also share the project versions that you used for running the experiments. Best, Stamatis On Tue, Nov 15, 2022 at 12:46 PM Sungwoo Park wrote: > Hello, > > I ran the TPC-DS benchmark using Metastore

Result of the TPC-DS benchmark using Iceberg,

2022-11-15 Thread Sungwoo Park
Hello, I ran the TPC-DS benchmark using Metastore (in the traditional way) and Iceberg, and would like to share the result for those interested in Hive using Iceberg. The experiment used a 1TB TPC-DS dataset stored as ORC. Here are a few findings. 1. Overall, Hive-Iceberg runs slightly faster