These look pretty impressive. Which execution mode were you running these in? YARN client, maybe?
  Query                MR/sec     TEZ/sec     TEZ+LLAP/sec
                       203.317    13.681      3.809
  Order of Magnitude   -------    15 times    53 times faster

My own calculations, on Hive 2 with Spark 1.3.1 as the engine (obviously we
are comparing different bases, but it is interesting as a sample), show the
following:

  Table     MR/sec     Spark/sec   Order of Magnitude faster
  Parquet   239.532    14.38       16 times
  ORC       202.333    17.77       11 times

So the hybrid engine seems to make a real difference: if I just compare Tez
alone against Tez + LLAP, the gain is more than 3 times.
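A minimal sketch of how an MR-versus-Spark timing like this can be collected
from the Hive CLI, assuming Hive on Spark is already configured on the
cluster and the same data has been loaded into a Parquet copy and an ORC
copy of the table (the table names below are placeholders, not the actual
tables from either run):

  -- placeholder tables: sales_parquet and sales_orc hold identical rows,
  -- stored as Parquet and ORC respectively
  set hive.execution.engine=mr;
  select count(*) from sales_parquet;   -- note the 'Time taken' line printed by the CLI
  select count(*) from sales_orc;

  set hive.execution.engine=spark;      -- Hive on Spark
  select count(*) from sales_parquet;
  select count(*) from sales_orc;

Each run prints its own "Time taken: ... seconds" line at the CLI, which can
then be compared across engines and formats.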
Cheers,

Dr Mich Talebzadeh

LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


On 18 July 2016 at 23:53, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> > Also, have there been simple benchmarks to compare:
> >
> > 1. Hive on MR
> > 2. Hive on Tez
> > 3. Hive on Tez with LLAP
>
> I ran one today, with a small BI query in my test suite against a 1Tb
> data-set.
>
> TL;DR - MRv2 (203.317 seconds), Tez (13.681s), LLAP (3.809s).
>
> *Warning*: This is not a historical view: all engines are using the same
> new & improved vectorized operators from 2.2.0-SNAPSHOT; only the physical
> planner and the physical scheduling differ between runs.
>
> The difference between pre-Stinger, Stinger and Stinger.next is much,
> much larger than this.
>
> <https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query55.sql>
>
> select i_brand_id brand_id, i_brand brand,
>        sum(ss_ext_sales_price) ext_price
> from date_dim, store_sales, item
> where date_dim.d_date_sk = store_sales.ss_sold_date_sk
>   and store_sales.ss_item_sk = item.i_item_sk
>   and i_manager_id = 36
>   and d_moy = 12
>   and d_year = 2001
> group by i_brand, i_brand_id
> order by ext_price desc, i_brand_id
> limit 100;
>
> =================MRv2==============
>
> set hive.execution.engine=mr;
>
> ...
> 2016-07-18 22:22:57 Uploaded 1 File to:
> file:/tmp/gopal/b58a60d6-ff05-47bc-ad02-428aaa15779d/hive_2016-07-18_22-22-43_389_3112118969207749230-1/-local-10007/HashTable-Stage-3/MapJoin-mapfile131--.hashtable (914 bytes)
> 2016-07-18 22:22:57 End of local task; Time Taken: 2.47 sec.
> ...
> Time taken: 203.317 seconds, Fetched: 100 row(s)
>
> =================Tez===============
>
> set hive.execution.engine=tez;
> set hive.llap.execution.mode=none;
>
> Time taken: 13.681 seconds, Fetched: 100 row(s)
>
> =================LLAP==============
>
> set hive.llap.execution.mode=all;
>
> Task Execution Summary
> ----------------------------------------------------------------------------------------------
>   VERTICES     DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
> ----------------------------------------------------------------------------------------------
>   Map 1             1016.00              0             0      93,123,704            9,048
>   Map 4                0.00              0             0          10,000               31
>   Map 5                0.00              0             0         296,344            2,675
>   Reducer 2          207.00              0             0           9,048              100
>   Reducer 3            0.00              0             0             100                0
> ----------------------------------------------------------------------------------------------
>
> Query Execution Summary
> ----------------------------------------------------------------------------------------------
>   OPERATION        DURATION
> ----------------------------------------------------------------------------------------------
>   Compile Query       1.64s
>   Prepare Plan        0.32s
>   Submit Plan         0.57s
>   Start DAG           0.21s
>   Run DAG             1.02s
> ----------------------------------------------------------------------------------------------
>
> Time taken: 3.809 seconds, Fetched: 100 row(s)
>
> Annoyingly, the 1.64s to compile the query is now a huge fraction, since
> it only takes 1.02s to execute the join + aggregate over 93 million rows.
>
> Hopefully in a couple of weeks we'll cut that 1.64s down to nearly nothing
> once we merge HIVE-13995 into master.
>
> More about the historical view: the new vectorization codepaths are a big
> part of this speed-up, when you compare historically or against an
> incompletely vectorized format like Parquet (HIVE-8128 looks abandoned).
>
> set hive.vectorized.execution.mapjoin.native.enabled=false;
>
> Time taken: 34.372 seconds, Fetched: 100 row(s)
> hive>
>
> Cheers,
> Gopal
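For anyone wanting to rerun the comparison above, these are the switches
shown across the quoted runs, gathered in one place. This is only a sketch:
it assumes a cluster where Tez and the LLAP daemons are already set up, the
query being timed is query55.sql from the link above, and the engine mode of
the final 34.372s run is not stated in the mail.

  -- 1) MRv2 baseline (203.317s in the quoted run)
  set hive.execution.engine=mr;

  -- 2) Tez in ordinary containers, LLAP disabled (13.681s)
  set hive.execution.engine=tez;
  set hive.llap.execution.mode=none;

  -- 3) Tez with execution pushed to the LLAP daemons (3.809s)
  set hive.llap.execution.mode=all;

  -- 4) native vectorized map-join switched off (the 34.372s run;
  --    which engine mode that run used is not stated in the mail)
  set hive.vectorized.execution.mapjoin.native.enabled=false;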