[ https://issues.apache.org/jira/browse/HIVE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539287#comment-14539287 ]
Rui Li commented on HIVE-10671:
-------------------------------

Why does each table have 2 sizes? The following is the output of the same command on my cluster:
{code}
[root@node13-1 ~]# hadoop fs -du -h /user/hive/warehouse/tpch_flat_orc_320.db
2.4 G    /user/hive/warehouse/tpch_flat_orc_320.db/customer
53.8 G   /user/hive/warehouse/tpch_flat_orc_320.db/lineitem
1.7 K    /user/hive/warehouse/tpch_flat_orc_320.db/nation
12.6 G   /user/hive/warehouse/tpch_flat_orc_320.db/orders
1.2 G    /user/hive/warehouse/tpch_flat_orc_320.db/part
9.2 G    /user/hive/warehouse/tpch_flat_orc_320.db/partsupp
980      /user/hive/warehouse/tpch_flat_orc_320.db/region
156.8 M  /user/hive/warehouse/tpch_flat_orc_320.db/supplier
{code}
Q22 runs for about 57s in both yarn-client and yarn-cluster mode on my side. I'll try other cases.

> yarn-cluster mode offers a degraded performance from yarn-client [Spark Branch]
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-10671
>                 URL: https://issues.apache.org/jira/browse/HIVE-10671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>
> With Hive on Spark, users noticed that in certain cases
> spark.master=yarn-client offers 2x or 3x better performance than
> spark.master=yarn-cluster. However, yarn-cluster is what we recommend and
> support. Thus, we should investigate and fix the problem. One such query is
> TPC-H 22.