[ https://issues.apache.org/jira/browse/HIVE-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166272#comment-16166272 ]
Peter Vary commented on HIVE-17532:
-----------------------------------

[~pcsaszar]: We recently discussed the same topic in HIVE-17291. The decision there was that on a production cluster one should use dynamic allocation, and for the others it is better to use the actual cores than the configured ones.

Thanks,
Peter

> Hive on Spark query compilation starts Spark session
> ----------------------------------------------------
>
>                 Key: HIVE-17532
>                 URL: https://issues.apache.org/jira/browse/HIVE-17532
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 2.2.0
>            Reporter: Peter Csaszar
>            Priority: Minor
>
> Hive on Spark query compilation starts a new Spark session when some kind of
> aggregation is present:
> 0: jdbc:hive2://localhost:10000/default> set hive.execution.engine=spark;
> No rows affected (0.013 seconds)
> 0: jdbc:hive2://localhost:10000/default> explain select distinct label0 from iris;
> INFO : Compiling command(queryId=hive_20170912151212_914ee322-28dd-442a-9dd9-7ed00a6a8caf): explain select distinct label0 from iris
> INFO : Semantic Analysis Completed
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
> INFO : Completed compiling command(queryId=hive_20170912151212_914ee322-28dd-442a-9dd9-7ed00a6a8caf); Time taken: *40.594* seconds
> Once the Spark job has started, all subsequent explain statements are fast:
> 0: jdbc:hive2://localhost:10000/default> explain select distinct a1 from iris;
> INFO : Compiling command(queryId=hive_20170912151414_faacda24-290e-48bb-9daf-3f301fc170c1): explain select distinct label0 from iris
> INFO : Semantic Analysis Completed
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
> INFO : Completed compiling command(queryId=hive_20170912151414_faacda24-290e-48bb-9daf-3f301fc170c1); Time taken: *0.275* seconds
> After killing the Spark job, the same query is still fast, and no new Spark job
> has been started:
> 0: jdbc:hive2://localhost:10000/default> explain select distinct a2 from iris;
> INFO : Compiling command(queryId=hive_20170912151616_a7ea83b6-03ce-4636-b3d4-be6feadcde35): explain select distinct label0 from iris
> INFO : Semantic Analysis Completed
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
> INFO : Completed compiling command(queryId=hive_20170912151616_a7ea83b6-03ce-4636-b3d4-be6feadcde35); Time taken: *0.213* seconds
> The code in question:
> SetSparkReducerParallelism.java:
>       sparkSessionManager = SparkSessionManagerImpl.getInstance();
>       sparkSession = SparkUtilities.getSparkSession(context.getConf(), sparkSessionManager);
>       sparkMemoryAndCores = sparkSession.getMemoryAndCores();
> The created Spark session is used only for getting the number of cores and the
> amount of memory. These could be determined from the configuration, without
> actually starting a session.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
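
The suggestion in the issue description (deriving cores and memory from the configuration instead of starting a session) could look roughly like the sketch below. This is not Hive code: the class ConfiguredSparkResources and its methods are hypothetical, the configuration is modelled as a plain Map rather than a HiveConf, the fallback values (1g, 2 instances, 1 core) are assumptions of the sketch, and the size parser is simplified (a bare number is taken as bytes). Only the three property names are standard Spark settings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: estimates resources from
// configured Spark properties without starting a Spark session.
public class ConfiguredSparkResources {

    // Parses sizes such as "512m" or "4g" into bytes.
    // Simplified for the sketch: a value without a k/m/g/t suffix is taken as bytes.
    static long parseBytes(String size) {
        String s = size.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        long factor;
        switch (unit) {
            case 'k': factor = 1L << 10; break;
            case 'm': factor = 1L << 20; break;
            case 'g': factor = 1L << 30; break;
            case 't': factor = 1L << 40; break;
            default:  return Long.parseLong(s);
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * factor;
    }

    // Configured memory per executor, in bytes (fallback "1g" is an assumption).
    static long executorMemoryBytes(Map<String, String> conf) {
        return parseBytes(conf.getOrDefault("spark.executor.memory", "1g"));
    }

    // Configured executors times cores per executor (fallbacks are assumptions).
    static int totalCores(Map<String, String> conf) {
        int instances = Integer.parseInt(conf.getOrDefault("spark.executor.instances", "2"));
        int cores = Integer.parseInt(conf.getOrDefault("spark.executor.cores", "1"));
        return instances * cores;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.executor.memory", "4g");
        conf.put("spark.executor.instances", "10");
        conf.put("spark.executor.cores", "4");
        System.out.println(executorMemoryBytes(conf) + " bytes per executor, "
                + totalCores(conf) + " cores in total");
    }
}
```

Values obtained this way are the configured ones, which may differ from what the cluster actually grants; per the comment above, the decision in HIVE-17291 was to prefer the actual cores when dynamic allocation is not in use.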