Peter Csaszar created HIVE-17532:
------------------------------------

             Summary: Hive on Spark query compilation starts Spark session
                 Key: HIVE-17532
                 URL: https://issues.apache.org/jira/browse/HIVE-17532
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 2.2.0
            Reporter: Peter Csaszar
            Priority: Minor

Hive on Spark query compilation starts a new Spark session when some kind of aggregation is present:

    0: jdbc:hive2://localhost:10000/default> set hive.execution.engine=spark;
    No rows affected (0.013 seconds)
    0: jdbc:hive2://localhost:10000/default> explain select distinct label0 from iris;
    INFO : Compiling command(queryId=hive_20170912151212_914ee322-28dd-442a-9dd9-7ed00a6a8caf): explain select distinct label0 from iris
    INFO : Semantic Analysis Completed
    INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
    INFO : Completed compiling command(queryId=hive_20170912151212_914ee322-28dd-442a-9dd9-7ed00a6a8caf); Time taken: *40.594* seconds

Once the Spark job has started, all subsequent explain statements are fast:

    0: jdbc:hive2://localhost:10000/default> explain select distinct a1 from iris;
    INFO : Compiling command(queryId=hive_20170912151414_faacda24-290e-48bb-9daf-3f301fc170c1): explain select distinct label0 from iris
    INFO : Semantic Analysis Completed
    INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
    INFO : Completed compiling command(queryId=hive_20170912151414_faacda24-290e-48bb-9daf-3f301fc170c1); Time taken: *0.275* seconds

After killing the Spark job, the same query is still fast, and no new Spark job is started:

    0: jdbc:hive2://localhost:10000/default> explain select distinct a2 from iris;
    INFO : Compiling command(queryId=hive_20170912151616_a7ea83b6-03ce-4636-b3d4-be6feadcde35): explain select distinct label0 from iris
    INFO : Semantic Analysis Completed
    INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
    INFO : Completed compiling command(queryId=hive_20170912151616_a7ea83b6-03ce-4636-b3d4-be6feadcde35); Time taken: *0.213* seconds

The code in question, in SetSparkReducerParallelism.java:

    sparkSessionManager = SparkSessionManagerImpl.getInstance();
    sparkSession = SparkUtilities.getSparkSession(context.getConf(), sparkSessionManager);
    sparkMemoryAndCores = sparkSession.getMemoryAndCores();

The created Spark session is used only to get the number of cores and the amount of memory. These could be determined from the configuration, without actually starting a session.
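
To illustrate the suggestion of reading memory and cores from the configuration, here is a minimal standalone sketch. It is not the Hive code and not a proposed patch. The class ConfigBasedResources and its methods are hypothetical, a plain Map stands in for the Hive/Spark configuration, and the fallback values are assumptions chosen for the example. It also assumes static allocation, where spark.executor.instances is set explicitly; with dynamic allocation the executor count cannot be read from the configuration this way.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: estimates cluster resources from configuration values only,
// without starting a Spark session.
class ConfigBasedResources {

    // Converts strings such as "512m", "2g" or "2048" (bare number = MB) to megabytes.
    static long memoryToMb(String value) {
        String v = value.trim().toLowerCase();
        char last = v.charAt(v.length() - 1);
        if (Character.isDigit(last)) {
            return Long.parseLong(v);
        }
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (last) {
            case 'k': return n / 1024;
            case 'm': return n;
            case 'g': return n * 1024;
            case 't': return n * 1024 * 1024;
            default:
                throw new IllegalArgumentException("Unknown memory unit in: " + value);
        }
    }

    // Returns {total executor memory in MB, total executor cores}.
    // The fallback values below are assumptions for this sketch only.
    static long[] estimate(Map<String, String> conf) {
        int executors = Integer.parseInt(conf.getOrDefault("spark.executor.instances", "2"));
        int coresPerExecutor = Integer.parseInt(conf.getOrDefault("spark.executor.cores", "1"));
        long memoryPerExecutorMb = memoryToMb(conf.getOrDefault("spark.executor.memory", "1g"));
        return new long[] {
            executors * memoryPerExecutorMb,
            (long) executors * coresPerExecutor
        };
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.executor.instances", "4");
        conf.put("spark.executor.cores", "2");
        conf.put("spark.executor.memory", "2g");

        long[] result = estimate(conf);
        System.out.println("memoryMb=" + result[0] + ", cores=" + result[1]);
        // prints: memoryMb=8192, cores=8
    }
}
```

An estimate of this kind only reflects what was requested in the configuration, not what the cluster actually granted, so it may differ from the values a running session would report.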