[ https://issues.apache.org/jira/browse/HIVE-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630702#comment-14630702 ]
Xuefu Zhang commented on HIVE-11276:
------------------------------------

Hi [~chengxiang li], your analysis is correct. I realized after creating this JIRA that we are not uploading the jars every time, even though refreshLocalResources() is called. This is fine. Dynamic allocation also worked well with the existing implementation. Therefore, this JIRA is "not a problem", and I'm going to close it.

I think we need to pre-warm containers for user sessions that execute only one query and then exit, such as those issued by Oozie. The Spark session can be created right after the user connects to Hive, when the execution engine is Spark. This way, the remote driver and the executors will already be up when the query comes. As part of that, some jars, such as hive-exec.jar, can also be uploaded to HDFS. Of course, connection will be slower, so we need a configuration to turn this on. What do you think?

> Optimization around job submission and adding jars [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-11276
>                 URL: https://issues.apache.org/jira/browse/HIVE-11276
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Chengxiang Li
>
> It seems that Hive on Spark has some room for performance improvement on job
> submission. Specifically, we are calling refreshLocalResources() for every
> job submission even though there are no changes in the jar list. Since Hive on
> Spark reuses the containers for the whole user session, we might be able
> to optimize that.
> We do need to take into consideration the case of dynamic allocation, in
> which new executors might be added.
> This task is some R&D in this area.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
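The pre-warm proposal above could be sketched roughly as follows. This is a hypothetical illustration, not Hive's actual code: the configuration key `hive.prewarm.spark.session`, the `UserSession`/`SparkSession` classes, and the `onConnect()` hook are all invented for this sketch; the real change would live in HiveServer2's session handling.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed behavior: when a user connects and the execution
// engine is Spark, optionally open the Spark session (remote driver plus
// executors) immediately instead of waiting for the first query.
public class PrewarmSketch {

    // Stand-in for a Spark session whose startup is expensive.
    static class SparkSession {
        final long startedAtMillis = System.currentTimeMillis();
    }

    // Stand-in for a per-user Hive session.
    static class UserSession {
        private final Map<String, String> conf;
        private SparkSession sparkSession; // null until opened

        UserSession(Map<String, String> conf) {
            this.conf = conf;
        }

        // Hypothetical hook, called right after the user connects.
        void onConnect() {
            boolean isSpark = "spark".equals(conf.get("hive.execution.engine"));
            // Hypothetical flag; it should default to false so that connection
            // latency is unchanged unless the user opts in.
            boolean prewarm = Boolean.parseBoolean(
                    conf.getOrDefault("hive.prewarm.spark.session", "false"));
            if (isSpark && prewarm) {
                getSparkSession(); // pay the startup cost now, before any query
            }
        }

        // Lazily opens the Spark session on first use; the same session is
        // then reused for the rest of the user session.
        SparkSession getSparkSession() {
            if (sparkSession == null) {
                sparkSession = new SparkSession();
            }
            return sparkSession;
        }

        boolean isPrewarmed() {
            return sparkSession != null;
        }
    }
}
```

With the flag off (the default), behavior is unchanged: the session is still created lazily on the first call to `getSparkSession()`, so only opted-in connections pay the extra startup cost.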