[ https://issues.apache.org/jira/browse/HIVE-23802?focusedWorklogId=483600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483600 ]
ASF GitHub Bot logged work on HIVE-23802:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 13/Sep/20 00:48
Start Date: 13/Sep/20 00:48
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #1206:
URL: https://github.com/apache/hive/pull/1206

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 483600)
Time Spent: 0.5h (was: 20m)

> “merge files” job was submited to default queue when set hive.merge.tezfiles to true
> ------------------------------------------------------------------------------------
>
> Key: HIVE-23802
> URL: https://issues.apache.org/jira/browse/HIVE-23802
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 3.1.0
> Reporter: gaozhan ding
> Assignee: gaozhan ding
> Priority: Major
> Labels: pull-request-available
> Attachments: 15940042679272.png, HIVE-23802.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We use Tez as the query engine. When hive.merge.tezfiles is set to true, the
> merge-files task, which follows the original task, is submitted to the default
> queue rather than to the same queue as the original task.
> I studied this issue for days and found that every time a container is started,
> "tez.queue.name" is unset in the current session. The code is as follows:
> {code:java}
> // TezSessionState.startSessionAndContainers()
> // sessionState.getQueueName() comes from cluster wide configured queue names.
> // sessionState.getConf().get("tez.queue.name") is explicitly set by user in a session.
> // TezSessionPoolManager sets tez.queue.name if user has specified one or use the one from
> // cluster wide queue names.
> // There is no way to differentiate how this was set (user vs system).
> // Unset this after opening the session so that reopening of session uses the correct queue
> // names i.e, if client has not died and if the user has explicitly set a queue name
> // then reopened session will use user specified queue name else default cluster queue names.
> conf.unset(TezConfiguration.TEZ_QUEUE_NAME);
> {code}
> So after the original task is submitted to YARN, "tez.queue.name" is unset.
> When the merge-files task starts, it tries to reuse the session of the original
> job, but the check below returns false because tez.queue.name was unset. It
> seems we should not unset this property.
> {code:java}
> // TezSessionPoolManager.canWorkWithSameSession()
> if (!session.isDefault()) {
>   String queueName = session.getQueueName();
>   String confQueueName = conf.get(TezConfiguration.TEZ_QUEUE_NAME);
>   LOG.info("Current queue name is " + queueName + " incoming queue name is " + confQueueName);
>   return (queueName == null) ? confQueueName == null : queueName.equals(confQueueName);
> } else {
>   // this session should never be a default session unless something has messed up.
>   throw new HiveException("The pool session " + session + " should have been returned to the pool");
> }
> {code}
> !15940042679272.png!
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
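
The queue mismatch described in the quoted issue can be reproduced in isolation, without Hive or Tez on the classpath. The following is a minimal sketch, not Hive's code: the class `QueueReuseSketch`, the method `canReuse`, and the queue name "etl" are hypothetical, and a plain `Map` stands in for the session configuration. Only the comparison itself is taken from the `canWorkWithSameSession()` snippet quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone model of the queue check; not part of Hive.
public class QueueReuseSketch {
    // Same key that TezConfiguration.TEZ_QUEUE_NAME refers to in the issue text.
    static final String TEZ_QUEUE_NAME = "tez.queue.name";

    // Mirrors the comparison quoted from canWorkWithSameSession():
    // the session is reusable only if its queue equals the queue in the conf.
    static boolean canReuse(String sessionQueue, Map<String, String> conf) {
        String confQueue = conf.get(TEZ_QUEUE_NAME);
        return (sessionQueue == null) ? confQueue == null : sessionQueue.equals(confQueue);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TEZ_QUEUE_NAME, "etl");             // user sets a queue (assumed name)
        String sessionQueue = conf.get(TEZ_QUEUE_NAME); // session is opened on that queue

        System.out.println(canReuse(sessionQueue, conf)); // true: queues match

        conf.remove(TEZ_QUEUE_NAME);                 // models conf.unset(...) after session start

        System.out.println(canReuse(sessionQueue, conf)); // false: conf queue is now null
    }
}
```

In this model the second check fails for the same reason the report gives: the session still remembers its queue, while the configuration no longer carries one, so the follow-up task is not matched to the existing session.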