[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127901#comment-14127901 ]
Josh Elser commented on HIVE-7950:
----------------------------------

I think I finally got to the bottom of this, and it is broken with Tez-0.5.0.

TezTask needs to be altered (as described by the previous discussion) to add the necessary StorageHandler resources to the DAG using

{code}
dag.addTaskLocalFiles(localResources);
{code}

For the case of the AccumuloStorageHandler, this adds the jars necessary to connect to Accumulo to {{commonTaskLocalFiles}} in {{DAG}}. Then, {{TezTask}} will proceed to eventually submit the DAG to be run.

{code}
try {
  // ready to start execution on the cluster
  sessionState.getSession().addAppMasterLocalFiles(resourceMap);
  dagClient = sessionState.getSession().submitDAG(dag);
} catch (SessionNotRunning nr) {
  console.printInfo("Tez session was closed. Reopening...");

  // close the old one, but keep the tmp files around
  TezSessionPoolManager.getInstance().closeAndOpen(sessionState, this.conf);
  console.printInfo("Session re-established.");

  dagClient = sessionState.getSession().submitDAG(dag);
}
{code}

Consider the case where we had a Session already created for the user, but the underlying application has exited, say due to a timeout. In the try block, we try to submit our DAG to run. In doing so, TezClient creates a DAGPlan from the DAG

{code}
DAGPlan dagPlan = dag.createDag(amConfig.getTezConfiguration());
{code}

When we create a {{DAGPlan}} from the {{DAG}}, we modify the {{DAG}} instance, adding the local resources to each {{Vertex}} in the {{DAG}}. Then, we identify that the underlying application has already died, and that we need to {{closeAndOpen}} a new Session. So, we get the {{SessionNotRunning}} exception, pop out to the catch block, and end up creating another {{DAGPlan}} from the {{DAG}} _that was already altered by the last attempt to submit it_.

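
The failure mode described above can be reproduced in miniature. The following is only a sketch: {{SketchVertex}}, {{SketchDag}}, {{createPlan()}} and the jar path are hypothetical stand-ins invented for illustration, not the real Tez classes or their signatures. It assumes, as the comment describes, that building the plan copies the DAG-wide resources onto each vertex and that a vertex rejects a resource it already holds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Tez Vertex; NOT the real class.
class SketchVertex {
    private final String name;
    private final Map<String, String> localFiles = new HashMap<>();

    SketchVertex(String name) {
        this.name = name;
    }

    // Rejects a resource name that is already registered on this vertex.
    void addTaskLocalFile(String resourceName, String path) {
        if (localFiles.containsKey(resourceName)) {
            throw new IllegalStateException(
                "Duplicate resource on vertex " + name + ": " + resourceName);
        }
        localFiles.put(resourceName, path);
    }
}

// Hypothetical stand-in for a Tez DAG; NOT the real class.
class SketchDag {
    private final List<SketchVertex> vertices = new ArrayList<>();
    private final Map<String, String> commonTaskLocalFiles = new HashMap<>();

    void addVertex(SketchVertex vertex) {
        vertices.add(vertex);
    }

    void addTaskLocalFiles(Map<String, String> files) {
        commonTaskLocalFiles.putAll(files);
    }

    // Building the plan copies the DAG-wide resources onto every vertex,
    // so this call mutates the DAG and is not safe to repeat.
    String createPlan() {
        for (SketchVertex vertex : vertices) {
            for (Map.Entry<String, String> e : commonTaskLocalFiles.entrySet()) {
                vertex.addTaskLocalFile(e.getKey(), e.getValue());
            }
        }
        return "plan(" + vertices.size() + " vertices)";
    }
}

class DagResubmitSketch {
    // Builds a one-vertex DAG carrying one made-up storage handler jar.
    static SketchDag newDag() {
        SketchDag dag = new SketchDag();
        dag.addVertex(new SketchVertex("Map 1"));
        Map<String, String> jars = new HashMap<>();
        jars.put("accumulo-core.jar", "/tmp/hypothetical/accumulo-core.jar");
        dag.addTaskLocalFiles(jars);
        return dag;
    }

    // Returns true if planning the same DAG instance a second time fails.
    static boolean secondPlanCreationFails() {
        SketchDag dag = newDag();
        dag.createPlan(); // first submit attempt; the session turns out to be dead
        try {
            dag.createPlan(); // retry after reopening, same DAG instance
            return false;
        } catch (IllegalStateException duplicate) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second createPlan() fails: " + secondPlanCreationFails());
    }
}
```

In this toy model the second call fails purely because the first call already mutated the vertices, which is the behavior the comment attributes to the retry in the catch block.
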
As I'm looking at it, I don't think there's anything I can do at the Hive level to fix this, because {{TezClient}} will always try to add duplicate resources to the {{Vertex}} instances in a {{DAG}}, which throws an Exception and tanks the query.

> StorageHandler resources aren't added to Tez Session if already Session is already Open
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-7950
>                 URL: https://issues.apache.org/jira/browse/HIVE-7950
>             Project: Hive
>          Issue Type: Bug
>          Components: StorageHandler, Tez
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff
>
>
> Was trying to run some queries using the AccumuloStorageHandler when using
> the Tez execution engine. Some classes which were added to tmpjars weren't
> making it into the container. When a Tez Session is already open, as is the
> normal case when simply using the `hive` command, the resources aren't added.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)