[ https://issues.apache.org/jira/browse/HIVE-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120416#comment-17120416 ]
Mustafa Iman commented on HIVE-23175:
-------------------------------------

[~ashutoshc] there is also one non-static method exposed in TEZ-4137, InputInitializerContext#getVertexConfiguration:
{code:java}
this.conf = new Configuration(initializerContext.getVertexConfiguration());
{code}
I cannot get this patch working without the Tez side.

> Skip serializing hadoop and tez config on HS side
> -------------------------------------------------
>
>                 Key: HIVE-23175
>                 URL: https://issues.apache.org/jira/browse/HIVE-23175
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23175.1.patch, HIVE-23175.2.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HiveServer spends a lot of time serializing configuration objects. We can
> skip putting the hadoop and tez config xml files in the payload, assuming
> that the configs are the same on both the HS and Task side. This depends on
> Tez loading the local xml configs when creating config objects:
> [https://issues.apache.org/jira/browse/TEZ-4137]
> Ideally we should be able to skip hive-site.xml too. However, if we skip
> hive-site.xml at that stage, then we make wrong choices at the tez dag build
> stage due to missing configs.
> In the ideal version of this, we should not be both reading configs from and
> writing new configs to the same config object during the DAG and Vertex
> build phases. Instead we should be looking up from HS2's HiveConf object
> and writing to a brand new JobConf for each vertex. That way we would not
> have any unnecessary items in the JobConf for any vertex. However, the DAG and
> Vertex build stages (TezTask#build) and a lot of other components called from
> there treat a single config object as both the source of HS2 side config and
> the target JobConf that they are putting vertex level options into. It is very
> hard to separate these concerns now.
> With this patch, we are reducing the size of the JobConf (per vertex) by ~65%.
> This should improve the transmit latency. However, the most significant gains
> are in CPU time spent compressing job configs, since the config objects are
> now much smaller.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
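
The idea in the description (ship only what the task side cannot recover from its own local xml files) can be illustrated with a rough sketch. This is not the code from the patch: it uses plain maps instead of Hadoop's Configuration, and the class and method names (ConfigPayloadSketch, buildPayload, rebuild) are hypothetical. It also assumes, as the description does, that both sides load identical defaults.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; the real patch works on Hadoop/Tez config objects.
class ConfigPayloadSketch {

    // Sender side: keep only entries that differ from (or are missing in)
    // the defaults that the receiver can load locally.
    static Map<String, String> buildPayload(Map<String, String> full,
                                            Map<String, String> localDefaults) {
        Map<String, String> payload = new HashMap<>();
        for (Map.Entry<String, String> e : full.entrySet()) {
            if (!Objects.equals(e.getValue(), localDefaults.get(e.getKey()))) {
                payload.put(e.getKey(), e.getValue());
            }
        }
        return payload;
    }

    // Receiver side: start from the locally loaded defaults, then overlay the payload.
    // Note: keys deleted on the sender side are not tracked in this sketch.
    static Map<String, String> rebuild(Map<String, String> localDefaults,
                                       Map<String, String> payload) {
        Map<String, String> conf = new HashMap<>(localDefaults);
        conf.putAll(payload);
        return conf;
    }

    public static void main(String[] args) {
        // Stand-in for what both sides would read from local hadoop/tez xml files.
        Map<String, String> defaults = new HashMap<>();
        defaults.put("fs.defaultFS", "hdfs://nn:8020");
        defaults.put("tez.am.resource.memory.mb", "1024");

        // Full config on the sender: defaults plus one override and one new key.
        Map<String, String> full = new HashMap<>(defaults);
        full.put("tez.am.resource.memory.mb", "2048");
        full.put("hive.execution.engine", "tez");

        Map<String, String> payload = buildPayload(full, defaults);
        Map<String, String> restored = rebuild(defaults, payload);

        System.out.println("full entries: " + full.size());
        System.out.println("payload entries: " + payload.size());
        System.out.println("restored equals full: " + restored.equals(full));
    }
}
```

In this toy setup the payload carries two of the three entries. In a real deployment the saving depends on how many entries come from the hadoop and tez default files.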