[ https://issues.apache.org/jira/browse/HIVE-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mustafa Iman updated HIVE-23175:
--------------------------------
    Attachment: HIVE-23175.2.patch
        Status: Patch Available  (was: In Progress)

> Skip serializing hadoop and tez config on HS side
> -------------------------------------------------
>
>                 Key: HIVE-23175
>                 URL: https://issues.apache.org/jira/browse/HIVE-23175
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: HIVE-23175.1.patch, HIVE-23175.2.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveServer spends a lot of time serializing configuration objects. We can
> skip putting the hadoop and tez config xml files in the payload, assuming
> that the configs are the same on both the HS and Task side. This depends on
> Tez loading local xml configs when creating config objects:
> [https://issues.apache.org/jira/browse/TEZ-4137]
> Ideally we should be able to skip hive-site.xml too. However, if we skip
> hive-site.xml at that stage, we make wrong choices at the tez dag build
> stage due to missing configs.
> In the ideal version of this, we would not read configs from and write new
> configs to the same config object during the DAG and Vertex build phases.
> Instead we would look up values from HS2's HiveConf object and write to a
> brand new JobConf for each vertex. That way no vertex would carry any
> unnecessary item in its JobConf. However, the DAG and Vertex build stages
> (TezTask#build) and a lot of other components called from there treat a
> single config object as both the source of HS2 side config and the target
> JobConf that they put vertex level options into. It is very hard to
> separate these concerns now.
> With this patch, we are reducing the size of JobConf (per vertex) by ~65%.
> It should improve the transmit latency. However, the most significant gains
> are in CPU time spent compressing job configs, as the config objects are
> much smaller now.
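
The idea of leaving hadoop and tez xml entries out of the payload while keeping hive-site.xml can be illustrated with a minimal sketch. It is not the code from the attached patch: it uses plain Java maps rather than Hadoop's Configuration, the class and method names are hypothetical, and the list of skipped file names is an assumption made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not part of Hive or Tez.
final class VertexPayloadSketch {

    // Assumption: resource files expected to be present, with the same
    // content, on the task side. hive-site.xml is deliberately not listed.
    static final Set<String> SKIPPED_RESOURCES = new HashSet<>(Arrays.asList(
            "core-site.xml", "hdfs-site.xml", "yarn-site.xml",
            "mapred-site.xml", "tez-site.xml"));

    /**
     * Copies into a fresh map every entry whose source resource is not in
     * SKIPPED_RESOURCES. Entries without a recorded source are kept.
     *
     * @param values  config key to value, as seen on the server side
     * @param sources config key to the resource the value was loaded from
     */
    static Map<String, String> buildPayload(Map<String, String> values,
                                            Map<String, String> sources) {
        Map<String, String> payload = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            String source = sources.get(e.getKey());
            if (source == null || !SKIPPED_RESOURCES.contains(source)) {
                payload.put(e.getKey(), e.getValue());
            }
        }
        return payload;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        Map<String, String> sources = new LinkedHashMap<>();
        values.put("fs.defaultFS", "hdfs://nn:8020");
        sources.put("fs.defaultFS", "core-site.xml");
        values.put("hive.execution.engine", "tez");
        sources.put("hive.execution.engine", "hive-site.xml");
        values.put("vertex.level.option", "x"); // set programmatically, no source
        System.out.println(buildPayload(values, sources).keySet());
    }
}
```

Writing into a fresh map instead of mutating the input mirrors the separation described above: one object is only read from, the other is only written to.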
--
This message was sent by Atlassian Jira
(v8.3.4#803005)