[ https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904758#comment-13904758 ]
Brock Noland commented on HIVE-860:
-----------------------------------

bq. Are you proposing to change the contents of the hive-exec.jar as distributed with Hive or just as pushed to Hadoop for running a job?

Both.

bq. If it's the former won't it mean that any project that includes hive-exec.jar in its pom.xml will have to change its pom to explicitly include all of the extra jars now in the fat jar?

Nope. The previously shaded jars are listed as dependencies in the source pom file, and thus they will be pulled in transitively by depending on hive-exec.

I have verified this locally. That is, after a mvn install before the patch, all the currently shaded jars are removed from the published pom file, so they are not pulled in transitively. After the patch, the only jar which is shaded is kryo, and it is the only one which is removed from the published pom. That is to say, the other dependencies remain in the pom for clients. This is in line with my expectations.

> Persistent distributed cache
> ----------------------------
>
>                 Key: HIVE-860
>                 URL: https://issues.apache.org/jira/browse/HIVE-860
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.12.0
>            Reporter: Zheng Shao
>            Assignee: Brock Noland
>             Fix For: 0.13.0
>
>         Attachments: HIVE-860.patch, HIVE-860.patch, HIVE-860.patch,
> HIVE-860.patch, HIVE-860.patch, HIVE-860.patch
>
>
> DistributedCache is shared across multiple jobs if the hdfs file name is the
> same.
> We need to make sure Hive puts the same file into the same location every time
> and does not overwrite it if the file content is the same.
> We can achieve 2 different results:
> A1. Files added with the same name, timestamp, and md5 in the same session
> will have a single copy in distributed cache.
> A2. Files added with the same name, timestamp, and md5 will have a single
> copy in distributed cache.
> A2 has a bigger benefit in sharing but may raise a question on when Hive
> should clean it up in hdfs.
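
The "same file, same location, no overwrite" idea in the description can be illustrated with a minimal sketch. This is not the code in HIVE-860.patch: the path layout, the function names `cache_path` and `add_to_cache`, and the use of a plain dict as a stand-in for hdfs are all assumptions made for illustration.

```python
import hashlib
import posixpath


def cache_path(base_dir: str, name: str, mtime: int, content: bytes) -> str:
    """Return a deterministic location for a file (hypothetical layout)."""
    md5 = hashlib.md5(content).hexdigest()
    # The same name, timestamp, and md5 always map to the same path,
    # so repeated adds of an identical file land in one place.
    return posixpath.join(base_dir, f"{md5}-{mtime}", name)


def add_to_cache(store: dict, base_dir: str, name: str, mtime: int,
                 content: bytes) -> str:
    """Add a file to `store` (a dict standing in for hdfs) without overwriting."""
    path = cache_path(base_dir, name, mtime, content)
    # Write only if nothing is stored at that path yet; never overwrite.
    store.setdefault(path, content)
    return path
```

With no session identifier in the path, this corresponds to result A2; adding a session id as an extra path component would give A1. Neither variant addresses the cleanup question raised for A2.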