[ https://issues.apache.org/jira/browse/HADOOP-11680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HADOOP-11680.
---------------------------------------
    Resolution: Duplicate

I'm going to close this as a dupe of HADOOP-10115, especially since that was just committed.

> Deduplicate jars in convenience binary distribution
> ---------------------------------------------------
>
>                 Key: HADOOP-11680
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11680
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>
> Pulled from discussion on HADOOP-11656
>
> Colin wrote:
> {quote}
> bq. Andrew wrote: One additional note related to this, we can spend a lot of
> time right now distributing 100s of MBs of jar dependencies when launching a
> YARN job. Maybe this is ameliorated by the new shared distributed cache, but
> I've heard this come up quite a bit as a complaint. If we could meaningfully
> slim down our client, it could lead to a nice win.
> I'm frustrated that nobody responded to my earlier suggestion that we
> de-duplicate jars. This would drastically reduce the size of our install, and
> without rearchitecting anything.
> In fact I was so frustrated that I decided to write a program to do it myself
> and measure the delta. Here it is:
> Before:
> {code}
> du -h /h
> 249M /h
> {code}
> After:
> {code}
> du -h /h
> 140M /h
> {code}
> Seems like deduplicating jars would be a much better project than splitting
> into a client jar, if we really cared about this.
> <snip>
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
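Colin's de-duplication program is not included in the thread. As an illustration only, the following is a minimal sketch of one way to de-duplicate jars in an install tree. It assumes that "duplicate" means byte-identical `*.jar` files, detected by SHA-256, and that the duplicate copies are replaced with relative symlinks. The function names, the hashing choice, and the symlink strategy are all hypothetical and are not taken from Hadoop's build or from the HADOOP-10115 change.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_jars(root):
    """Group regular *.jar files under root by content hash.

    Only groups with more than one copy are returned. Symlinks are skipped,
    so a tree that has already been de-duplicated reports no duplicates.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".jar") and not os.path.islink(path):
                groups[sha256_of(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}


def dedupe_with_symlinks(root, dry_run=True):
    """Replace every duplicate copy with a relative symlink to the first copy.

    Returns the total size in bytes of the duplicate copies, which is roughly
    the disk space reclaimed. When dry_run is True, nothing is modified.
    """
    saved = 0
    for paths in find_duplicate_jars(root).values():
        keep, *dupes = paths  # hypothetical policy: keep the lexically first path
        for dupe in dupes:
            saved += os.path.getsize(dupe)
            if not dry_run:
                os.remove(dupe)
                os.symlink(os.path.relpath(keep, os.path.dirname(dupe)), dupe)
    return saved
```

Call `dedupe_with_symlinks("/h")` first (a dry run) to estimate the savings, then call it again with `dry_run=False` to apply them. Symlinks keep each per-component `lib` directory intact, so classpath globs still find a jar at every expected path. Hardlinks would give a similar on-disk saving, but only when all copies live on the same filesystem.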