[jira] [Created] (HADOOP-11680) Deduplicate jars in convenience binary distribution

Sean Busbey (JIRA) Thu, 05 Mar 2015 11:57:54 -0800

Sean Busbey created HADOOP-11680:
------------------------------------

             Summary: Deduplicate jars in convenience binary distribution
                 Key: HADOOP-11680
                 URL: https://issues.apache.org/jira/browse/HADOOP-11680
             Project: Hadoop Common
          Issue Type: Improvement
          Components: build
            Reporter: Sean Busbey
            Assignee: Sean Busbey



Pulled from the discussion on HADOOP-11656, where Colin wrote:

{quote}
bq. Andrew wrote: One additional note related to this, we can spend a lot of 
time right now distributing 100s of MBs of jar dependencies when launching a 
YARN job. Maybe this is ameliorated by the new shared distributed cache, but 
I've heard this come up quite a bit as a complaint. If we could meaningfully 
slim down our client, it could lead to a nice win.

I'm frustrated that nobody responded to my earlier suggestion that we 
de-duplicate jars. This would drastically reduce the size of our install, and 
without rearchitecting anything.
In fact I was so frustrated that I decided to write a program to do it myself 
and measure the delta. Here it is:

Before:
{code}
du -h /h
249M    /h
{code}
After:
{code}
du -h /h
140M    /h
{code}

Seems like deduplicating jars would be a much better project than splitting 
into a client jar, if we really cared about this.
<snip>
{quote}
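
The program Colin mentions is not included in the quote, so the following is only a minimal sketch of how duplicate jars in an unpacked distribution might be found and the potential savings estimated. It assumes "duplicate" means byte-identical content (same SHA-256 digest); the function names `file_digest`, `find_duplicate_jars` and `reclaimable_bytes` are hypothetical and are not part of Hadoop or of Colin's tool.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_jars(root):
    """Group .jar files under root by content digest.

    Assumption: two jars count as duplicates only if they are
    byte-identical. Symlinks are skipped so that an already
    deduplicated tree is not reported again.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            by_digest[file_digest(path)].append(path)
    # Keep only digests that occur more than once.
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}


def reclaimable_bytes(duplicates):
    """Bytes saved if each duplicate group kept a single copy."""
    return sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Calling `find_duplicate_jars("/h")` on the install directory from the quote would return the groups of identical jars, and `reclaimable_bytes` on that result gives a rough upper bound on the space that replacing the extra copies with links could save. How the build should actually lay out or link the jars is left open here.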



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
