Following up on this one: the hadoop-tools/ module is already in trunk, so the distcp v2 addition could start.

Thanks.

Alejandro

On Mon, Sep 12, 2011 at 6:47 AM, Vinod Kumar Vavilapalli <vino...@hortonworks.com> wrote:

> Alright, I think we've discussed enough on this and everybody seems to agree
> about a top level hadoop-tools module.
>
> Time to get into the action. I've filed HADOOP-7624. Amareshwari we can
> track the rest of the implementation related details and questions for your
> specific answers there.
>
> Thanks everyone for putting in your thoughts here.
> +Vinod
>
>
> On Fri, Sep 9, 2011 at 10:55 AM, Rottinghuis, Joep <jrottingh...@ebay.com> wrote:
>
> > If hadoop-tools will be built as part of hadoop-common, then none of these
> > tools should be allowed to have a dependency on hdfs or mapreduce.
> > The converse is also true: when tools do have any such dependency, they
> > cannot be built as part of hadoop-common.
> > We cannot have circular dependencies like that.
> >
> > That is probably obvious, but I'm just saying...
> >
> > Joep
> > ________________________________________
> > From: Amareshwari Sri Ramadasu [amar...@yahoo-inc.com]
> > Sent: Wednesday, September 07, 2011 9:33 PM
> > To: mapreduce-...@hadoop.apache.org
> > Cc: common-dev@hadoop.apache.org
> > Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> >
> > It is good to have hadoop-tools module separately. But as I asked before we
> > need to answer some questions here. I'm trying to answer them myself.
> > Comments are welcome.
> >
> > > 1. Should the patches for tools be created against Hadoop Common?
> > Here, I meant should the Hadoop common mailing list be used, or should we
> > have a separate mailing list for Tools? I agree with Vinod here, that we can
> > tie it to the Hadoop-common jira/mailing lists.
> >
> > > 2. What will happen to the tools test automation? Will it run as part
> > > of Hadoop Common tests?
> > Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop
> > common if we use the Hadoop common mailing list for this.
> > Also, I propose every patch build of HDFS and MAPREDUCE should also run
> > tools tests to make sure nothing is broken. That would ease the maintenance
> > of hadoop-tools module. I presume tools tests should not take much time
> > (something like not more than 30 minutes).
> >
> > > 3. Will it introduce a dependency from MapReduce to Common? Or is this
> > > taken care in Mavenization?
> > I'm not sure whether Mavenization can take care of this.
> >
> > Thanks
> > Amareshwari
> >
> > On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jrottingh...@ebay.com> wrote:
> >
> > Does a separate hadoop-tools module imply that there will be a separate
> > Jenkins build as well?
> >
> > Thanks,
> >
> > Joep
> > ________________________________________
> > From: Alejandro Abdelnur [t...@cloudera.com]
> > Sent: Wednesday, September 07, 2011 11:35 AM
> > To: mapreduce-...@hadoop.apache.org
> > Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> >
> > Makes sense
> >
> > On Wed, Sep 7, 2011 at 11:32 AM, <milind.bhandar...@emc.com> wrote:
> >
> > > +1 for separate hadoop-tools module. However, if a tool is broken at
> > > release time, and no one comes forward to fix it, it should be removed.
> > > (i.e. Unlike contrib modules, where build and test failures were
> > > tolerated.)
> > >
> > > - milind
> > >
> > > On 9/7/11 11:27 AM, "Mahadev Konar" <maha...@hortonworks.com> wrote:
> > >
> > > >I like the idea of having tools as a separate module and I don't think
> > > >that it will be a dumping ground unless we choose to make one of it.
> > > >
> > > >+1 for hadoop tools module under trunk.
> > > >
> > > >thanks
> > > >mahadev
> > > >
> > > >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <t...@cloudera.com>
> > > >wrote:
> > > >> Agreed, we should not have a dumping ground. IMO, what would go into
> > > >> hadoop-tools (i.e. distcp, streaming and someone could argue for FsShell
> > > >> as well) are effectively hadoop CLI utilities.
> > > >> Having them in a separate module rather than in the core modules
> > > >> (common, hdfs, mapreduce) does not mean that they are secondary things,
> > > >> just modularization. Also it will help to get those tools to use public
> > > >> interfaces of the core modules, and when we finally have a clean
> > > >> hadoop-client layer, those tools should only depend on that.
> > > >>
> > > >> Finally, the fact that tools would end up under trunk/hadoop-tools does
> > > >> not prevent the packaging of HDFS and MAPREDUCE from bundling the
> > > >> same/different tools.
> > > >>
> > > >> +1 for hadoop-tools/ (not binding)
> > > >>
> > > >> Thanks.
> > > >>
> > > >>
> > > >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <eric...@gmail.com> wrote:
> > > >>
> > > >>> Mapreduce and HDFS are distinct functions of Hadoop. They are loosely
> > > >>> coupled. If we have a tools aggregator module, it will not have as
> > > >>> clear and distinct a function as other Hadoop modules. Hence, it is
> > > >>> possible for a tool to depend on both HDFS and map reduce. If
> > > >>> something breaks in the tools module, it is unclear which subproject is
> > > >>> responsible for maintaining the tools' function. Therefore, it is safer to
> > > >>> send tools to incubator or apache extra rather than deposit the
> > > >>> utility tools in a tools subcategory. There are many short lived
> > > >>> projects that attempt to associate themselves with Hadoop but are not
> > > >>> being maintained. It would be better to spin off those utility
> > > >>> projects than use Hadoop as a dumping ground.
> > > >>>
> > > >>> In the previous discussion for removing contrib, most people were in
> > > >>> favor of doing so, and only a few contrib owners were reluctant to
> > > >>> remove contrib. Fewer people have participated in restoring
> > > >>> functionality of broken contrib projects. History speaks for itself.
> > > >>> -1 (non-binding) for hadoop-tools.
> > > >>>
> > > >>> regards,
> > > >>> Eric
> > > >>>
> > > >>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <t...@cloudera.com>
> > > >>> wrote:
> > > >>> > Eric,
> > > >>> >
> > > >>> > Personally I'm fine either way.
> > > >>> >
> > > >>> > Still, I fail to see why generic/categorized tools increase/reduce the
> > > >>> > risk of dead code and how they make more-difficult/easier the
> > > >>> > package&deployment.
> > > >>> >
> > > >>> > Would you please explain this?
> > > >>> >
> > > >>> > Thanks.
> > > >>> >
> > > >>> > Alejandro
> > > >>> >
> > > >>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <eric...@gmail.com> wrote:
> > > >>> >
> > > >>> >> Option #2 proposed by Amareshwari seems like a better proposal. We don't
> > > >>> >> want to repeat history for contrib again with hadoop-tools. Having a
> > > >>> >> generic module like hadoop-tools increases the risk of accumulating dead
> > > >>> >> code. It would be better to categorize the hdfs or mapreduce specific
> > > >>> >> tools in their respective subcategories. It is also easier to manage from
> > > >>> >> a package/deployment perspective.
> > > >>> >>
> > > >>> >> regards,
> > > >>> >> Eric
> > > >>> >>
> > > >>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> > > >>> >>
> > > >>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <a...@apache.org> wrote:
> > > >>> >> >>
> > > >>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> > > >>> >> >>> We still need to answer Amareshwari's question (2) she asked some time
> > > >>> >> >>> back about the automated code compilation and test execution of the
> > > >>> >> >>> tools module.
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>>>> My #1 question is if tools is basically contrib reborn.
> > > >>> >> >>>>> If not, what makes it different?
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >> I'm still waiting for this answer as well.
> > > >>> >> >>
> > > >>> >> >> Until such, I would be pretty much against a tools module.
> > > >>> >> >> Changing the name of the dumping ground doesn't make it any less of a
> > > >>> >> >> dumping ground.
> > > >>> >> >
> > > >>> >> > IMO if the tools module only gets stuff like distcp that's maintained
> > > >>> >> > then it's not contrib, if it contains all the stuff from the current
> > > >>> >> > MR contrib then tools is just a re-labeling of contrib. Given that
> > > >>> >> > this proposal only covers moving distcp to tools it doesn't sound like
> > > >>> >> > contrib to me.
> > > >>> >> >
> > > >>> >> > Thanks,
> > > >>> >> > Eli