Hi folks,

I just drafted a proposal that I intend to send to the PMC list and the board for their thoughts. Thanks to Xun Liu for providing thoughts about future directions/architecture, and to Keqiu Hu for the reviews.

Title: "Apache Submarine for Apache Top-Level Project"
https://docs.google.com/document/d/1kE_f-r-ANh9qOeapdPwQPHhaJTS7IMiqDQAS8ESi4TA/edit

I plan to send it to the PMC list/board next Monday, so any comments/suggestions are welcome.

Thanks,
Wangda

On Tue, Jul 30, 2019 at 6:01 PM 俊平堵 <dujunp...@gmail.com> wrote:

> Thanks Vinod for these great suggestions. I agree with most of your comments above.
>
> "For the Apache Hadoop community, this will be treated simply as a code change and so needs a committer +1?" IIUC, this should be treated as a feature branch merge, so maybe 3 committer +1s are needed here according to https://hadoop.apache.org/bylaws.html?
>
> bq. Can somebody who has cycles and has been on the ASF lists for a while look into the process here?
>
> I can check with ASF members who have experience with this if no one has done so yet.
>
> Thanks,
>
> Junping
>
> Vinod Kumar Vavilapalli <vino...@apache.org> wrote on Mon, Jul 29, 2019 at 9:46 PM:
>
>> Looks like there's a meaningful push behind this.
>>
>> Given the desire is to fork off Apache Hadoop, you'd want to make sure this enthusiasm turns into building a real, independent but more importantly a sustainable community.
>>
>> Given that there were two official releases off the Apache Hadoop project, I doubt you'd need to go through the incubator process. Instead you can directly propose a new TLP to the ASF board. The last few times this happened were with ORC, and long before that with Hive, HBase etc. Can somebody who has cycles and has been on the ASF lists for a while look into the process here?
>>
>> For the Apache Hadoop community, this will be treated simply as a code change and so needs a committer +1? You can be more gentle by formally doing a vote once a process doc is written down.
>>
>> Back to the sustainable community point, as part of drafting this proposal, you'd definitely want to make sure all of the Apache Hadoop PMC/Committers can exercise their will to join this new project as PMC/Committers respectively without any additional constraints.
>>
>> Thanks
>> +Vinod
>>
>> > On Jul 25, 2019, at 1:31 PM, Wangda Tan <wheele...@gmail.com> wrote:
>> >
>> > Thanks everybody for sharing your thoughts. I saw positive feedback from 20+ contributors!
>> >
>> > So I think we should move it forward. Any suggestions about what we should do?
>> >
>> > Best,
>> > Wangda
>> >
>> > On Mon, Jul 22, 2019 at 5:36 PM neo <n...@pingcap.com> wrote:
>> >
>> >> +1, this is neo from the TiDB & TiKV community.
>> >> Thanks Xun for bringing this up.
>> >>
>> >> In TiKV, our CNCF project's open source distributed KV storage system, Hadoop Submarine's machine learning engine helps us optimize data storage, helping us solve some problems with data hotspots and data shuffles.
>> >>
>> >> We are also preparing to use the Hadoop Submarine machine learning engine to improve the performance of TiDB, our open source distributed relational database.
>> >>
>> >> I think that if Submarine becomes independent, it will develop faster and better.
>> >> Thanks to the Hadoop community for developing Submarine!
>> >>
>> >> Best Regards,
>> >> neo
>> >> www.pingcap.com / https://github.com/pingcap/tidb / https://github.com/tikv
>> >>
>> >> Xun Liu <liu...@apache.org> wrote on Mon, Jul 22, 2019 at 4:07 PM:
>> >>
>> >>> @adam.antal
>> >>>
>> >>> The Submarine development team has completed the following preparations:
>> >>> 1. Established a temporary test repository on GitHub.
>> >>> 2. Changed the package name of Hadoop Submarine from org.hadoop.submarine to org.submarine.
>> >>> 3. Merged the LinkedIn/TonY code into the Hadoop Submarine module.
>> >>> 4. Ran all test cases on the travis-ci system connected to the GitHub repository.
>> >>> 5. Several Hadoop Submarine users completed system tests using the code in this repository.
>> >>>
>> >>> 赵欣 <xinz...@seu.edu.cn> wrote on Mon, Jul 22, 2019 at 9:38 AM:
>> >>>
>> >>>> Hi
>> >>>>
>> >>>> I am a teacher at Southeast University (https://www.seu.edu.cn/). Our major is electrical engineering. Our teaching teams and students use Hadoop Submarine for big data analysis and automation control of electrical equipment.
>> >>>>
>> >>>> Many thanks to the Hadoop community for providing us with machine learning tools like Submarine.
>> >>>>
>> >>>> I hope Hadoop Submarine keeps getting better and better.
>> >>>>
>> >>>> ==============================
>> >>>> 赵欣 (Zhao XIN)
>> >>>> School of Electrical Engineering, Southeast University
>> >>>> ==============================
>> >>>> 2019-07-18
>> >>>>
>> >>>> *From:* Xun Liu <liu...@apache.org>
>> >>>> *Date:* 2019-07-18 09:46
>> >>>> *To:* xinzhao <xinz...@seu.edu.cn>
>> >>>> *Subject:* Fwd: Re: Any thoughts making Submarine a separate Apache project?
>> >>>>
>> >>>> ---------- Forwarded message ---------
>> >>>> From: dashuiguailu...@gmail.com <dashuiguailu...@gmail.com>
>> >>>> Date: Wed, Jul 17, 2019 at 3:17 PM
>> >>>> Subject: Re: Re: Any thoughts making Submarine a separate Apache project?
>> >>>> To: Szilard Nemeth <snem...@cloudera.com.invalid>, runlin zhang <runlin...@gmail.com>
>> >>>> Cc: Xun Liu <liu...@apache.org>, common-dev <common-...@hadoop.apache.org>, yarn-dev <yarn-...@hadoop.apache.org>, hdfs-dev <hdfs-dev@hadoop.apache.org>, mapreduce-dev <mapreduce-...@hadoop.apache.org>, submarine-dev <submarine-...@hadoop.apache.org>
>> >>>>
>> >>>> +1, good idea, we are very much looking forward to it.
>> >>>>
>> >>>> ------------------------------
>> >>>> dashuiguailu...@gmail.com
>> >>>>
>> >>>> *From:* Szilard Nemeth <snem...@cloudera.com.INVALID>
>> >>>> *Date:* 2019-07-17 14:55
>> >>>> *To:* runlin zhang <runlin...@gmail.com>
>> >>>> *CC:* Xun Liu <liu...@apache.org>; Hadoop Common <common-...@hadoop.apache.org>; yarn-dev <yarn-...@hadoop.apache.org>; Hdfs-dev <hdfs-dev@hadoop.apache.org>; mapreduce-dev <mapreduce-...@hadoop.apache.org>; submarine-dev <submarine-...@hadoop.apache.org>
>> >>>> *Subject:* Re: Any thoughts making Submarine a separate Apache project?
>> >>>>
>> >>>> +1, this is a great idea.
>> >>>> As the Hadoop repository has already grown huge and contains many projects, I think in general it's a good idea to separate projects in the early phase.
>> >>>>
>> >>>> On Wed, Jul 17, 2019, 08:50 runlin zhang <runlin...@gmail.com> wrote:
>> >>>>
>> >>>>> +1, that will be great!
>> >>>>>
>> >>>>>> On Jul 10, 2019, at 3:34 PM, Xun Liu <liu...@apache.org> wrote:
>> >>>>>>
>> >>>>>> Hi all,
>> >>>>>>
>> >>>>>> This is Xun Liu, a contributor to the Submarine project, which runs deep learning workloads together with big data workloads on Hadoop clusters.
>> >>>>>>
>> >>>>>> A number of integrations of Submarine with other projects are finished or in progress, such as Apache Zeppelin, TonY, and Azkaban. The next step for Submarine is to integrate with more projects like Apache Arrow, Redis, MLflow, etc., to handle end-to-end machine learning use cases like model serving, notebook management, and advanced training optimizations (like auto parameter tuning, memory cache optimizations for large training datasets, etc.), and to run on other platforms like Kubernetes or natively on the cloud. LinkedIn also wants to donate the TonY project to Apache so we can put Submarine and TonY together in the same codebase (page #30:
>> >>>>>> https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-tony-tensorflow-on-yarn-and-beyond#30
>> >>>>>> ).
>> >>>>>>
>> >>>>>> This expands the scope of the original Submarine project in exciting new ways. Toward that end, would it make sense to create a separate Submarine project at Apache? This could speed up adoption of Submarine and allow it to grow into a full-blown machine learning platform.
>> >>>>>>
>> >>>>>> There will be lots of technical details to work out, but any initial thoughts on this?
>> >>>>>>
>> >>>>>> Best Regards,
>> >>>>>> Xun Liu