Re: [DISCUSS] Making submarine to different release model like Ozone

Eric Yang Fri, 01 Feb 2019 14:06:21 -0800

If HDFS or YARN breaks compatibility with Submarine, Submarine will need a new
release to catch up with the latest Hadoop changes. The news section of the
hadoop.apache.org website may then always be topped by Submarine releases that
repair compatibility with the latest Hadoop, which could drown out other
interesting news in the Hadoop space. I don't like to see that happen, but it
is unavoidable with an independent release cycle. Perhaps release managers can
help avoid this by ensuring that Hadoop and Submarine don't break compatibility
frequently.

For me to lift my veto, release managers of the independent release cycles need
to take responsibility for ensuring that version X of Hadoop is tested with
version Y of Submarine. Release managers will have to do more work to ensure
that each defined combination works. The greater responsibility of release
management comes with its own reward: seasoned PMC members may be nominated to
become Apache Members, which will help Submarine enter the Apache Incubator when
the time is right. With that understanding, I will withdraw my veto and let
Submarine set its own course.

Good luck, Wangda.

Regards,
Eric

From: Wangda Tan <wheele...@gmail.com>
Date: Friday, February 1, 2019 at 10:52 AM
To: Eric Yang <ey...@hortonworks.com>
Cc: Weiwei Yang <abvclo...@gmail.com>, Xun Liu <neliu...@163.com>, Hadoop 
Common <common-...@hadoop.apache.org>, "yarn-...@hadoop.apache.org" 
<yarn-...@hadoop.apache.org>, Hdfs-dev <hdfs-dev@hadoop.apache.org>, 
"mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>
Subject: Re: [DISCUSS] Making submarine to different release model like Ozone

Thanks everyone for sharing thoughts!

Eric, I appreciate your suggestions. But there are many examples of
sub-projects with separate releases, such as Hive's storage API, Ozone, etc.
For loosely coupled sub-projects, separate releases are great (at least for
most users) because new features can be consumed and iterated on faster.
Judging from the feedback from developers and users above, I think this is also
what people want.

Another concern you mentioned is whether Submarine is aligned with Hadoop
project goals. From the feedback, we can see that it encourages companies to
keep using Hadoop for their ML/DL requirements. It has also created a good
feedback loop: many of the issues Submarine ran into, and some of the new
functionality it added, went back into Hadoop, such as localization of files
and directories, GPU topology-related enhancements, etc.

We will definitely use this sub-project opportunity to grow both Submarine and
Hadoop quickly, and to achieve fast release cycles for both projects. As for
your suggestion about the Apache Incubator, we can reconsider it once Submarine
becomes a more independent project. Right now it is still too small, and going
through the process would be too much overhead; I don't want to stall this
fast-growing community for months while it goes through the incubation process.

I really hope my comment can help you reconsider the veto. :)

Thanks,
Wangda

On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:
Submarine is an application built for the YARN framework, but it does not have
a strong dependency on YARN development. For this kind of project, it would be
best to go through the Apache Incubator to create a new community. Apache
Commons is the only project other than the Incubator with independent release
cycles. That collection is large, and the project goal is ambitious; no one
really knows which components work with each other in Apache Commons. Hadoop
is a much more focused project, a distributed computing framework, not an
incubation sandbox. For alignment with Hadoop goals, we want to prevent the
Hadoop project from being overloaded, while allowing good ideas to be carried
forward in the Apache Incubator. Putting on my Apache Member hat, my vote is -1
on allowing more independent sub-project release cycles in the Hadoop project
that do not align with Hadoop project goals.

The Apache Incubator process is highly recommended for Submarine:
https://incubator.apache.org/policy/process.html
This would allow Submarine to support older versions of Hadoop, the way Spark
works with multiple versions of Hadoop.

Regards,
Eric

On 1/31/19, 10:51 PM, "Weiwei Yang" <abvclo...@gmail.com> wrote:

    Thanks for proposing this, Wangda. My +1 as well.
    It is amazing to see the progress made in Submarine last year; the
    community is growing fast and is quite collaborative. I can see the reasons
    to release it faster on its own cycle. At the same time, the Ozone model
    works very well.

    —
    Weiwei
    On Feb 1, 2019, 10:49 AM +0800, Xun Liu <neliu...@163.com> wrote:
    > +1
    >
    > Hello everyone,
    >
    > I am Xun Liu, head of the machine learning team at the Netease Research
    > Institute. I quite agree with Wangda.
    >
    > Our team is very grateful to the community for the Submarine machine
    > learning engine. We are heavy users of Submarine.
    > Because Submarine fits the direction of our big data team's Hadoop
    > technology stack, it avoids the need to invest more manpower in learning
    > other container scheduling systems.
    > The important thing is that we can use a common YARN cluster to run
    > machine learning, which makes server resource utilization more efficient
    > and preserves the large investment of people and equipment we have made
    > over previous years.
    >
    > Our team has finished testing and deploying Submarine and will provide
    > the service to our e-commerce department (http://www.kaola.com/) shortly.
    >
    > We also plan to provide the Submarine engine in our existing YARN cluster
    > within the next six months, because many of our product departments need
    > machine learning services, for example:
    > 1) Game department (http://game.163.com/) needs AI battle training,
    > 2) News department (http://www.163.com) needs news recommendation,
    > 3) Mailbox department (http://www.163.com) requires anti-spam and illegal content detection,
    > 4) Music department (https://music.163.com/) requires music recommendation,
    > 5) Education department (http://www.youdao.com) requires voice recognition,
    > 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation, and so on.
    >
    > If Submarine can be released independently like Ozone, it will help us
    > quickly get the latest features and improvements, and it will be greatly
    > helpful to our team and users.
    >
    > Thanks, Hadoop community!
    >
    >
    > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wheele...@gmail.com> wrote:
    > >
    > > Hi devs,
    > >
    > > Since we started the Submarine-related effort last year, we have received
    > > a lot of feedback. Several companies (such as Netease, China Mobile, etc.)
    > > are trying to deploy Submarine to their Hadoop clusters alongside big data
    > > workloads. LinkedIn is also very interested in contributing a Submarine
    > > TonY (https://github.com/linkedin/TonY) runtime so that users can use the
    > > same interface.
    > >
    > > From what I can see, there are several issues with keeping Submarine under
    > > the yarn-applications directory and on the same release cycle as Hadoop:
    > >
    > > 1) We started the 3.2.0 release in Sep 2018, but the release was not done
    > > until Jan 2019; it was delayed a lot because of unpredictable blockers and
    > > security issues. We need to iterate on Submarine quickly at this point.
    > >
    > > 2) We also see a lot of demand to use Submarine on older Hadoop releases
    > > such as 2.x. Many companies may not upgrade Hadoop to 3.x any time soon,
    > > but running deep learning is urgent for them. We should decouple Submarine
    > > from the Hadoop version.
    > >
    > > So why do we want to keep it within Hadoop? First, Submarine includes some
    > > innovative parts, such as user-experience enhancements for YARN services
    > > and containerization support, which we can add back to Hadoop later to
    > > address common requirements. In addition, there is a big overlap between
    > > the communities developing and using Submarine and Hadoop.
    > >
    > > There were several proposals we went through during the discussion about
    > > merging Ozone to trunk:
    > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3ccahfhakh6_m3yldf5a2kq8+w-5fbvx5ahfgs-x1vajw8gmnz...@mail.gmail.com%3E
    > >
    > > I propose to adopt the Ozone model: the same master branch, but a
    > > different release cycle and a different release branch. Ozone is a great
    > > example of the agile releases we can do (2 Ozone releases since Oct 2018)
    > > with less overhead to set up CI, projects, etc.
    > >
    > > *Links:*
    > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    > > - Design doc:
    > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
    > > - User doc (3.2.0 release):
    > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
    > > - Blog posts: {Submarine}: Running deep learning workloads on Apache Hadoop
    > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>
    > > (Chinese translation: <https://www.jishuwen.com/d/2Vpu>)
    > > - Talks: Strata Data Conf NY
    > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
    > >
    > > Thoughts?
    > >
    > > Thanks,
    > > Wangda Tan
