Re: [discuss] Zeppelin support workflow

Mei Long Mon, 18 Mar 2019 10:04:58 -0700

Very cool! @Xun Liu Would you like to talk about it at our next Apache
Zeppelin community meeting?


On Sat, Mar 16, 2019 at 1:00 PM Felix Cheung <felixcheun...@hotmail.com>
wrote:

> I like it!
>
> ________________________________
> From: Jongyoul Lee <jongy...@gmail.com>
> Sent: Monday, March 11, 2019 9:05:03 PM
> To: dev
> Subject: Re: [discuss] Zeppelin support workflow
>
> Thanks for sharing this kind of discussion.
>
> I'm interested in it and will take a look.
>
> On Mon, Mar 11, 2019 at 10:43 AM Xun Liu <neliu...@163.com> wrote:
>
> > Hello, everyone
> >
> > Zeppelin has more than 20 interpreters, so data analysts can use it for
> > a wide variety of data development work, and much of that work is
> > interdependent. For example, developing machine learning algorithms
> > relies on Spark to preprocess the data.
> >
> > Zeppelin should have built-in workflow capabilities instead of relying on
> > external software to schedule Zeppelin notes, for the following reasons:
> >
> > 1. We have moved from the data processing era to the algorithm era. With
> > its own workflow, Zeppelin will have a complete ecosystem covering both
> > data processing and algorithmic operations.
> > 2. Zeppelin's powerful interactive processing capabilities help algorithm
> > engineers improve their productivity. Zeppelin should give algorithm
> > engineers more direct control instead of handing the algorithm to other
> > teams (or software) to run the workflow.
> > 3. Zeppelin knows more about the processing status of the data than Azkaban
> > or Airflow, so a built-in workflow will offer better performance, user
> > experience, and control.
> >
> > Typical use case
> > Workflows matter especially in machine learning, because machine learning
> > tasks generally run for a long time.
> > A typical example is as follows:
> > 1) Obtain data from HDFS through Spark;
> > 2) Clean and convert the data through Spark SQL;
> > 3) Extract features from the data through Spark;
> > 4) Write the TensorFlow algorithm through Hadoop Submarine;
> > 5) Distribute the TensorFlow algorithm as a job to YARN or k8s for batch
> > processing;
> > 6) Publish the trained model and provide online prediction
> > services;
> > 7) Run model prediction through Flink;
> > 8) Receive incremental data through Flink to incrementally update the
> > model.
> > Therefore, Zeppelin especially needs the ability to orchestrate
> > workflows.
> >
> > I have completed a draft of the Zeppelin workflow system design. Please
> > review it; you can edit the document directly or add comments.
> >
> > JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
> > gdoc:
> > https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit
> >
> > :-)
> >
> > Xun Liu
> > 2019-03-11
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
