Very cool! @Xun Liu, would you like to talk about it at our next Apache Zeppelin community meeting?
On Sat, Mar 16, 2019 at 1:00 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

> I like it!
>
> ________________________________
> From: Jongyoul Lee <jongy...@gmail.com>
> Sent: Monday, March 11, 2019 9:05:03 PM
> To: dev
> Subject: Re: [discuss] Zeppelin support workflow
>
> Thanks for sharing this kind of discussion.
>
> I'm interested in it. Will take a look.
>
> On Mon, Mar 11, 2019 at 10:43 AM Xun Liu <neliu...@163.com> wrote:
>
> > Hello, everyone
> >
> > Because Zeppelin has more than 20 interpreters, data analysts can use
> > it for a wide variety of data development work, and much of that work
> > is interdependent. For example, developing machine learning algorithms
> > relies on Spark to preprocess the data.
> >
> > Zeppelin should have built-in workflow capabilities instead of relying
> > on external software to schedule its notes, for the following reasons:
> >
> > 1. We have moved from the data processing era to the algorithm era.
> > With its own workflow, Zeppelin would offer a complete ecosystem
> > covering both data processing and algorithmic operations.
> > 2. Zeppelin's powerful interactive processing capabilities help
> > algorithm engineers improve their productivity. Zeppelin should give
> > algorithm engineers more direct control, instead of handing the
> > algorithm over to other teams (or software) to build the workflow.
> > 3. Zeppelin knows more about the processing status of the data than
> > Azkaban and Airflow do, so a built-in workflow would offer better
> > performance, user experience, and control.
> >
> > Typical use case
> > This matters especially in machine learning, because machine learning
> > tasks generally run for a long time. A typical example is as follows:
> > 1) Obtain data from HDFS through Spark;
> > 2) Clean and convert the data through Spark SQL;
> > 3) Extract features from the data through Spark;
> > 4) Write the TensorFlow algorithm through Hadoop Submarine;
> > 5) Distribute the TensorFlow algorithm as a job to YARN or k8s for
> > batch processing;
> > 6) Publish the trained model and provide online prediction services;
> > 7) Perform model prediction with Flink;
> > 8) Receive incremental data through Flink to incrementally update the
> > model.
> > Therefore, Zeppelin especially needs the ability to orchestrate
> > workflows.
> >
> > I have completed a draft of the Zeppelin workflow system design. Please
> > review it; you can modify the document directly or add comments.
> >
> > JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
> > gdoc:
> > https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit
> >
> > :-)
> >
> > Xun Liu
> > 2019-03-11
> >
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net