I like it!

________________________________
From: Jongyoul Lee <jongy...@gmail.com>
Sent: Monday, March 11, 2019 9:05:03 PM
To: dev
Subject: Re: [discuss] Zeppelin support workflow
Thanks for sharing this discussion. I'm interested in it and will take a look.

On Mon, Mar 11, 2019 at 10:43 AM Xun Liu <neliu...@163.com> wrote:

> Hello, everyone
>
> Zeppelin has more than 20 interpreters, so data analysts can use it for
> a wide variety of data development work, and much of that work is
> interdependent. For example, developing a machine learning algorithm
> relies on Spark to preprocess the data.
>
> Zeppelin should have built-in workflow capabilities instead of relying
> on external software to schedule notes, for the following reasons:
>
> 1. We have moved from the data processing era to the algorithm era. With
>    its own workflow, Zeppelin will have a complete ecosystem covering
>    both data processing and algorithm operations.
> 2. Zeppelin's powerful interactive processing capabilities help algorithm
>    engineers improve their productivity. Zeppelin should give the
>    algorithm engineer more direct control, instead of handing the
>    algorithm to other teams (or software) to run the workflow.
> 3. Zeppelin knows more about the processing status of data than Azkaban
>    and Airflow do, so a built-in workflow will have better performance,
>    user experience, and control.
>
> Typical use case
>
> This matters especially in machine learning, because machine learning
> tasks generally take a long time to execute.
> A typical example is as follows:
>
> 1) Obtain data from HDFS through Spark;
> 2) Clean and convert the data through Spark SQL;
> 3) Extract features from the data through Spark;
> 4) Write the TensorFlow algorithm through Hadoop Submarine;
> 5) Distribute the TensorFlow algorithm as a job to YARN or k8s for batch
>    processing;
> 6) Publish the model obtained from training and provide online
>    prediction services;
> 7) Run model prediction with Flink;
> 8) Receive incremental data through Flink to update the model
>    incrementally.
>
> Therefore, Zeppelin especially needs the ability to arrange workflows.
>
> I have completed a draft of the Zeppelin workflow system design. Please
> review it; you can modify the document directly or add comments.
>
> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
> gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit
>
> :-)
>
> Xun Liu
> 2019-03-11

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
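
For readers following the thread, the eight-step example in the quoted proposal can be read as a dependency chain between notes. The sketch below is only an illustration of that idea, not the design from the linked document: the step names are hypothetical paraphrases of the steps in the email, the strictly linear dependencies are an assumption, and `runner` is a placeholder for whatever would actually execute a note. It uses only Python's standard-library `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names paraphrasing the eight steps in the email.
# Each step maps to the set of steps it depends on; a strictly linear
# chain is assumed here, which the email does not state explicitly.
WORKFLOW = {
    "load_from_hdfs": set(),
    "clean_with_spark_sql": {"load_from_hdfs"},
    "extract_features": {"clean_with_spark_sql"},
    "write_tensorflow_algorithm": {"extract_features"},
    "train_on_yarn_or_k8s": {"write_tensorflow_algorithm"},
    "publish_model": {"train_on_yarn_or_k8s"},
    "predict_with_flink": {"publish_model"},
    "incremental_update": {"predict_with_flink"},
}


def execution_order(workflow):
    """Return step names in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


def run(workflow, runner):
    """Run steps in dependency order and stop at the first failure.

    `runner` is a placeholder callable taking a step name and returning
    True on success; a real scheduler would run a note here instead.
    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in execution_order(workflow):
        if not runner(step):
            break  # downstream steps depend on this one, so do not continue
        completed.append(step)
    return completed


if __name__ == "__main__":
    for name in run(WORKFLOW, lambda step: True):
        print(name)
```

Stopping at the first failed step reflects the interdependence the proposal describes: a later step such as model publishing has nothing useful to do if training did not finish.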