Hello, everyone

Because Zeppelin has more than 20 interpreters, data analysts can use it for a
wide variety of data development work.
Much of this work is interdependent. For example, developing machine learning
algorithms relies on Spark to preprocess the data.

Zeppelin should have built-in workflow capabilities instead of relying on
external software to schedule its notes, for the following reasons:

1. Now that we have moved from the data processing era to the algorithm era,
a built-in workflow will give Zeppelin a complete ecosystem that covers both
data processing and algorithmic operations.
2. Zeppelin's powerful interactive processing capabilities help algorithm
engineers work more productively.
Zeppelin should give algorithm engineers more direct control, instead of
having them hand their algorithms over to other teams (or software) to build
the workflow.
3. Zeppelin knows more about the processing status of its data than Azkaban
and Airflow do, so a built-in workflow will offer better performance, user
experience, and control.

Typical use case
This matters especially in machine learning, because machine learning tasks
generally take a long time to execute.
A typical example is as follows:
1) First, obtain data from HDFS through Spark;
2) Clean and convert the data through Spark SQL;
3) Extract features from the data through Spark;
4) Write the TensorFlow algorithm through Hadoop Submarine;
5) Distribute the TensorFlow algorithm as a job to YARN or k8s for batch
processing;
6) Publish the trained model and provide online prediction services;
7) Perform model prediction with Flink;
8) Receive incremental data through Flink to incrementally update the model.
Therefore, Zeppelin especially needs the ability to orchestrate workflows.
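To make the dependency idea concrete, here is a minimal sketch (an illustration only, not the design in the document) that treats each step above as a Zeppelin note and orders the notes with a topological sort from Python's standard `graphlib` module (Python 3.9+). The note names and dependency edges are hypothetical. In particular, I assume both Flink steps depend only on the published model, so they land in the same wave and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical note names for the eight steps above. Each note maps to the set
# of notes it depends on; the edges are an illustrative assumption.
PIPELINE = {
    "load_hdfs_spark": set(),
    "clean_sparksql": {"load_hdfs_spark"},
    "extract_features_spark": {"clean_sparksql"},
    "write_tf_submarine": {"extract_features_spark"},
    "train_on_yarn_or_k8s": {"write_tf_submarine"},
    "publish_model": {"train_on_yarn_or_k8s"},
    "predict_flink": {"publish_model"},
    "incremental_update_flink": {"publish_model"},
}


def execution_waves(graph):
    """Group notes into waves. Every note in a wave has all of its dependencies
    finished in earlier waves, so notes within one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # notes whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished to unlock their dependents
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE), start=1):
        print(number, wave)
```

A real scheduler would also need to track note run status, retries, and failures; the sketch only shows why explicit dependencies between notes let the system decide what can run next.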

I have completed a draft of the Zeppelin workflow system design. Please review
it; you can modify the document directly or leave comments.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
gdoc: 
https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit

:-)

Xun Liu
2019-03-11
