Hi Vasiliy Morkovkin,

Thank you very much for your willingness to implement the workflow feature.
I will work with you as my highest priority.

I plan to update the system design documentation for workflow first, at
https://issues.apache.org/jira/browse/ZEPPELIN-4018.
Please add yourself as a Watcher on ZEPPELIN-4018 so that you are notified
promptly when the document is updated.
We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
If you need to, you can also email me at liuxun...@gmail.com, and I will
reply as quickly as I can.

Do you think this kind of cooperation is OK?

@moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve
our system design.

Thanks! :-)

> On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович
> <morkovkin...@phystech.edu> wrote:
>
> Thank you for such detailed feedback!
> I am definitely interested in working on the workflow implementation with
> you, Xun Liu! Could you become a mentor in GSOC for this task?
> Some front-end work is not a problem at all.
> I'm ready to work at least 30 hours per week in the summer, while for now
> I'd like to take some smaller tasks to take a closer look at the existing
> codebase and to get familiar with your development workflow. Do you have
> such tasks in mind?
>
> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
> Hi Vasiliy Morkovkin
>
> I have written up my thoughts on workflow at
> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>
> Because there are more than 20 interpreters in Zeppelin, data analysts can
> use it for a variety of data development work, and much of that work is
> interdependent. For example, developing machine learning algorithms relies
> on Spark to preprocess the data, and so on.
>
> The open source workflow software available now includes Azkaban and
> Airflow. Azkaban is relatively simple and meets most scenarios, and our
> company is using it. Airflow looks complicated and I have not used it.
> In fact, I have previously implemented workflow for notes and paragraphs
> in Zeppelin via Azkaban.
> https://youtu.be/2r6q-2Tq7hk?t=33
>
> However, I think Zeppelin should have built-in workflow capabilities
> instead of relying on external software to schedule notes in Zeppelin,
> for the following reasons:
> 1. We have moved from the data processing era to the algorithm era.
>    Once Zeppelin has its own workflow, it will form a closed data loop.
>
> 2. Zeppelin's powerful interactive processing capabilities help algorithm
>    engineers improve their productivity. Zeppelin should give the
>    algorithm engineer more direct control, instead of handing the
>    algorithm to other teams (or software) to run the workflow.
>
> 3. Zeppelin knows more about the processing status of data than Azkaban
>    and Airflow, so a built-in workflow will have better performance, user
>    experience and control.
>
> If you are interested in workflow (ZEPPELIN-4018), I am willing to work
> with you to complete all system design and code development work.
>
> :-)
>
>> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>> Hi Basil,
>>
>> Thanks for your interest in Zeppelin. Here are my comments on the tickets
>> you are interested in.
>>
>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>    This involves two sides of work, frontend and backend.
>>    In the frontend, we should use Arrow JS to handle the table data,
>>    including displaying it and processing it (such as aggregation).
>>    In the backend, we should use Arrow for each language and allow them
>>    to exchange data within the same process, and use Arrow IPC to
>>    exchange data across processes.
>>    Overall, this is a pretty large task. If you really want to do it, I
>>    would suggest you take just part of it.
>>
>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>    Regarding model serving, I don't have a clear picture of this. Others
>>    can comment on it.
>>
>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>    Job scheduling is pretty important for Zeppelin; I would make this the
>>    highest priority among these tickets. Airflow is one option, but I am
>>    open to other solutions. First we need to figure out how users
>>    schedule jobs in Zeppelin, then choose the right framework. It would
>>    also involve some frontend work.
>>
>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>    Spark 2.4.0 support is already there, but Scala 2.12 is not supported
>>    yet. It won't be a big project for GSOC IMO.
>>
>> 5. OLAP.
>>    Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>>    Zeppelin can support it very well. But we could create a specific
>>    interpreter for an OLAP engine if its native API performs better than
>>    JDBC. Another thing I can think of for improving OLAP is
>>    visualization. Although Zeppelin already supports some built-in
>>    visualizations, some are still missing. We could provide more.
>>
>> 6. Auto-completions.
>>    We already support IPython [1] in Zeppelin, which provides almost the
>>    same auto-completion as Jupyter, but it lacks access to the Python API
>>    docs. This is also pretty important for Python users IMO. SQL is
>>    another popular language in Zeppelin, but it also doesn't provide a
>>    good code-completion experience; we can do better there as well.
>>
>> 7. Notifications.
>>    I think notification can be integrated into job scheduling. A
>>    notification can be sent when a job fails or succeeds.
>>
>>
>> Let us know which JIRA you are most interested in, and please also
>> consider how much time you can spend on this. Again, we very much
>> appreciate your interest in Zeppelin and look forward to your
>> contribution.
>>
>>
>> [1]
>> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>
>>
>> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович
>> <morkovkin...@phystech.edu> wrote:
>>
>>> Thank you for your replies! I've checked the existing set of issues and
>>> found several curious ones:
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very
>>> nice way to increase analytical processing performance using the Arrow
>>> project;
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>>> regardless of ZeppelinServer sounds quite intriguing too, although there
>>> is much to think about;
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
>>> https://airflow.apache.org/ seems to be useful in implementing complex
>>> execution workflows.
>>> Those tasks are global and intriguing, requiring complex architectural
>>> solutions.
>>> Also, I've probably found a ticket that is suitable for me to get
>>> involved in the project:
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>> Are there any "low hanging fruits"?
>>>
>>> And I have several ideas of my own. Some of them might not be relevant
>>> due to the vision of the project or other reasons. Just ideas:
>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite logical
>>> to add more integrations with existing OLAP solutions like Pinot,
>>> ClickHouse and Druid. Currently I've found integration only with Kylin;
>>> - Better autocompletion. Jupyter offers not only a list of already
>>> initialized variables, but also quick access to documentation. It's
>>> convenient;
>>> - Notifications. Some colleagues would have appreciated a notification
>>> service that sends you messages (via mail, Slack bot or something else)
>>> indicating that your long-running paragraphs have completed.
>>>
>>> Feedback is very much appreciated :)
>>>
>>> It would be wonderful if someone agreed to sacrifice his time and become
>>> a mentor in the GSOC program!
>>>
>>> ----------------------------------------
>>> Best regards, Basil Morkovkin.
>>>
>>>
>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I've confirmed I could add more issues for GSOC. Can you explain what
>>>> you would like to contribute to? I can add more issues.
>>>>
>>>> JL
>>>>
>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>>>>
>>>>> Hi, Vasiliy Morkovkin
>>>>>
>>>>> Welcome to the Zeppelin community! :-)
>>>>>
>>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>
>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>
>>>>>> I added FE topics for GSOC because FE is the most urgent issue I have
>>>>>> thought about. We always encourage contributions to Zeppelin on
>>>>>> various topics, including your own ideas.
>>>>>>
>>>>>> Please describe a bit more.
>>>>>>
>>>>>> Thanks.
>>>>>> JL
>>>>>>
>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>> Looks like we need volunteers for a mentor and some backend subjects
>>>>>>> for GSoC 2019.
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Best,
>>>>>>> moon
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin
>>>>>>> <morkovkin...@phystech.edu> wrote:
>>>>>>>
>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow
>>>>>>>> Institute of Physics and Technology and am eager to contribute to
>>>>>>>> Zeppelin in the context of GSOC 2019. I've become a real fan of
>>>>>>>> Zeppelin over the past couple of months, using it at my job. But I
>>>>>>>> have found only one ticket (a front-end task) with the GSOC 2019
>>>>>>>> label on your Jira. Perhaps you have ideas for new features or
>>>>>>>> improvements in Zeppelin, but don't have enough hands for them. It
>>>>>>>> would be wonderful if anyone agreed to mentor these ideas within
>>>>>>>> GSOC :)
>>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years.
>>>>>>>> I can also write in Java or Python without any problems if
>>>>>>>> necessary. I'm really fond of databases and highload. I also have
>>>>>>>> experience with some other great Apache projects like Cassandra,
>>>>>>>> Kafka and Spark.
>>>>>>>>
>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>> http://madeng.net
>>>>>
>>>>>
>>>>
>>>> --
>>>> 이종열, Jongyoul Lee, 李宗烈
>>>> http://madeng.net
>>>>
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang