Hello everyone, I have completed the Zeppelin workflow system design. Please review it; you can edit the document directly or leave comments.
JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#

:-)

> On 8 Mar 2019, at 2:10 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Hi Liu,
>
> See this link: https://community.apache.org/gsoc.html
>
> Xun Liu <neliu...@163.com> wrote on Fri, 8 Mar 2019 at 1:58 PM:
>
>> Hi, Jongyoul Lee, Морковкин
>>
>> I looked up information about GSoC. Is it still necessary to apply to the Zeppelin community first?
>> I don't know much about GSoC. Apart from helping with the project, what other work does a mentor need to do?
>>
>>> On 8 Mar 2019, at 10:01 AM, Xun Liu <neliu...@163.com> wrote:
>>>
>>> Hi, Морковкин
>>>
>>> I am very happy to be your mentor for GSoC. :-)
>>> I believe that by completing this work, I can also learn a lot.
>>>
>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>
>>>> On 8 Mar 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>
>>>> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ. It reproduces the flow shown in the attached picture.
>>>> Xun Liu, could you clarify whether you agree to be a mentor specifically within GSoC, or outside of it? :)
>>>>
>>>> ----------------------------------------
>>>> Best regards, Basil Morkovkin
>>>>
>>>> On Thu, 7 Mar 2019 at 11:32, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>> Thanks Liu for taking this over; I will help review the design.
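Basil's toy prototype imposes dependencies on the execution order of tasks (for example, a model-training step that must wait for Spark preprocessing, as Xun describes in this thread). That core idea is a topological sort over a dependency graph. The following is only a hypothetical illustration in Python, not the Scala code on Scastie and not a Zeppelin API: `run_workflow` and the task names are invented for the example, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_workflow(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Iterable[str]]) -> List[str]:
    """Run each task only after all of its upstream tasks have run.

    tasks: task name -> callable that performs the work (hypothetical, e.g. "run a note")
    deps:  task name -> names of the tasks it depends on
    Returns the task names in execution order.
    """
    # Materialize the dependency lists once so they can be reused safely.
    deps = {name: set(upstream) for name, upstream in deps.items()}
    unknown = set(deps).union(*deps.values()) - set(tasks)
    if unknown:
        raise ValueError(f"unknown tasks in dependencies: {sorted(unknown)}")

    # Every task is a node, including tasks without dependencies.
    graph = {name: deps.get(name, set()) for name in tasks}
    try:
        # Compute the full order first, so a cycle is reported before anything runs.
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err

    for name in order:
        tasks[name]()  # an exception here stops the remaining tasks
    return order


if __name__ == "__main__":
    def step(name: str) -> Callable[[], None]:
        return lambda: print(f"running {name}")

    demo_tasks = {n: step(n) for n in ("publish_report", "train_model", "spark_preprocess")}
    demo_deps = {"train_model": ["spark_preprocess"], "publish_report": ["train_model"]}
    run_workflow(demo_tasks, demo_deps)
```

Computing the whole order up front means a cycle or a misspelled task name is reported before any task executes; a real scheduler would additionally have to decide what happens to downstream tasks when an upstream one fails.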
>>>>
>>>> Xun Liu <neliu...@163.com> wrote on Thu, 7 Mar 2019 at 4:05 PM:
>>>> Hi Vasiliy Morkovkin
>>>>
>>>> Thank you very much for your willingness to implement the workflow feature.
>>>> Working with you will be my highest priority.
>>>> I plan to first update the system design documentation for workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018.
>>>> Please add yourself as a watcher on ZEPPELIN-4018, so that you get notifications about document updates in a timely manner.
>>>>
>>>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>>>> If you need to, you can email me at liuxun...@gmail.com, and I will reply as quickly as I can.
>>>> Does this way of working together suit you?
>>>>
>>>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve our system design. Thanks!
>>>>
>>>> :-)
>>>>
>>>>> On 7 Mar 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>>
>>>>> Thank you for such detailed feedback!
>>>>> I am definitely interested in working on the workflow implementation with you, Xun Liu! Could you become my GSoC mentor for this task?
>>>>> Some front-end work is not a problem at all.
>>>>> I'm ready to work at least 30 hours per week in the summer. For now, I'd like to take some smaller tasks to get a closer look at the existing codebase and to get familiar with your development workflow. Do you have any such tasks in mind?
>>>>>
>>>>> On Wed, 6 Mar 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
>>>>> Hi Vasiliy Morkovkin
>>>>>
>>>>> I have written up my thoughts on workflow in https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>
>>>>> Because Zeppelin has more than 20 interpreters, data analysts can use it for a wide variety of data development, and much of that development is interdependent. For example, developing machine learning algorithms requires relying on Spark to preprocess data, and so on.
>>>>>
>>>>> Existing open source workflow software includes Azkaban and Airflow.
>>>>> Azkaban is relatively simple, has been able to cover most scenarios, and our company is using it.
>>>>> Airflow looks complicated and I have not used it.
>>>>> In fact, I have previously implemented workflows for notes and paragraphs in Zeppelin via Azkaban:
>>>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>>>
>>>>> However, I think Zeppelin should have built-in workflow capabilities, instead of relying on external software to schedule its notes, for the following reasons:
>>>>> 1. Now that we have moved from the data-processing era to the algorithm era, once Zeppelin has its own workflow it will form a closed data loop.
>>>>>
>>>>> 2. Zeppelin's powerful interactive processing capabilities help algorithm engineers improve their productivity.
>>>>> Zeppelin should give algorithm engineers more direct control, instead of handing the algorithms to other teams (or software) to build the workflow.
>>>>>
>>>>> 3. Zeppelin knows more about the processing status of data than Azkaban and Airflow.
>>>>> So a built-in workflow will offer better performance, user experience and control.
>>>>>
>>>>> If you are interested in workflow (ZEPPELIN-4018), I am willing to work with you on all of the system design and code development.
>>>>>
>>>>> :-)
>>>>>
>>>>>> On 6 Mar 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>>
>>>>>> Hi Basil,
>>>>>>
>>>>>> Thanks for your interest in Zeppelin. Here are my comments on the tickets you are interested in.
>>>>>>
>>>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>>> This involves two sides of work, frontend and backend:
>>>>>> On the frontend, we should use Arrow JS to handle table data, including displaying and processing it (such as aggregation).
>>>>>> On the backend, we should use Arrow in each language and allow the languages to exchange data within the same process, and use Arrow IPC to exchange data across processes.
>>>>>> Overall, this is a pretty large task. If you really want to do it, I would suggest taking on just part of it.
>>>>>>
>>>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>>> Regarding model serving, I don't have a clear picture. Others can comment on this.
>>>>>>
>>>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>> Job scheduling is pretty important for Zeppelin; among these tickets, I would make it the highest priority. Airflow is one option, but I am open to other solutions. First we need to figure out how users schedule jobs in Zeppelin, then choose the right framework. It would also involve some frontend work.
>>>>>>
>>>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>> Spark 2.4.0 support is already there, but Scala 2.12 is not supported yet. It won't be a big enough project for GSoC IMO.
>>>>>>
>>>>>> 5. OLAP.
>>>>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface, Zeppelin can support it very well. But we could create a specific interpreter for an OLAP engine if its native API performs better than JDBC. Another way I can think of to improve OLAP support is visualization: although Zeppelin already supports some built-in visualizations, some are still missing. We could provide more.
>>>>>>
>>>>>> 6. Auto-completion.
>>>>>> We already support IPython [1] in Zeppelin, which provides almost the same auto-completion as Jupyter. But it lacks access to the Python API docs, which is also pretty important for Python users IMO. SQL is another popular language in Zeppelin, but it doesn't provide a good code-completion experience either; we can do better there as well.
>>>>>>
>>>>>> 7. Notifications.
>>>>>> I think notifications can be integrated into job scheduling.
>>>>>> A notification can be sent when a job fails or succeeds.
>>>>>>
>>>>>> Let us know which JIRA you are most interested in, and please also consider how much time you can spend on it. Again, we really appreciate your interest in Zeppelin and look forward to your contribution.
>>>>>>
>>>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>>>
>>>>>> Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote on Wed, 6 Mar 2019 at 7:41 AM:
>>>>>>
>>>>>>> Thank you for your replies! I've checked the existing set of issues and found several interesting ones:
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very nice way to increase analytical processing performance using the Arrow project;
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994, deploying models independently of ZeppelinServer, sounds quite intriguing too, although there is much to think about;
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018: at first glance, Airflow (https://airflow.apache.org/) seems useful for implementing complex execution workflows.
>>>>>>> Those tasks are large-scale and intriguing, and they require complex architectural solutions.
>>>>>>> I've also probably found a ticket that is suitable for getting involved in the project:
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>>>>>> Are there any "low-hanging fruits"?
>>>>>>>
>>>>>>> I also have several ideas of my own. Some of them might not be relevant due to the vision of the project or other reasons. Just ideas:
>>>>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite logical to add more integrations with existing OLAP solutions like Pinot, ClickHouse and Druid. Currently I've found an integration only with Kylin;
>>>>>>> - Better autocompletion. Jupyter offers not only a list of already initialized variables, but also quick access to documentation. It's convenient;
>>>>>>> - Notifications. Some colleagues would appreciate a notification service that sends you messages (via mail, a Slack bot or something else) indicating that your long-running paragraphs have completed.
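The notification idea, combined with Jeff's suggestion to tie it into job scheduling and fire on success or failure, can be sketched as a wrapper that runs a job and then reports its outcome to a list of channels. This is a minimal, hypothetical Python sketch: `JobResult`, `run_with_notifications`, `format_message` and the "a channel is any callable" interface are assumptions rather than existing Zeppelin classes, and real mail or Slack delivery is deliberately left out (`print` stands in for a channel).

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobResult:
    job_name: str
    succeeded: bool
    duration_s: float
    error: str = ""


# Hypothetical channel interface: any callable that accepts a JobResult.
# A real channel might wrap an e-mail client or a Slack bot.
Channel = Callable[[JobResult], None]


def run_with_notifications(job_name: str, job: Callable[[], None],
                           channels: List[Channel]) -> JobResult:
    """Run `job`, then tell every channel whether it succeeded or failed."""
    start = time.monotonic()
    try:
        job()
        result = JobResult(job_name, True, time.monotonic() - start)
    except Exception as exc:  # capture the failure so it can be reported
        result = JobResult(job_name, False, time.monotonic() - start, repr(exc))
    for notify in channels:
        notify(result)
    return result


def format_message(result: JobResult) -> str:
    """Render a short, human-readable status line for a notification."""
    status = "succeeded" if result.succeeded else f"failed: {result.error}"
    return f"Job '{result.job_name}' {status} after {result.duration_s:.1f}s"


if __name__ == "__main__":
    def console(result: JobResult) -> None:
        print(format_message(result))

    def failing_job() -> None:
        raise RuntimeError("out of memory")

    run_with_notifications("nightly_etl", lambda: time.sleep(0.1), [console])
    run_with_notifications("train_model", failing_job, [console])
```

Returning a `JobResult` instead of re-raising lets a scheduler keep running other notes and decide how to handle the failure; a production version would probably also guard each channel so that one failing channel does not stop the others from being notified.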
>>>>>>>
>>>>>>> Feedback is very much appreciated :)
>>>>>>>
>>>>>>> It would be wonderful if someone agreed to devote their time and become a mentor in the GSoC program!
>>>>>>>
>>>>>>> ----------------------------------------
>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>
>>>>>>> On Tue, 5 Mar 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I've confirmed that I can add more issues for GSoC. Can you explain what you would like to contribute? I can add more issues.
>>>>>>>>
>>>>>>>> JL
>>>>>>>>
>>>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>>>>
>>>>>>>>> Welcome to the Zeppelin community! :-)
>>>>>>>>>
>>>>>>>>>> On 5 Mar 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>>>>
>>>>>>>>>> I added frontend (FE) topics for GSoC because FE is the most urgent issue I have been thinking about. We always encourage contributions to Zeppelin on a range of topics, including your own ideas.
>>>>>>>>>>
>>>>>>>>>> Please describe your ideas in more detail.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> JL
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>>>>>> It looks like we need volunteer mentors and some backend subjects for GSoC 2019.
>>>>>>>>>>> Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> moon
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of Physics and Technology and am eager to contribute to Zeppelin in the context of GSoC 2019. I've become a real fan of Zeppelin over the past couple of months, using it at my job. But I have found only one ticket (a front-end task) with the GSoC 2019 label in your Jira. Perhaps you have some ideas for new features or improvements in Zeppelin but not enough hands to work on them. It would be wonderful if anyone agreed to mentor these ideas within GSoC :)
>>>>>>>>>>>> I have been working as a Scala (back-end) developer for 1.5 years. I can also write Java or Python without any problems if necessary. I'm really fond of databases and high-load systems. I also have experience with some other great Apache projects like Cassandra, Kafka and Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>>> http://madeng.net
>>>>>>>>
>>>>>>>> --
>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>> http://madeng.net
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang