Hi Felix Cheung,

Thank you for your suggestion.
> On March 11, 2019, at 5:47 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>
> Hi Xun,
>
> Thanks for your work - could you change the title of the email? I think
> your request to review the design would get more attention.
>
> ________________________________
> From: Xun Liu <neliu...@163.com>
> Sent: Sunday, March 10, 2019 12:03 AM
> To: Jongyoul Lee; m...@apache.org; Jeff Zhang; Vasiliy Morkovkin
> Cc: dev@zeppelin.apache.org
> Subject: Re: Zeppelin in GSOC 2019
>
> Hello, everyone,
>
> I have completed the Zeppelin workflow system design. Please review it; you can
> edit the document directly or add comments.
>
> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
> gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#
>
> :-)
>
>> On March 8, 2019, at 2:10 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>> Hi Liu,
>>
>> See this link https://community.apache.org/gsoc.html
>>
>> On Friday, March 8, 2019 at 1:58 PM, Xun Liu <neliu...@163.com> wrote:
>>
>>> Hi, Jongyoul Lee, Морковкин
>>>
>>> I looked up information about GSOC. Is it still necessary for the
>>> Zeppelin community to apply first?
>>> I don't know much about GSOC. Besides helping with the project,
>>> what other work does the mentor need to do?
>>>
>>>> On March 8, 2019, at 10:01 AM, Xun Liu <neliu...@163.com> wrote:
>>>>
>>>> Hi, Морковкин
>>>>
>>>> I am very happy to be your mentor for GSOC. :-)
>>>> I believe that by completing this work, I can also learn a lot.
>>>>
>>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>
>>>>> On March 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>>
>>>>> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala.
>>>>> It makes it easy to impose dependencies on the execution order of tasks.
>>>>> Check this out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ
>>>>> It reproduces the flow shown in the attached picture.
>>>>> Xun Liu, it would be great to clarify whether you agree to be a mentor
>>>>> specifically within GSOC, or outside of it? :)
>>>>>
>>>>> ----------------------------------------
>>>>> Best regards, Basil Morkovkin
>>>>>
>>>>> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>
>>>>> Thanks Liu for taking this over, I will help review the design.
>>>>>
>>>>> On Thursday, March 7, 2019 at 4:05 PM, Xun Liu <neliu...@163.com> wrote:
>>>>> Hi Vasiliy Morkovkin
>>>>>
>>>>> Thank you very much for your willingness to implement the workflow feature.
>>>>> I will work with you with the highest priority.
>>>>> I am planning to first update the system design documentation for workflow
>>>>> at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>>>>> Please add yourself as a Watcher on ZEPPELIN-4018.
>>>>> This way you will get notifications for document updates in a timely manner.
>>>>>
>>>>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>>>>> If you need to, you can email me at liuxun...@gmail.com and I will reply
>>>>> as quickly as I can.
>>>>> Do you think this kind of cooperation is OK?
>>>>>
>>>>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve
>>>>> our system design. Thanks!
>>>>>
>>>>> :-)
>>>>>
>>>>>> On March 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>>>
>>>>>> Thank you for such detailed feedback!
>>>>>> I am definitely interested in working on the workflow implementation with
>>>>>> you, Xun Liu! Could you become a mentor in GSOC for this task?
>>>>>> Some front-end work is not a problem at all.
>>>>>> I'm ready to work at least 30 hours per week in the summer, while for now
>>>>>> I'd like to take some smaller tasks to get a closer look at the existing
>>>>>> codebase and become familiar with your development workflow. Do you have
>>>>>> such tasks in mind?
>>>>>>
>>>>>> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
>>>>>> Hi Vasiliy Morkovkin
>>>>>>
>>>>>> I have written up my thoughts on workflow at
>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>>
>>>>>> Because Zeppelin has more than 20 interpreters,
>>>>>> data analysts can use it for a wide variety of data development,
>>>>>> and a lot of that data development is interdependent. For example,
>>>>>> developing machine learning algorithms relies on Spark to preprocess
>>>>>> the data, and so on.
>>>>>>
>>>>>> Existing open-source workflow software includes Azkaban and Airflow.
>>>>>> Azkaban is relatively simple and covers most scenarios, and our company
>>>>>> is using it.
>>>>>> Airflow looks complicated and I have not used it.
>>>>>> In fact, I have previously implemented a workflow for notes and
>>>>>> paragraphs in Zeppelin via Azkaban.
>>>>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>>>>
>>>>>> However, I think Zeppelin should have built-in workflow capabilities
>>>>>> instead of relying on external software to schedule its notes, for the
>>>>>> following reasons:
>>>>>> 1. We have now moved from the data processing era to the algorithm era.
>>>>>> Once Zeppelin has its own workflow, it will form a closed data loop.
>>>>>>
>>>>>> 2. Zeppelin's powerful interactive processing capabilities help
>>>>>> algorithm engineers improve their productivity.
>>>>>> Zeppelin should give the algorithm engineer more direct control,
>>>>>> instead of handing the algorithm to other teams (or software) to run
>>>>>> the workflow.
>>>>>>
>>>>>> 3. Zeppelin knows more about the processing status of the data than
>>>>>> Azkaban and Airflow do,
>>>>>> so a built-in workflow will offer better performance, user experience
>>>>>> and control.
>>>>>>
>>>>>> If you are interested in workflow (ZEPPELIN-4018),
>>>>>> I am willing to work with you to complete all of the system design and
>>>>>> code development work.
>>>>>>
>>>>>> :-)
>>>>>>
>>>>>>> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Basil,
>>>>>>>
>>>>>>> Thanks for your interest in Zeppelin. Here are my comments on the
>>>>>>> tickets you are interested in.
>>>>>>>
>>>>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>>>> This involves two sides of work, frontend and backend.
>>>>>>> In the frontend, we should use Arrow JS to handle the table data,
>>>>>>> including displaying it and processing it (such as aggregation).
>>>>>>> In the backend, we should use Arrow for each language and allow them
>>>>>>> to exchange data within the same process, and use Arrow IPC to
>>>>>>> exchange data across processes.
>>>>>>> Overall, this is a pretty large task. If you really want to do it, I
>>>>>>> would suggest that you take just part of it.
>>>>>>>
>>>>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>>>> Regarding model serving, I don't have a clear picture of this. Others
>>>>>>> can comment on it.
>>>>>>>
>>>>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>>> Job scheduling is pretty important for Zeppelin; I would make this
>>>>>>> the highest priority among these tickets. Airflow is one option, but
>>>>>>> I am open to other solutions. First we need to figure out how users
>>>>>>> schedule jobs in Zeppelin, then choose the right framework. It would
>>>>>>> also involve some frontend work.
>>>>>>>
>>>>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>>> Spark 2.4.0 support is already there, but Scala 2.12 is not
>>>>>>> supported yet. It won't be a big project for GSOC IMO.
>>>>>>>
>>>>>>> 5. OLAP.
>>>>>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>>>>>>> Zeppelin can support it very well. But we could create a specific
>>>>>>> interpreter for an OLAP engine if its native API performs better than
>>>>>>> JDBC. Another thing I can think of for improving OLAP is
>>>>>>> visualization: although Zeppelin already supports some built-in
>>>>>>> visualizations, some are still missing. We could provide more.
>>>>>>>
>>>>>>> 6. Auto-completions.
>>>>>>> We already support IPython[1] in Zeppelin, which provides almost the
>>>>>>> same auto-completion as Jupyter. But it lacks access to the Python
>>>>>>> API docs, which is also pretty important for Python users IMO. SQL is
>>>>>>> another popular language in Zeppelin, but it doesn't provide a good
>>>>>>> code-completion experience either; we can do better there as well.
>>>>>>>
>>>>>>> 7. Notifications.
>>>>>>> I think notification can be integrated into job scheduling. A
>>>>>>> notification can be sent when a job fails or succeeds.
>>>>>>>
>>>>>>> Let us know which JIRA you are most interested in, and also please
>>>>>>> consider how much time you can spend on this. Again, we very much
>>>>>>> appreciate your interest in Zeppelin and look forward to your
>>>>>>> contribution.
>>>>>>>
>>>>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>>>>
>>>>>>> On Wednesday, March 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>>>>
>>>>>>>> Thank you for your replies! I've checked the existing set of issues
>>>>>>>> and found several interesting ones:
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a
>>>>>>>> very nice way to increase analytical processing performance using
>>>>>>>> the Arrow project;
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying
>>>>>>>> models independently of ZeppelinServer sounds quite intriguing too,
>>>>>>>> although there is much to think about;
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first
>>>>>>>> glance, https://airflow.apache.org/ seems to be useful for
>>>>>>>> implementing complex execution workflows.
>>>>>>>> Those tasks are broad and intriguing, requiring complex
>>>>>>>> architectural solutions.
>>>>>>>> Also, I've probably found a ticket that is suitable for me to get
>>>>>>>> involved in the project:
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you
>>>>>>>> think?
>>>>>>>> Are there any "low-hanging fruits"?
>>>>>>>>
>>>>>>>> And I have several ideas of my own. Some of them might not be
>>>>>>>> relevant due to the vision of the project or other reasons. Just
>>>>>>>> ideas:
>>>>>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
>>>>>>>> logical to add more integrations with existing OLAP solutions like
>>>>>>>> Pinot, ClickHouse and Druid. So far I've found integration only
>>>>>>>> with Kylin;
>>>>>>>> - Better autocompletion. Jupyter offers not only a list of already
>>>>>>>> initialized variables, but also quick access to documentation. It's
>>>>>>>> convenient;
>>>>>>>> - Notifications. Some colleagues would appreciate a notification
>>>>>>>> service that sends you messages (via mail, a Slack bot or something
>>>>>>>> else) indicating that your long-running paragraphs have completed.
>>>>>>>>
>>>>>>>> Feedback is much appreciated :)
>>>>>>>>
>>>>>>>> It would be wonderful if someone agreed to sacrifice their time and
>>>>>>>> become a mentor in the GSOC program!
>>>>>>>>
>>>>>>>> ----------------------------------------
>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>
>>>>>>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I've confirmed I could add more issues for GSOC. Can you explain
>>>>>>>>> what you would like to contribute to? I can add more issues.
>>>>>>>>>
>>>>>>>>> JL
>>>>>>>>>
>>>>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>>>>>
>>>>>>>>>> Welcome to the Zeppelin community! :-)
>>>>>>>>>>
>>>>>>>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>>>>>
>>>>>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I
>>>>>>>>>>> have thought about. We always encourage contributions to Zeppelin
>>>>>>>>>>> on a range of topics, including your own ideas.
>>>>>>>>>>>
>>>>>>>>>>> Please tell us more.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> JL
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>>>>>>> Looks like we need volunteers for a mentor and some backend
>>>>>>>>>>>> subjects for GSoC 2019.
>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> moon
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow
>>>>>>>>>>>>> Institute of Physics and Technology and am eager to contribute
>>>>>>>>>>>>> to Zeppelin in the context of GSOC 2019. I've become a real fan
>>>>>>>>>>>>> of Zeppelin over the past couple of months, using it at my job.
>>>>>>>>>>>>> But I have found only one ticket (a front-end task) with the
>>>>>>>>>>>>> GSOC 2019 label on your Jira. Perhaps you have ideas for new
>>>>>>>>>>>>> features or improvements in Zeppelin but don't have enough
>>>>>>>>>>>>> hands for them. It would be wonderful if anyone agreed to
>>>>>>>>>>>>> mentor these ideas within GSOC :)
>>>>>>>>>>>>> I have been working as a Scala developer (back-end) for 1.5
>>>>>>>>>>>>> years. I can also write in Java or Python without any problems
>>>>>>>>>>>>> if necessary. I'm really fond of databases and high-load
>>>>>>>>>>>>> systems. I also have experience with some other great Apache
>>>>>>>>>>>>> projects like Cassandra, Kafka and Spark.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>>>> http://madeng.net
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>> http://madeng.net
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>
>> --
>> Best Regards
>>
>> Jeff Zhang