Hi!

For fun, I've sketched a toy prototype of a workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces the flow shown in the attached picture.

Xun Liu, could you clarify whether you agree to be a mentor specifically within GSOC, or outside of it? :)
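To give a rough idea of what "imposing dependencies on the execution order of tasks" means, here is a minimal sketch. It is not the code behind the Scastie link above; the names `Task`, `Workflow` and `executionOrder`, as well as the example task names, are hypothetical and chosen only for illustration. It orders tasks with a plain topological sort (Kahn's algorithm) and does not actually run anything.

```scala
import scala.collection.mutable

// Hypothetical model: a task is a name plus the names of the tasks it depends on.
final case class Task(name: String, dependsOn: Set[String] = Set.empty)

object Workflow {
  // Returns an order in which every task comes after all of its dependencies
  // (Kahn's algorithm), or an error message if the dependency graph is invalid.
  def executionOrder(tasks: Seq[Task]): Either[String, List[String]] = {
    val known = tasks.map(_.name).toSet
    val unknown = tasks.flatMap(_.dependsOn).filterNot(known).distinct
    if (unknown.nonEmpty) {
      val names = unknown.mkString(", ")
      Left(s"Unknown dependencies: $names")
    } else {
      // Number of not-yet-completed dependencies per task.
      val pending = mutable.Map.empty[String, Int]
      tasks.foreach(t => pending(t.name) = t.dependsOn.size)
      // Tasks without dependencies are ready immediately, in declaration order.
      val ready = mutable.Queue.empty[String]
      ready ++= tasks.filter(_.dependsOn.isEmpty).map(_.name)
      val order = mutable.ListBuffer.empty[String]
      while (ready.nonEmpty) {
        val done = ready.dequeue()
        order += done
        // Completing `done` may unblock tasks that depend on it.
        for (t <- tasks if t.dependsOn.contains(done)) {
          pending(t.name) -= 1
          if (pending(t.name) == 0) ready.enqueue(t.name)
        }
      }
      // If some tasks never became ready, they are part of a cycle.
      if (order.size == tasks.size) Right(order.toList)
      else Left("Cycle detected among tasks")
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Made-up flow; the flow in the attached picture is not reproduced here.
    val tasks = Seq(
      Task("load"),
      Task("clean", Set("load")),
      Task("train", Set("clean")),
      Task("report", Set("clean", "train"))
    )
    println(Workflow.executionOrder(tasks))
    // prints: Right(List(load, clean, train, report))
  }
}
```

A real workflow manager would also run independent tasks in parallel and handle failures; the sketch only shows the ordering constraint.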
----------------------------------------
Best regards, Basil Morkovkin

On Thu, 7 Mar 2019 at 11:32, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Thanks Liu for taking over this, I will help review the design.
>
> Xun Liu <neliu...@163.com> wrote on Thu, 7 Mar 2019 at 4:05 PM:
>
>> Hi Vasiliy Morkovkin
>>
>> Thank you very much for your willingness to implement the workflow feature.
>> I will work with you with the highest priority.
>> I am planning to update the system design documentation for workflow first at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>> Please start watching ZEPPELIN-4018.
>> This way you will get notifications about document updates in a timely manner.
>>
>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>> If you need to, you can email me at liuxun...@gmail.com , and I will reply as quickly as I can.
>> Do you think this kind of cooperation is OK?
>>
>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve our system design. Thanks!
>>
>> :-)
>>
>> > On 7 Mar 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>> >
>> > Thank you for such detailed feedback!
>> > I am definitely interested in working on the workflow implementation with you, Xun Liu! Could you become a mentor in GSOC for this task?
>> > Some front-end work is not a problem at all.
>> > I'm ready to work at least 30 hours per week in the summer. For now, I'd like to take some smaller tasks to get a closer look at the existing codebase and become familiar with your development workflow. Do you have such tasks in mind?
>> >
>> > On Wed, 6 Mar 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
>> > Hi Vasiliy Morkovkin
>> >
>> > I have written up my thoughts on workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018
>> >
>> > Because there are more than 20 interpreters in zeppelin, data analysts can use it for a variety of data development, and a lot of that data development is interdependent. For example, the development of machine learning algorithms relies on spark to preprocess data, and so on.
>> >
>> > Existing open source workflow software includes Azkaban and airflow.
>> > Azkaban is relatively simple and has been enough for most scenarios; our company is using it.
>> > Airflow looks complicated and I have not used it.
>> > In fact, I have previously implemented workflow for notes and paragraphs in zeppelin via azkaban.
>> > https://youtu.be/2r6q-2Tq7hk?t=33
>> >
>> > However, I think zeppelin should have built-in workflow capabilities instead of relying on external software to schedule notes in zeppelin, for the following reasons:
>> > 1. Now that we have moved from the data processing era to the algorithm era, zeppelin having its own workflow will form a data loop.
>> >
>> > 2. zeppelin's powerful interactive processing capabilities help algorithm engineers improve their productivity. Zeppelin should give the algorithm engineer more direct control, instead of handing the algorithm to other teams (or software) to do the workflow.
>> >
>> > 3. zeppelin knows more about the processing status of data than Azkaban and airflow, so the built-in workflow will have better performance, user experience and control.
>> >
>> > If you are interested in workflow (ZEPPELIN-4018), I am willing to work with you to complete all system design and code development work.
>> >
>> > :-)
>> >
>> >> On 6 Mar 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>> >>
>> >> Hi Basil,
>> >>
>> >> Thanks for your interest in zeppelin. Here are my comments on the tickets you are interested in.
>> >>
>> >> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>> >> This involves two sides of work, frontend and backend.
>> >> In the frontend, we should use arrow js to handle the table data, including displaying it and processing it (such as aggregation).
>> >> In the backend, we should use arrow for each language and allow them to exchange data within the same process, and use arrow IPC to exchange data across processes.
>> >> Overall, this is a pretty large task. If you really want to do it, I would suggest you take just part of it.
>> >>
>> >> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>> >> Regarding model serving, I don't have a clear picture of this. Others can comment on it.
>> >>
>> >> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>> >> Job scheduling is pretty important for zeppelin; I would make this the highest priority among these tickets. airflow is one option, but I am open to other solutions. First we need to figure out how users schedule jobs in zeppelin, then choose the right framework. It would also involve some frontend work.
>> >>
>> >> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>> >> Spark 2.4.0 support is already there, but scala 2.12 is not supported yet. It won't be a big project for GSOC IMO.
>> >>
>> >> 5. OLAP.
>> >> Regarding OLAP, as long as the OLAP engine provides a JDBC interface, Zeppelin can support it very well. But we could create a specific interpreter for an OLAP engine if its native api performs better than JDBC. Another thing I can think of for improving OLAP is visualization: although Zeppelin already supports some built-in visualizations, some are still missing. We could provide more.
>> >>
>> >> 6. Auto-completions.
>> >> We already support ipython[1] in zeppelin, which provides almost the same auto-completion as jupyter. But it lacks access to the python api doc. This is also pretty important for python users IMO. SQL is another popular language in Zeppelin, but it also doesn't provide a good code-completion experience; we can do better there as well.
>> >>
>> >> 7. Notifications.
>> >> I think notification can be integrated into job scheduling. A notification can be sent when a job fails or succeeds.
>> >>
>> >>
>> >> Let us know which jira you are most interested in, and please also consider how much time you can spend on this. Again, we very much appreciate your interest in zeppelin and look forward to your contribution.
>> >>
>> >>
>> >> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>> >>
>> >>
>> >> Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote on Wed, 6 Mar 2019 at 7:41 AM:
>> >>
>> >>> Thank you for your replies!
>> >>> I've checked the existing set of issues and found several curious ones:
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very nice way to increase analytical processing performance using the Arrow project;
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models regardless of ZeppelinServer sounds quite intriguing too, although there is much to think about;
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance, https://airflow.apache.org/ seems to be useful in implementing complex execution workflows.
>> >>> Those tasks are global and intriguing, requiring complex architectural solutions.
>> >>> Also, I've probably found a ticket that is suitable for me to get involved in the project:
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857 . What do you think?
>> >>> Are there any "low hanging fruits"?
>> >>>
>> >>> And I have several ideas of my own. Some of them might not be relevant due to the vision of the project or other reasons. Just ideas:
>> >>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite logical to add more integrations with existing OLAP solutions like Pinot, ClickHouse and Druid. Currently I've found integration only with Kylin;
>> >>> - Better autocompletion. Jupyter offers not only a list of already initialized variables, but also quick access to documentation. It's convenient;
>> >>> - Notifications.
>> >>>   Some colleagues would appreciate a notification service that sends you messages (via mail, Slack bot or something else) indicating that your long-running paragraphs have completed.
>> >>>
>> >>> Feedback is much appreciated :)
>> >>>
>> >>> It would be wonderful if someone agreed to sacrifice their time and become a mentor in the GSOC program!
>> >>>
>> >>> ----------------------------------------
>> >>> Best regards, Basil Morkovkin.
>> >>>
>> >>>
>> >>> On Tue, 5 Mar 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> I've confirmed I could add more issues for GSOC. Can you explain what you would like to contribute to? I can add more issues.
>> >>>>
>> >>>> JL
>> >>>>
>> >>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>> >>>>
>> >>>>> Hi, Vasiliy Morkovkin
>> >>>>>
>> >>>>> Welcome to the zeppelin community! :-)
>> >>>>>
>> >>>>>> On 5 Mar 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Thanks for contacting Zeppelin with your interest.
>> >>>>>>
>> >>>>>> I added FE topics for GSOC because FE is the most urgent issue I have thought about. We always encourage contributions to Zeppelin on a range of topics, including your own ideas.
>> >>>>>>
>> >>>>>> Please describe something more.
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>> JL
>> >>>>>>
>> >>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> Great to see your interest in the project. Thanks!
>> >>>>>>> Looks like we need volunteers for a mentor and some backend subject for GSoC2019.
>> >>>>>>> Any ideas?
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> moon
>> >>>>>>>
>> >>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
>> >>>>>>>
>> >>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of Physics and Technology and am eager to contribute to Zeppelin in the context of GSOC 2019. I've become a real fan of Zeppelin over the past couple of months, using it at my job. But I have found only one ticket (a front-end task) labeled GSOC 2019 in your Jira. Perhaps you have ideas for new features or improvements in Zeppelin but not enough hands to work on them. It would be wonderful if anyone agreed to mentor these ideas within GSOC :)
>> >>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years. I can also write in Java or Python without any problems if necessary. I am really fond of databases and highload. I also have experience with some other great Apache projects like Cassandra, Kafka and Spark.
>> >>>>>>>>
>> >>>>>>>> Best regards, Basil Morkovkin.
>> >>>>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> 이종열, Jongyoul Lee, 李宗烈
>> >>>>>> http://madeng.net
>> >>>>>
>> >>>>
>> >>>> --
>> >>>> 이종열, Jongyoul Lee, 李宗烈
>> >>>> http://madeng.net
>> >>>>
>> >>
>> >> --
>> >> Best Regards
>> >>
>> >> Jeff Zhang
>> >
>
> --
> Best Regards
>
> Jeff Zhang
>