Hi Jongyoul Lee, Морковкин,

I looked up the information about GSOC. Is it still necessary to apply for the Zeppelin community first? I don't know much about GSOC. Besides helping with the project, what other work does the mentor need to do?

> On March 8, 2019, at 10:01 AM, Xun Liu <neliu...@163.com> wrote:
>
> Hi, Морковкин
>
> I am very happy to be your mentor for GSOC. :-)
> I believe that by completing this work, I can also learn a lot.
>
> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>
>> On March 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>
>> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check this out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces the flow shown in the attached picture.
>> Xun Liu, it would be great to clarify whether you agree to be a mentor specifically within GSOC, or outside of it? :)
>>
>> ----------------------------------------
>> Best regards, Basil Morkovkin
>>
>> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>> Thanks Liu for taking this over, I will help review the design.
>>
>> On Thu, Mar 7, 2019 at 4:05 PM, Xun Liu <neliu...@163.com> wrote:
>> Hi Vasiliy Morkovkin
>>
>> Thank you very much for your willingness to implement the workflow feature.
>> I will work with you with the highest priority.
>> I am planning to update the system design documentation for workflow first at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>> Please add yourself as a Watcher on ZEPPELIN-4018.
>> This way you will get notifications about document updates in a timely manner.
>>
>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>> If you need to, you can email me at liuxun...@gmail.com and I will reply as quickly as I can.
>> Do you think this kind of cooperation is OK?
>>
>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve our system design. Thanks!
>>
>> :-)
>>
>>> On March 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>
>>> Thank you for such detailed feedback!
>>> I am definitely interested in working on the workflow implementation with you, Xun Liu! Could you become a GSOC mentor for this task?
>>> Some front-end work is not a problem at all.
>>> I'm ready to work at least 30 hours per week in the summer, while for now I'd like to take on some smaller tasks to get a closer look at the existing codebase and get familiar with your development workflow. Do you have such tasks in mind?
>>>
>>> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
>>> Hi Vasiliy Morkovkin
>>>
>>> I have written up my thoughts on workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>
>>> Because Zeppelin has more than 20 interpreters, data analysts can use it for a wide variety of data development, and much of that work is interdependent. For example, developing machine learning algorithms relies on Spark to preprocess the data, and so on.
>>>
>>> The open source workflow tools available now include Azkaban and Airflow.
>>> Azkaban is relatively simple and covers most scenarios, and our company is using it.
>>> Airflow looks complicated and I have not used it.
>>> In fact, I have previously implemented workflow for notes and paragraphs in Zeppelin via Azkaban:
>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>
>>> However, I think Zeppelin should have built-in workflow capabilities instead of relying on external software to schedule notes, for the following reasons:
>>> 1. Now that we have moved from the data processing era to the algorithm era, Zeppelin having its own workflow would form a data loop.
>>>
>>> 2. Zeppelin's powerful interactive processing capabilities help algorithm engineers improve their productivity. Zeppelin should give algorithm engineers more direct control, instead of having them hand their algorithms to other teams (or software) to build the workflow.
>>>
>>> 3. Zeppelin knows more about the processing status of data than Azkaban and Airflow do, so a built-in workflow will have better performance, user experience and control.
>>>
>>> If you are interested in workflow (ZEPPELIN-4018), I am willing to work with you to complete all of the system design and code development work.
>>>
>>> :-)
>>>
>>>> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>> Hi Basil,
>>>>
>>>> Thanks for your interest in Zeppelin. Here are my comments on the tickets you are interested in.
>>>>
>>>> 1.
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>> This involves two sides of work, frontend and backend.
>>>> In the frontend, we should use Arrow JS to handle the table data, including displaying and processing it (such as aggregation).
>>>> In the backend, we should use Arrow for each language and allow them to exchange data within the same process, and use Arrow IPC to exchange data across processes.
>>>> Overall, this is a pretty large task. If you really want to do it, I would suggest you take just part of it.
>>>>
>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>> Regarding model serving, I don't have a clear picture of this. Others can comment on it.
>>>>
>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>> Job scheduling is pretty important for Zeppelin; I would make this the highest priority among these tickets. Airflow is one option, but I am open to other solutions. First we need to figure out how users schedule jobs in Zeppelin, then choose the right framework. It would also involve some frontend work.
>>>>
>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>> Spark 2.4.0 support is already there, but Scala 2.12 is not supported yet.
>>>> It won't be a big project for GSOC, IMO.
>>>>
>>>> 5. OLAP.
>>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface, Zeppelin can support it very well. But we could create a specific interpreter for an OLAP engine if its native API performs better than JDBC. Another thing I can think of for improving OLAP is visualization: although Zeppelin already supports some built-in visualizations, some are still missing. We could provide more.
>>>>
>>>> 6. Auto-completions.
>>>> We already support IPython [1] in Zeppelin, which provides almost the same auto-completion as Jupyter. But it lacks access to the Python API docs, which is also pretty important for Python users, IMO. SQL is another popular language in Zeppelin, but it doesn't provide a good code-completion experience either; we can do better there as well.
>>>>
>>>> 7. Notifications.
>>>> I think notifications can be integrated into job scheduling. A notification can be sent when a job fails or succeeds.
>>>>
>>>> Let us know which JIRA you are most interested in, and please also consider how much time you can spend on this. Again, we very much appreciate your interest in Zeppelin and look forward to your contribution.
>>>>
>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>
>>>> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>
>>>>> Thank you for your replies!
>>>>> I've checked the existing set of issues and found several curious ones:
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very nice way to increase analytical processing performance using the Arrow project;
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models regardless of ZeppelinServer sounds quite intriguing too, although there is much to think about;
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance, https://airflow.apache.org/ seems to be useful for implementing complex execution workflows.
>>>>> Those tasks are global and intriguing, requiring complex architectural solutions.
>>>>> Also, I've probably found a ticket that is suitable for me to get involved in the project:
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>>>> Are there any "low-hanging fruits"?
>>>>>
>>>>> And I have several ideas of my own. Some of them might not be relevant due to the vision of the project or other reasons. Just ideas:
>>>>> - OLAP.
>>>>>   As Zeppelin is a tool aimed at analytics, it seems quite logical to add more integrations with existing OLAP solutions like Pinot, ClickHouse and Druid. Currently I've found an integration only with Kylin;
>>>>> - Better autocompletion. Jupyter offers not only a list of already initialized variables, but also quick access to documentation. It's convenient;
>>>>> - Notifications. Some colleagues would appreciate a notification service that sends you messages (via mail, a Slack bot or something else) indicating that your long-running paragraphs have completed.
>>>>>
>>>>> Feedback is much appreciated :)
>>>>>
>>>>> It would be wonderful if someone agreed to sacrifice their time and become a mentor in the GSOC program!
>>>>>
>>>>> ----------------------------------------
>>>>> Best regards, Basil Morkovkin.
>>>>>
>>>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've confirmed that I can add more issues for GSOC. Can you explain what you would like to contribute to? I can add more issues.
>>>>>>
>>>>>> JL
>>>>>>
>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>>>>>>
>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>>
>>>>>>> Welcome to the Zeppelin community! :-)
>>>>>>>
>>>>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>>
>>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I have thought about. We always encourage contributions to Zeppelin on a range of topics, including your own ideas.
>>>>>>>>
>>>>>>>> Please describe your ideas in more detail.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> JL
>>>>>>>>
>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>>>> Looks like we need volunteers for a mentor and some backend subject for GSoC 2019.
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> moon
>>>>>>>>>
>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of Physics and Technology and am eager to contribute to Zeppelin as part of GSOC 2019. I've become a real fan of Zeppelin over the past couple of months, using it at my job. But I have found only one ticket (a front-end task) with the GSOC 2019 label on your Jira. Perhaps you have ideas for new features or improvements in Zeppelin but don't have enough hands for them. It would be wonderful if anyone agreed to mentor these ideas within GSOC :)
>>>>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years. I can also write in Java or Python without any problems if necessary. I am really fond of databases and high-load systems. I also have experience with some other great Apache projects like Cassandra, Kafka and Spark.
>>>>>>>>>>
>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>
>>>>>>>> --
>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>> http://madeng.net
>>>>>>
>>>>>> --
>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>> http://madeng.net
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>
>> --
>> Best Regards
>>
>> Jeff Zhang