Hi Liu, See this link https://community.apache.org/gsoc.html
Xun Liu <neliu...@163.com> wrote on Fri, Mar 8, 2019 at 1:58 PM:

> Hi, Jongyoul Lee, Морковкин
>
> I looked up information about GSOC. Is it still necessary to apply to the Zeppelin community first?
> I don't know much about GSOC. Besides helping with the project, what other work does the mentor need to do?
>
> > On Mar 8, 2019, at 10:01 AM, Xun Liu <neliu...@163.com> wrote:
> >
> > Hi, Морковкин
> >
> > I am very happy to be your mentor for GSOC. :-)
> > I believe that by completing this work, I can also learn a lot.
> >
> > Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >
> >> On Mar 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
> >>
> >> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check this out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces the flow shown in the attached picture.
> >> Xun Liu, it would be great to clarify whether you agree to be a mentor within GSOC specifically, or outside it? :)
> >>
> >> ----------------------------------------
> >> Best regards, Basil Morkovkin
> >>
> >> Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com>:
> >>
> >> Thanks Liu for taking over this, I will help review the design.
> >>
> >> Xun Liu <neliu...@163.com> wrote on Thu, Mar 7, 2019 at 4:05 PM:
> >> Hi Vasiliy Morkovkin
> >>
> >> Thank you very much for your willingness to implement this workflow feature.
> >> I will work with you with the highest priority.
> >> I plan to first update the system design documentation for workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
> >> Please add yourself as a Watcher on ZEPPELIN-4018.
> >> That way you will get timely notifications when the document is updated.
> >>
> >> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
> >> If you need to, you can email me at liuxun...@gmail.com and I will reply as fast as I can.
> >> Do you think this kind of cooperation is OK?
> >>
> >>
> >> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve our system design. Thanks!
> >>
> >> :-)
> >>
> >>> On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
> >>>
> >>> Thank you for such detailed feedback!
> >>> I am definitely interested in working on the workflow implementation with you, Xun Liu! Could you become a mentor in GSOC for this task?
> >>> Some front-end work is not a problem at all.
> >>> I'm ready to work at least 30 hours per week in the summer, while for now I'd like to take some smaller tasks to get a closer look at the existing codebase and become familiar with your development workflow. Do you have such tasks in mind?
> >>>
> >>> Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com>:
> >>> Hi Vasiliy Morkovkin
> >>>
> >>> I described my thoughts on workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >>>
> >>> Because there are more than 20 interpreters in Zeppelin, data analysts can use it for a variety of data development work, and much of that work is interdependent. For example, developing machine learning algorithms relies on Spark to preprocess the data, and so on.
> >>>
> >>> The open source workflow software available now includes Azkaban and Airflow.
> >>> Azkaban is relatively simple and covers most scenarios, and our company is using it.
> >>> Airflow looks complicated and I have not used it.
> >>> In fact, I have previously implemented workflow for notes and paragraphs in Zeppelin via Azkaban.
> >>> https://youtu.be/2r6q-2Tq7hk?t=33
> >>>
> >>> However, I think Zeppelin should have built-in workflow capabilities instead of relying on external software to schedule notes, for the following reasons:
> >>> 1. Now that we have moved from the data processing era to the algorithm era, once Zeppelin has its own workflow it will form a data loop.
> >>>
> >>> 2. Zeppelin's powerful interactive processing capabilities help algorithm engineers improve their productivity.
> >>> Zeppelin should give the algorithm engineer more direct control, instead of handing the algorithm to other teams (or software) to build the workflow.
> >>>
> >>> 3. Zeppelin knows more about the processing status of data than Azkaban and Airflow.
> >>> So the built-in workflow will have better performance, user experience and control.
> >>>
> >>> If you are interested in workflow (ZEPPELIN-4018), I am willing to work with you to complete all the system design and code development work.
> >>>
> >>> :-)
> >>>
> >>>> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> >>>>
> >>>> https://issues.apache.org/jira/browse/ZEPPELIN-3857
> >>>> Hi Basil,
> >>>>
> >>>> Thanks for your interest in Zeppelin. Here are my comments on the tickets you are interested in.
> >>>>
> >>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
> >>>> This involves work on two sides, frontend and backend:
> >>>> In the frontend, we should use Arrow JS to handle the table data, including displaying and processing it (such as aggregation).
> >>>> In the backend, we should use Arrow for each language and allow them to exchange data in the same process, and use Arrow IPC to exchange data across processes.
> >>>> Overall, this is a pretty large task. If you really want to do it, I would suggest taking just part of it.
> >>>>
> >>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
> >>>> Regarding model serving, I don't have a clear picture of this. Others can comment on it.
> >>>>
> >>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >>>> Job scheduling is pretty important for Zeppelin; I would make this the highest priority among these tickets. Airflow is one option, but I am open to other solutions. First we need to figure out how users schedule jobs in Zeppelin, then choose the right framework. It would also involve some frontend work.
> >>>>
> >>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
> >>>> Spark 2.4.0 support is already there, but Scala 2.12 is not supported yet. It won't be a big project for GSOC IMO.
> >>>>
> >>>> 5. OLAP.
> >>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface, Zeppelin can support it very well. But we could create a specific interpreter for an OLAP engine if its native API performs better than JDBC. Another thing I can think of for improving OLAP is visualization: although Zeppelin already supports some built-in visualizations, some are still missing. We could provide more.
> >>>>
> >>>> 6. Auto-completions.
> >>>> We already support IPython[1] in Zeppelin, which provides almost the same auto-completion as Jupyter. But it lacks access to the Python API docs. This is also pretty important for Python users IMO. SQL is another popular language in Zeppelin, but it also doesn't provide a good code-completion experience; we can do better there as well.
> >>>>
> >>>> 7. Notifications.
> >>>> I think notifications can be integrated into job scheduling. A notification can be sent when a job fails or succeeds.
> >>>>
> >>>>
> >>>> Let us know which JIRA you are most interested in, and please also consider how much time you can spend on this. Again, we very much appreciate your interest in Zeppelin and look forward to your contribution.
> >>>>
> >>>>
> >>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
> >>>>
> >>>>
> >>>>
> >>>> Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote on Wed, Mar 6, 2019 at 7:41 AM:
> >>>>
> >>>>> Thank you for your replies! I've checked the existing set of issues and found several curious ones:
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very nice way to increase analytical processing performance using the Arrow project;
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models independently of ZeppelinServer sounds quite intriguing too, although there is much to think about;
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance, https://airflow.apache.org/ seems to be useful for implementing complex execution workflows.
> >>>>> Those tasks are global and intriguing, requiring complex architectural solutions.
> >>>>> I've also probably found a ticket that is suitable for me to get involved in the project:
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
> >>>>> Are there any "low-hanging fruits"?
> >>>>>
> >>>>> And I have several ideas of my own. Some of them might not be relevant due to the vision of the project or other reasons. Just ideas:
> >>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite logical to add more integrations with existing OLAP solutions like Pinot, ClickHouse and Druid. Currently I've found integration only with Kylin;
> >>>>> - Better autocompletion. Jupyter offers not only a list of already initialized variables, but also quick access to documentation. It's convenient;
> >>>>> - Notifications. Some colleagues would have appreciated a notification service that sends you messages (via mail, Slack bot or something else) indicating that your long-running paragraphs have completed.
> >>>>>
> >>>>> Feedback is very appreciated :)
> >>>>>
> >>>>> It would be wonderful if someone agreed to sacrifice their time and become a mentor in the GSOC program!
> >>>>>
> >>>>> ----------------------------------------
> >>>>> Best regards, Basil Morkovkin.
> >>>>>
> >>>>>
> >>>>> Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com>:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I've confirmed I can add more issues for GSOC. Can you explain what you would like to contribute to? I can add more issues.
> >>>>>>
> >>>>>> JL
> >>>>>>
> >>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
> >>>>>>
> >>>>>>> Hi, Vasiliy Morkovkin
> >>>>>>>
> >>>>>>> Welcome to the Zeppelin community! :-)
> >>>>>>>
> >>>>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for contacting Zeppelin with your interest.
> >>>>>>>>
> >>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I have thought about. We always encourage contributions to Zeppelin on a range of topics, including your own ideas.
> >>>>>>>>
> >>>>>>>> Please describe your ideas in more detail.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>> JL
> >>>>>>>>
> >>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Great to see your interest in the project. Thanks!
> >>>>>>>>> Looks like we need volunteers for a mentor and some backend subject for GSoC2019.
> >>>>>>>>> Any ideas?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> moon
> >>>>>>>>>
> >>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of Physics and Technology and am eager to contribute to Zeppelin in the context of GSOC 2019. I've become a real fan of Zeppelin over the past couple of months, using it at my job. But I have found only one ticket (a front-end task) with the GSOC 2019 label on your Jira. Perhaps you have ideas for new features or improvements in Zeppelin but don't have enough hands for them. It would be wonderful if anyone agreed to mentor these ideas within GSOC :)
> >>>>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years. I can also write in Java or Python without any problems if necessary. I am really fond of databases and high-load systems. I also have experience with some other great Apache projects like Cassandra, Kafka and Spark.
> >>>>>>>>>>
> >>>>>>>>>> Best regards, Basil Morkovkin.
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 이종열, Jongyoul Lee, 李宗烈
> >>>>>>>> http://madeng.net
> >>>>>>
> >>>>>> --
> >>>>>> 이종열, Jongyoul Lee, 李宗烈
> >>>>>> http://madeng.net
> >>>>
> >>>> --
> >>>> Best Regards
> >>>>
> >>>> Jeff Zhang
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang

--
Best Regards

Jeff Zhang