Hi Морковкин, I am very happy to be your mentor for GSOC. :-) I believe I can also learn a lot by completing this work.
Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018

> On March 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович
> <morkovkin...@phystech.edu> wrote:
>
> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala. It
> makes it easy to impose dependencies on the execution order of tasks. Check
> it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces
> the flow shown in the attached picture.
> Xun Liu, it would be great to clarify whether you agree to be a mentor
> specifically within GSOC, or outside of it? :)
>
> ----------------------------------------
> Best regards, Basil Morkovkin
>
> Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com>:
>
> Thanks Liu for taking over this, I will help review the design.
>
> Xun Liu <neliu...@163.com> wrote on Thu, Mar 7, 2019 at 4:05 PM:
> Hi Vasiliy Morkovkin
>
> Thank you very much for your willingness to implement the workflow feature.
> I will work with you with the highest priority.
> I am planning to update the system design documentation for workflow first at
> https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
> Please add yourself as a Watcher on ZEPPELIN-4018.
> This way you will get notifications of document updates in a timely manner.
>
> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
> If you need to, you can email me at liuxun...@gmail.com , and I will reply
> as quickly as I can.
> Do you think this kind of cooperation is OK?
>
>
> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve
> our system design. Thanks!
>
> :-)
>
> > On March 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович
> > <morkovkin...@phystech.edu> wrote:
> >
> > Thank you for such detailed feedback!
> > I am definitely interested in working on the workflow implementation with
> > you, Xun Liu! Could you become a GSOC mentor for this task?
> > Some front-end work is not a problem at all.
> > I'm ready to work at least 30 hours per week in the summer, while for now
> > I'd like to take some smaller tasks to take a closer look at the existing
> > codebase and to get familiar with your development workflow. Do you have
> > such tasks in mind?
> >
> > Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com>:
> > Hi Vasiliy Morkovkin
> >
> > I have written up my thoughts on workflow at
> > https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >
> > Because there are more than 20 interpreters in Zeppelin, data analysts can
> > use it for a variety of data development work, and a lot of that work is
> > interdependent. For example, developing machine learning algorithms
> > relies on Spark to preprocess the data, and so on.
> >
> > The open source workflow software available today includes Azkaban and
> > Airflow.
> > Azkaban is relatively simple and meets the needs of most scenarios, and
> > our company is using it.
> > Airflow looks complicated and I have not used it.
> > In fact, I have previously implemented workflow for notes and paragraphs
> > in Zeppelin via Azkaban:
> > https://youtu.be/2r6q-2Tq7hk?t=33
> >
> > However, I think Zeppelin should have built-in workflow capabilities.
> > Instead of relying on external software to schedule notes in Zeppelin, for
> > the following reasons:
> > 1. Now that we have moved from the data processing era to the algorithm
> > era, Zeppelin will form a data loop once it has its own workflow.
> >
> > 2. Zeppelin's powerful interactive processing capabilities help algorithm
> > engineers improve their productivity.
> > Zeppelin should give the algorithm engineer more direct control,
> > instead of handing the algorithm to other teams (or software) to run the
> > workflow.
> >
> > 3. Zeppelin knows more about the processing status of data than Azkaban
> > and Airflow do,
> > so a built-in workflow will have better performance, user experience and
> > control.
> >
> > If you are interested in workflow (ZEPPELIN-4018),
> > I am willing to work with you to complete all system design and code
> > development work.
> >
> > :-)
> >
> >> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> >>
> >> https://issues.apache.org/jira/browse/ZEPPELIN-3857
> >> Hi Basil,
> >>
> >> Thanks for your interest in Zeppelin. Here are my comments on the tickets
> >> you are interested in.
> >>
> >> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
> >> This involves two sides of work, frontend and backend:
> >> In the frontend, we should use Arrow JS to handle the table data, including
> >> displaying it and processing it (such as aggregation).
> >> In the backend, we should use Arrow for each language, and allow them to
> >> exchange data in the same process.
> >> And use Arrow IPC to exchange data across processes.
> >> Overall, this is a pretty large task. If you really want to do it, I would
> >> suggest you take just part of it.
> >>
> >> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
> >> Regarding model serving, I don't have a clear picture of this. Others
> >> can comment on it.
> >>
> >> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >> Job scheduling is pretty important for Zeppelin; I would make this the
> >> highest priority among these tickets. Airflow is one
> >> option, but I am open to other solutions. First we need to figure out how
> >> users schedule jobs in Zeppelin, then choose the right framework. It would
> >> also involve some frontend work.
> >>
> >> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
> >> Spark 2.4.0 support is already there, but Scala 2.12 is not
> >> supported yet. It won't be a big project for GSOC IMO.
> >>
> >> 5. OLAP.
> >> Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
> >> Zeppelin can support it very well. But we could create a specific
> >> interpreter for an OLAP engine if its native API performs better than JDBC.
> >> Another area where I think OLAP could be improved is visualization:
> >> although Zeppelin already supports some built-in visualizations, some are
> >> still missing. We could provide more.
> >>
> >> 6. Auto-completions.
> >> We already support IPython [1] in Zeppelin, which provides almost the
> >> same auto-completion as Jupyter. But it lacks access to the Python API
> >> docs. This is also pretty important for Python users IMO. SQL is another
> >> popular language in Zeppelin, but it doesn't provide a good
> >> code-completion experience either; we can do better there as well.
> >>
> >> 7. Notifications.
> >> I think notifications can be integrated into job scheduling. A
> >> notification can be sent when a job fails or succeeds.
> >>
> >>
> >> Let us know which JIRA you are most interested in, and please also
> >> consider how much time you can spend on it. Again, we very much
> >> appreciate your interest in Zeppelin and look forward to your
> >> contribution.
> >>
> >>
> >> [1]
> >> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
> >>
> >>
> >>
> >> Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote on
> >> Wed, Mar 6, 2019 at 7:41 AM:
> >>
> >>> Thank you for your replies!
> >>> I've checked the existing set of issues and found several curious ones:
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very
> >>> nice way to increase analytical processing performance using the Arrow
> >>> project;
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
> >>> independently of ZeppelinServer sounds quite intriguing too, although
> >>> there is much to think about;
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
> >>> https://airflow.apache.org/ seems to be useful for implementing complex
> >>> execution workflows.
> >>> Those tasks are global and intriguing, and require complex architectural
> >>> solutions.
> >>> Also, I've probably found a ticket that is suitable for me to get
> >>> involved in the project:
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857 . What do you think?
> >>> Are there any "low-hanging fruits"?
> >>>
> >>> And I have several ideas of my own. Some of them might not be relevant
> >>> due to the vision of the project or other reasons. Just ideas:
> >>> - OLAP.
> >>> As Zeppelin is a tool aimed at analytics, it seems quite
> >>> logical to add more integrations with existing OLAP solutions like Pinot,
> >>> ClickHouse and Druid. Currently I've found integration only with Kylin;
> >>> - Better autocompletion. Jupyter offers not only a list of already
> >>> initialized variables, but also quick access to documentation. It's
> >>> convenient;
> >>> - Notifications. Some colleagues would appreciate a notification
> >>> service that sends you messages (via mail, Slack bot or something else)
> >>> indicating that your long-running paragraphs have completed.
> >>>
> >>> Feedback is very much appreciated :)
> >>>
> >>> It would be wonderful if someone agreed to sacrifice his time and become
> >>> a mentor in the GSOC program!
> >>>
> >>> ----------------------------------------
> >>> Best regards, Basil Morkovkin.
> >>>
> >>>
> >>> Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com>:
> >>>
> >>>> Hello,
> >>>>
> >>>> I've confirmed I could add more issues for GSOC. Can you explain what you
> >>>> would like to contribute to? I can add more issues.
> >>>>
> >>>> JL
> >>>>
> >>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
> >>>>
> >>>>> Hi, Vasiliy Morkovkin
> >>>>>
> >>>>> Welcome to the Zeppelin community! :-)
> >>>>>
> >>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
> >>>>>>
> >>>>>> Thanks for contacting Zeppelin with your interest.
> >>>>>>
> >>>>>> I added FE topics for GSOC because FE is the most urgent issue I have
> >>>>>> thought about. We always encourage contributions to Zeppelin on a
> >>>>>> variety of topics, including your own ideas.
> >>>>>>
> >>>>>> Please tell us more.
> >>>>>>
> >>>>>> Thanks.
> >>>>>> JL
> >>>>>>
> >>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Great to see your interest in the project. Thanks!
> >>>>>>> Looks like we need volunteers for a mentor and some backend subjects
> >>>>>>> for GSoC 2019.
> >>>>>>> Any ideas?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> moon
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin
> >>>>>>> <morkovkin...@phystech.edu> wrote:
> >>>>>>>
> >>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute
> >>>>>>>> of Physics and Technology and am eager to contribute to Zeppelin in
> >>>>>>>> the context of GSOC 2019. I've become a real fan of Zeppelin over the
> >>>>>>>> past couple of months, using it at my job. But I have found only one
> >>>>>>>> ticket (a front-end task) with the GSOC 2019 label on your Jira.
> >>>>>>>> Perhaps you have ideas for new features or improvements in Zeppelin
> >>>>>>>> but don't have enough hands for them. It would be wonderful if anyone
> >>>>>>>> agreed to mentor these ideas within GSOC :)
> >>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years.
> >>>>>>>> I can also write in Java or Python without any problems if
> >>>>>>>> necessary. I'm really fond of databases and highload. I also have
> >>>>>>>> experience with some other great Apache projects like Cassandra,
> >>>>>>>> Kafka and Spark.
> >>>>>>>>
> >>>>>>>> Best regards, Basil Morkovkin.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> 이종열, Jongyoul Lee, 李宗烈
> >>>>>> http://madeng.net
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> 이종열, Jongyoul Lee, 李宗烈
> >>>> http://madeng.net
> >>>>
> >>>
> >>
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
> >
> >
>
> --
> Best Regards
>
> Jeff Zhang