Hi Felix Cheung

Thank you for your suggestion.


> On March 11, 2019, at 5:47 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> 
> Hi Xun,
> 
> Thanks for your work. Could you change the subject of the email? I think your 
> request to review the design will get more attention that way.
> 
> 
> ________________________________
> From: Xun Liu <neliu...@163.com>
> Sent: Sunday, March 10, 2019 12:03 AM
> To: Jongyoul Lee; m...@apache.org; Jeff Zhang; Vasiliy Morkovkin
> Cc: dev@zeppelin.apache.org
> Subject: Re: Zeppelin in GSOC 2019
> 
> Hello, everyone,
> 
> I have completed the Zeppelin workflow system design. Please review it; you can 
> edit the document directly or add comments.
> 
> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
> gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#
> 
> :-)
> 
>> On March 8, 2019, at 2:10 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>> 
>> Hi Liu,
>> 
>> See this link https://community.apache.org/gsoc.html
>> 
>> 
>> On Friday, March 8, 2019, at 1:58 PM, Xun Liu <neliu...@163.com> wrote:
>> 
>>> Hi, Jongyoul Lee, Морковкин
>>> 
>>> I looked up information about GSOC. Is it still necessary to apply on
>>> behalf of the Zeppelin community first?
>>> I don't know much about GSOC. Besides helping with the project, what other
>>> work does the mentor need to do?
>>> 
>>>> On March 8, 2019, at 10:01 AM, Xun Liu <neliu...@163.com> wrote:
>>>> 
>>>> Hi, Морковкин
>>>> 
>>>> I am very happy to be your mentor for GSOC. :-)
>>>> I believe that by completing this work, I can also learn a lot.
>>>> 
>>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>> 
>>>>> On March 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>> 
>>>>> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala.
>>>>> It makes it easy to impose dependencies on the execution order of tasks.
>>>>> Check it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It
>>>>> reproduces the flow shown in the attached picture.
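As a rough illustration of what imposing dependencies on the execution order of tasks can look like, here is a minimal sketch in Python. It is not the Scastie prototype linked above (which is written in Scala): the function name `execution_order` and the task names in `flow` are hypothetical, and the ordering is a plain topological sort (Kahn's algorithm).

```python
from collections import deque


def execution_order(dependencies):
    """Return task names in an order that respects the dependencies.

    `dependencies` maps each task to the tasks it must wait for.
    """
    # Collect every task, including ones that appear only as a dependency.
    tasks = set(dependencies)
    for deps in dependencies.values():
        tasks.update(deps)

    # Number of unfinished prerequisites per task, and the reverse edges.
    remaining = {t: len(set(dependencies.get(t, ()))) for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, deps in dependencies.items():
        for dep in set(deps):
            dependents[dep].append(task)

    # Start with tasks that wait for nothing; sorting keeps the result stable.
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(dependents[task]):
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical flow: "load" feeds "clean", which feeds two steps
# that must both finish before "report" runs.
flow = {
    "clean": ["load"],
    "train": ["clean"],
    "stats": ["clean"],
    "report": ["train", "stats"],
}
print(execution_order(flow))
# ['load', 'clean', 'stats', 'train', 'report']
```

A real workflow manager would run the ready tasks (possibly in parallel) instead of only listing them, but the ordering constraint is the same.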
>>>>> Xun Liu, it would be great to clarify whether you agree to be a mentor
>>>>> specifically within GSOC, or outside of it? :)
>>>>> 
>>>>> ----------------------------------------
>>>>> Best regards, Basil Morkovkin
>>>>> 
>>>>> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>> 
>>>>> Thanks, Liu, for taking this over. I will help review the design.
>>>>> 
>>>>> On Thursday, March 7, 2019, at 4:05 PM, Xun Liu <neliu...@163.com> wrote:
>>>>> Hi Vasiliy Morkovkin,
>>>>> 
>>>>> Thank you very much for your willingness to implement the workflow feature.
>>>>> I will give working with you the highest priority.
>>>>> I plan to first update the workflow system design documentation
>>>>> at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>>>>> Please add yourself as a watcher on ZEPPELIN-4018.
>>>>> That way you will be notified promptly when the document is updated.
>>>>> 
>>>>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>>>>> If you need to, you can email me at liuxun...@gmail.com and I will reply
>>>>> as quickly as I can.
>>>>> Do you think this kind of cooperation is OK?
>>>>> 
>>>>> 
>>>>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve
>>>>> our system design. Thanks!
>>>>> 
>>>>> :-)
>>>>> 
>>>>>> On March 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>>> 
>>>>>> Thank you for such detailed feedback!
>>>>>> I am definitely interested in working on the workflow implementation with
>>>>>> you, Xun Liu! Could you become a mentor in GSOC for this task?
>>>>>> Some front-end work is not a problem at all.
>>>>>> I'm ready to work at least 30 hours per week in the summer. For now,
>>>>>> I'd like to take on some smaller tasks to get a closer look at the existing
>>>>>> codebase and become familiar with your development workflow. Do you have
>>>>>> any such tasks in mind?
>>>>>> 
>>>>>> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
>>>>>> Hi Vasiliy Morkovkin
>>>>>> 
>>>>>> I have written up my thoughts on workflow at
>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>> 
>>>>>> Because Zeppelin has more than 20 interpreters, data analysts can use it
>>>>>> for a wide variety of data development work, and much of that work is
>>>>>> interdependent. For example, developing machine learning algorithms
>>>>>> relies on Spark to preprocess the data, and so on.
>>>>>> 
>>>>>> The open source workflow tools available today include Azkaban and Airflow.
>>>>>> Azkaban is relatively simple and meets the needs of most scenarios; our
>>>>>> company uses it.
>>>>>> Airflow looks complicated, and I have not used it.
>>>>>> In fact, I have previously implemented workflows for notes and
>>>>>> paragraphs in Zeppelin via Azkaban:
>>>>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>>>> 
>>>>>> However, I think Zeppelin should have built-in workflow capabilities
>>>>>> instead of relying on external software to schedule its notes, for the
>>>>>> following reasons:
>>>>>> 1. Now that we have moved from the data processing era to the algorithm
>>>>>> era, Zeppelin will form a data loop once it has its own workflow.
>>>>>> 
>>>>>> 2. Zeppelin's powerful interactive processing capabilities help
>>>>>> algorithm engineers improve their productivity.
>>>>>> Zeppelin should give algorithm engineers more direct control,
>>>>>> instead of having them hand the algorithm to other teams (or software)
>>>>>> to build the workflow.
>>>>>> 
>>>>>> 3. Zeppelin knows more about the processing status of data than
>>>>>> Azkaban and Airflow do,
>>>>>> so a built-in workflow will offer better performance, user experience,
>>>>>> and control.
>>>>>> 
>>>>>> If you are interested in workflow (ZEPPELIN-4018),
>>>>>> I am willing to work with you to complete all of the system design and
>>>>>> code development.
>>>>>> 
>>>>>> :-)
>>>>>> 
>>>>>>> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Basil,
>>>>>>> 
>>>>>>> Thanks for your interest in Zeppelin. Here are my comments on the
>>>>>>> tickets you are interested in.
>>>>>>> 
>>>>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>>>> This involves two sides of work, frontend and backend.
>>>>>>> In the frontend, we should use Arrow JS to handle the table data,
>>>>>>> including displaying it and processing it (such as aggregation).
>>>>>>> In the backend, we should use Arrow for each language and allow them to
>>>>>>> exchange data within the same process, and use Arrow IPC to exchange
>>>>>>> data across processes.
>>>>>>> Overall, this is a pretty large task. If you really want to do it, I
>>>>>>> would suggest you take on just part of it.
>>>>>>> 
>>>>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>>>> Regarding model serving, I don't have a clear picture of this. Others
>>>>>>> can comment on it.
>>>>>>> 
>>>>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>>> Job scheduling is pretty important for Zeppelin; I would make this
>>>>>>> the highest priority among these tickets. Airflow is one
>>>>>>> option, but I am open to other solutions. First we need to figure out
>>>>>>> how users schedule jobs in Zeppelin, then choose the right framework.
>>>>>>> It would also involve some frontend work.
>>>>>>> 
>>>>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>>> Spark 2.4.0 support is already there, but Scala 2.12 is not
>>>>>>> supported yet. It won't be a big project for GSOC, IMO.
>>>>>>> 
>>>>>>> 5. OLAP.
>>>>>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>>>>>>> Zeppelin can support it very well. But we could create a specific
>>>>>>> interpreter for an OLAP engine if its native API performs better than
>>>>>>> JDBC. Another thing I can think of for improving OLAP is visualization:
>>>>>>> although Zeppelin already supports some built-in visualizations, some
>>>>>>> are still missing. We could provide more.
>>>>>>> 
>>>>>>> 6. Auto-completions.
>>>>>>> We already support IPython[1] in Zeppelin, which provides almost the
>>>>>>> same auto-completion as Jupyter. But it lacks access to the Python API
>>>>>>> docs, which is also pretty important for Python users, IMO. SQL is
>>>>>>> another popular language in Zeppelin, but it doesn't provide a good
>>>>>>> code-completion experience either; we can do better there as well.
>>>>>>> 
>>>>>>> 7. Notifications.
>>>>>>> I think notifications can be integrated into job scheduling.
>>>>>>> A notification can be sent when a job fails or succeeds.
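A minimal sketch of how a notification could hook into job scheduling, assuming a job is any callable and `notify` is a caller-supplied function (mail, Slack bot, or something else). `run_with_notification` and the status strings are hypothetical names for illustration, not part of Zeppelin's API.

```python
def run_with_notification(job_name, job, notify):
    """Run `job` and report its outcome through `notify`.

    `notify` is called as notify(job_name, status, detail).
    Returns the job's result, or None if the job raised.
    """
    try:
        result = job()
    except Exception as exc:
        # The job failed: pass the error message along as the detail.
        notify(job_name, "FAILED", str(exc))
        return None
    notify(job_name, "SUCCEEDED", "")
    return result


# Usage with a notifier that just records messages in a list.
sent = []


def record(name, status, detail):
    sent.append((name, status, detail))


def failing_job():
    raise RuntimeError("out of memory")


run_with_notification("daily-report", lambda: 42, record)
run_with_notification("nightly-train", failing_job, record)
```

Keeping the notifier as a plain function argument means the scheduler does not need to know which delivery channel is in use.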
>>>>>>> 
>>>>>>> 
>>>>>>> Let us know which JIRA you are most interested in, and please also
>>>>>>> consider how much time you can spend on it. Again, we very much
>>>>>>> appreciate your interest in Zeppelin and look forward to your
>>>>>>> contribution.
>>>>>>> 
>>>>>>> 
>>>>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Wednesday, March 6, 2019, at 7:41 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>>>> 
>>>>>>>> Thank you for your replies! I've checked the existing set of issues and
>>>>>>>> found several interesting ones:
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very
>>>>>>>> nice way to increase analytical processing performance using the Arrow
>>>>>>>> project;
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994: deploying models
>>>>>>>> independently of ZeppelinServer sounds quite intriguing too, although
>>>>>>>> there is much to think about;
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018: at first glance,
>>>>>>>> https://airflow.apache.org/ seems to be useful for implementing complex
>>>>>>>> execution workflows.
>>>>>>>> Those tasks are broad and intriguing, requiring complex architectural
>>>>>>>> solutions.
>>>>>>>> Also, I've probably found a ticket that is suitable for me to get
>>>>>>>> involved in the project:
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>>>>>>> Is there any "low-hanging fruit"?
>>>>>>>> 
>>>>>>>> And I have several ideas of my own. Some of them might not be relevant
>>>>>>>> due to the vision of the project or other reasons. Just ideas:
>>>>>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
>>>>>>>> logical to add more integrations with existing OLAP solutions like
>>>>>>>> Pinot, ClickHouse and Druid. Currently I've found integration only
>>>>>>>> with Kylin;
>>>>>>>> - Better autocompletion. Jupyter offers not only a list of already
>>>>>>>> initialized variables, but also quick access to documentation. It's
>>>>>>>> convenient;
>>>>>>>> - Notifications. Some colleagues would appreciate a notification
>>>>>>>> service that sends you messages (via mail, a Slack bot, or something
>>>>>>>> else) indicating that your long-running paragraphs have completed.
>>>>>>>> 
>>>>>>>> Feedback is much appreciated :)
>>>>>>>> 
>>>>>>>> It would be wonderful if someone agreed to sacrifice their time and
>>>>>>>> become a mentor in the GSOC program!
>>>>>>>> 
>>>>>>>> ----------------------------------------
>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hello,
>>>>>>>>> 
>>>>>>>>> I've confirmed that I can add more issues for GSOC. Can you explain what
>>>>>>>>> you would like to contribute to? I can add more issues.
>>>>>>>>> 
>>>>>>>>> JL
>>>>>>>>> 
>>>>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>>>>> 
>>>>>>>>>> Welcome to the zeppelin community! :-)
>>>>>>>>>> 
>>>>>>>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>>>>> 
>>>>>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I have
>>>>>>>>>>> thought about. We always encourage contributions to Zeppelin on a
>>>>>>>>>>> variety of topics, including your own ideas.
>>>>>>>>>>> 
>>>>>>>>>>> Please tell us more.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>>> JL
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>>>>>>> Looks like we need volunteers for a mentor and a backend subject
>>>>>>>>>>>> for GSoC 2019.
>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> moon
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of
>>>>>>>>>>>>> Physics and Technology and am eager to contribute to Zeppelin in the
>>>>>>>>>>>>> context of GSOC 2019. I've become a real fan of Zeppelin over the past
>>>>>>>>>>>>> couple of months, using it at my job. But I have found only one ticket
>>>>>>>>>>>>> (a front-end task) with the GSOC 2019 label on your Jira. Perhaps you
>>>>>>>>>>>>> have ideas for new features or improvements in Zeppelin but don't
>>>>>>>>>>>>> have enough hands for them. It would be wonderful if anyone agreed to
>>>>>>>>>>>>> mentor these ideas within GSOC :)
>>>>>>>>>>>>> I have been working as a Scala developer (back-end) for 1.5
>>>>>>>>>>>>> years. I can also write Java or Python without any problems if
>>>>>>>>>>>>> necessary. I'm really fond of databases and high-load systems. I also
>>>>>>>>>>>>> have experience with some other great Apache projects like Cassandra,
>>>>>>>>>>>>> Kafka and Spark.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>>>> http://madeng.net
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>> http://madeng.net
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best Regards
>>>>>>> 
>>>>>>> Jeff Zhang
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>> 
>>> 
>>> 
>> 
>> --
>> Best Regards
>> 
>> Jeff Zhang
> 

