Hi! For fun I've sketched a toy prototype of a workflow manager in Scala. It
makes it easy to impose dependencies on the execution order of tasks. Check
it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It
reproduces the flow shown in the attached picture.
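
To give a rough idea of what "imposing dependencies on the execution order"
means, here is a minimal Python sketch. It is not the code behind the Scastie
link (that one is in Scala), and the task names, the flow and the
run_workflow helper are all made up for illustration. It only assumes the
standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, deps):
    """Run every task after the tasks it depends on; return the run order.

    tasks: dict of task name -> zero-argument callable (hypothetical shape).
    deps:  dict of task name -> set of task names that must finish first.
    """
    # Reject dependency entries that mention tasks we do not know about.
    unknown = ({d for ds in deps.values() for d in ds} | set(deps)) - set(tasks)
    if unknown:
        raise ValueError(f"dependencies mention unknown tasks: {sorted(unknown)}")
    graph = {name: deps.get(name, set()) for name in tasks}
    # static_order() yields predecessors first and raises CycleError on a cycle.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order


# Made-up example flow: load -> clean -> train -> report (report also needs clean).
log = []
tasks = {n: (lambda n=n: log.append(n)) for n in ("report", "train", "clean", "load")}
deps = {"clean": {"load"}, "train": {"clean"}, "report": {"clean", "train"}}
print(run_workflow(tasks, deps))  # ['load', 'clean', 'train', 'report']
```

The tasks run sequentially here; a real scheduler would also have to deal
with parallel branches, failures and retries.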
Xun Liu, could you clarify whether you agree to be a mentor specifically
within GSOC, or outside of it? :)

----------------------------------------
Best regards, Basil Morkovkin

On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Thanks Liu for taking over this, I will help review the design.
>
> On Thu, Mar 7, 2019 at 4:05 PM, Xun Liu <neliu...@163.com> wrote:
>
>> Hi Vasiliy Morkovkin
>>
>> Thank you very much for your willingness to implement the workflow
>> feature.
>> I will work with you as my highest priority.
>> I am planning to first update the system design documentation for workflow
>> at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>> Please add yourself as a Watcher on ZEPPELIN-4018.
>> This way you will get notifications of document updates in a
>> timely manner.
>>
>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>> If you need to, you can email me at liuxun...@gmail.com and I will
>> reply as quickly as I can.
>> Do you think this kind of cooperation is OK?
>>
>>
>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve
>> our system design. Thanks!
>>
>> :-)
>>
>> > On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>> >
>> > Thank you for such detailed feedback!
>> > I am definitely interested in working on the workflow implementation
>> > with you, Xun Liu! Could you become a mentor in GSOC for this task?
>> > Some front-end work is not a problem at all.
>> > I'm ready to work at least 30 hours per week in the summer, while for
>> > now I'd like to take on some smaller tasks to get a closer look at the
>> > existing codebase and get familiar with your development workflow. Do
>> > you have such tasks in mind?
>> >
>> > On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
>> > Hi Vasiliy Morkovkin
>> >
>> > I have written up my thoughts on workflow at
>> > https://issues.apache.org/jira/browse/ZEPPELIN-4018
>> >
>> > Because Zeppelin has more than 20 interpreters, data analysts can use
>> > it for a wide variety of data development work, and much of that work
>> > is interdependent. For example, developing machine learning algorithms
>> > relies on Spark to preprocess the data, and so on.
>> >
>> > Existing open source workflow software includes Azkaban and Airflow.
>> > Azkaban is relatively simple and covers most scenarios, and our
>> > company is using it.
>> > Airflow looks complicated and I have not used it.
>> > In fact, I have previously implemented workflow for notes and
>> > paragraphs in Zeppelin via Azkaban:
>> > https://youtu.be/2r6q-2Tq7hk?t=33
>> >
>> > However, I think Zeppelin should have built-in workflow capabilities
>> > instead of relying on external software to schedule notes, for the
>> > following reasons:
>> > 1. Now that we have moved from the data processing era to the
>> > algorithm era, Zeppelin with its own workflow will form a data loop.
>> >
>> > 2. Zeppelin's powerful interactive processing capabilities help
>> > algorithm engineers improve their productivity.
>> > Zeppelin should give algorithm engineers more direct control,
>> > instead of handing the algorithm to other teams (or software) to run
>> > the workflow.
>> >
>> > 3. Zeppelin knows more about the processing status of data than Azkaban
>> > and Airflow do, so a built-in workflow will offer better performance,
>> > user experience and control.
>> >
>> > If you are interested in workflow(ZEPPELIN-4018),
>> > I am willing to work with you to complete all system design and code
>> development work.
>> >
>> > :-)
>> >
>> >> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>> >>
>> >> Hi Basil,
>> >>
>> >> Thanks for your interest in Zeppelin. Here are my comments on the
>> >> tickets you are interested in.
>> >>
>> >> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>> >>    This involves two sides of work, frontend and backend.
>> >>    In the frontend, we should use Arrow JS to handle the table data,
>> >>    including displaying and processing it (such as aggregation).
>> >>    In the backend, we should use Arrow for each language and allow them
>> >>    to exchange data within the same process, and use Arrow IPC to
>> >>    exchange data across processes.
>> >>    Overall, this is a pretty large task. If you really want to do it, I
>> >>    would suggest you take just part of it.
>> >>
>> >> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>> >>    Regarding model serving, I don't have a clear picture of this.
>> >>    Others can comment on it.
>> >>
>> >> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>> >>    Job scheduling is pretty important for Zeppelin; I would make this
>> >>    the highest priority among these tickets. Airflow is one option,
>> >>    but I am open to other solutions. First we need to figure out how
>> >>    users schedule jobs in Zeppelin, then choose the right framework. It
>> >>    would also involve some frontend work.
>> >>
>> >> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>> >>    Spark 2.4.0 support is already there, but Scala 2.12 is not
>> >>    supported yet. It won't be a big project for GSOC IMO.
>> >>
>> >> 5. OLAP.
>> >>    Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>> >>    Zeppelin can support it very well. But we could create a specific
>> >>    interpreter for an OLAP engine if its native API performs better than
>> >>    JDBC. Another thing I can think of for improving OLAP is
>> >>    visualization: although Zeppelin already supports some built-in
>> >>    visualizations, some are still missing. We could provide more.
>> >>
>> >> 6. Auto-completions.
>> >>    We already support IPython [1] in Zeppelin, which provides almost
>> >>    the same auto-completion as Jupyter. But it lacks access to the
>> >>    Python API docs, which is also pretty important for Python users IMO.
>> >>    SQL is another popular language in Zeppelin, but it doesn't provide a
>> >>    good code-completion experience either; we can do better there as well.
>> >>
>> >> 7. Notifications.
>> >>    I think notifications can be integrated into job scheduling.
>> >>    A notification can be sent when a job fails or succeeds.
>> >>
>> >>
>> >> Let us know which JIRA you are most interested in, and please also
>> >> consider how much time you can spend on this. Again, we very much
>> >> appreciate your interest in Zeppelin and look forward to your
>> >> contribution.
>> >>
>> >>
>> >> [1]
>> >> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>> >>
>> >>
>> >>
>> >> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>> >>
>> >>> Thank you for your replies! I've checked the existing set of issues and
>> >>> found several curious ones:
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very
>> >>> nice way to increase analytical processing performance using the Arrow
>> >>> project;
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>> >>> independently of ZeppelinServer sounds quite intriguing too, although
>> >>> there is much to think about;
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
>> >>> https://airflow.apache.org/ seems to be useful for implementing complex
>> >>> execution workflows.
>> >>> Those tasks are global and intriguing, requiring complex architectural
>> >>> solutions.
>> >>> Also I've probably found a ticket that is suitable for me to get
>> >>> involved in the project:
>> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>> >>> Are there any "low hanging fruits"?
>> >>>
>> >>> And I have several ideas of my own. Some of them might not be relevant
>> >>> due to the vision of the project or other reasons. Just ideas:
>> >>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
>> >>> logical to add more integrations with existing OLAP solutions like
>> >>> Pinot, ClickHouse and Druid. Currently I've found integration only with
>> >>> Kylin;
>> >>> - Better autocompletion. Jupyter offers not only a list of already
>> >>> initialized variables, but also quick access to documentation. It's
>> >>> convenient;
>> >>> - Notifications. Some colleagues would appreciate a notification
>> >>> service that sends you messages (via mail, a Slack bot or something
>> >>> else) indicating that your long-running paragraphs have completed.
>> >>>
>> >>> Feedback is very appreciated :)
>> >>>
>> >>> It would be wonderful if someone agreed to give up some of their time
>> >>> and become a mentor in the GSOC program!
>> >>>
>> >>> ----------------------------------------
>> >>> Best regards, Basil Morkovkin.
>> >>>
>> >>>
>> >>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> I've confirmed I could add more issues for GSOC. Can you explain
>> what you
>> >>>> would like to contribute to? I can add more issues
>> >>>>
>> >>>> JL
>> >>>>
>> >>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>> >>>>
>> >>>>> Hi, Vasiliy Morkovkin
>> >>>>>
>> >>>>> Welcome to the zeppelin community! :-)
>> >>>>>
>> >>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Thanks for contacting Zeppelin with your interest.
>> >>>>>>
>> >>>>>> I added FE topics for GSOC because FE is the most urgent issue I
>> >>>>>> have thought about. We always encourage contributions to Zeppelin on
>> >>>>>> a range of topics, including your own ideas.
>> >>>>>>
>> >>>>>> Please tell us more about what you have in mind.
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>> JL
>> >>>>>>
>> >>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> Great to see your interest in the project. Thanks!
>> >>>>>>> Looks like we need volunteers to mentor, and some backend subjects,
>> >>>>>>> for GSoC 2019.
>> >>>>>>> Any ideas?
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> moon
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
>> >>>>>>>
>> >>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute
>> >>>>>>>> of Physics and Technology and am eager to contribute to Zeppelin as
>> >>>>>>>> part of GSOC 2019. I've become a real fan of Zeppelin over the past
>> >>>>>>>> couple of months, using it at my job. But I have found only one
>> >>>>>>>> ticket (a front-end task) labeled GSOC 2019 in your Jira. Perhaps
>> >>>>>>>> you have ideas for new features or improvements in Zeppelin but
>> >>>>>>>> don't have enough hands for them. It would be wonderful if anyone
>> >>>>>>>> agreed to mentor these ideas within GSOC :)
>> >>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years.
>> >>>>>>>> I can also write Java or Python without any problems if necessary.
>> >>>>>>>> I am really fond of databases and highload. I also have experience
>> >>>>>>>> with some other great Apache projects like Cassandra, Kafka and
>> >>>>>>>> Spark.
>> >>>>>>>>
>> >>>>>>>> Best regards, Basil Morkovkin.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> 이종열, Jongyoul Lee, 李宗烈
>> >>>>>> http://madeng.net <http://madeng.net/>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>> --
>> >>>> 이종열, Jongyoul Lee, 李宗烈
>> >>>> http://madeng.net <http://madeng.net/>
>> >>>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Best Regards
>> >>
>> >> Jeff Zhang
>> >
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>