Re: Zeppelin in GSOC 2019

Морковкин, Василий Владимирович Wed, 06 Mar 2019 14:04:47 -0800

Thank you for such detailed feedback!
I am definitely interested in working on the workflow implementation with
you, Xun Liu! Could you become a mentor in GSOC for this task?
Some front-end work is not a problem at all.
I'm ready to work at least 30 hours per week in the summer. For now, I'd
like to take on some smaller tasks to take a closer look at the existing
codebase and to get familiar with your development workflow. Do you have
such tasks in mind?


On Wed, 6 Mar 2019 at 05:23, Xun Liu <[email protected]> wrote:

> Hi Vasiliy Morkovkin,
>
> I have shared my thoughts on workflow here:
> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>
> Because there are more than 20 interpreters in Zeppelin,
> data analysts can use it for a wide variety of data development work.
> A lot of that work is interdependent. For example,
> developing machine learning algorithms relies on Spark
> to preprocess the data, and so on.
>
> Open source workflow software currently includes Azkaban and Airflow.
> Azkaban is relatively simple and covers most scenarios, and
> our company is using it.
> Airflow looks complicated, and I have not used it.
> In fact, I have previously implemented a workflow for notes and
> paragraphs in Zeppelin via Azkaban:
> https://youtu.be/2r6q-2Tq7hk?t=33
>
> However, I think Zeppelin should have built-in workflow capabilities
> instead of relying on external software to schedule its notes, for
> the following reasons:
> 1. We have now moved from the data processing era to the algorithm
> era. Once Zeppelin has its own workflow, it will form a data loop.
>
> 2. Zeppelin's powerful interactive processing capabilities help algorithm
> engineers improve their productivity.
> Zeppelin should give algorithm engineers more direct control,
> instead of handing the algorithm to other teams (or software) to run the
> workflow.
>
> 3. Zeppelin knows more about the processing status of the data than Azkaban
> or Airflow do,
> so a built-in workflow will offer better performance, user experience, and
> control.
>
> If you are interested in workflow (ZEPPELIN-4018),
> I am willing to work with you to complete all of the system design and code
> development work.
>
> :-)
>
> On 6 March 2019, at 9:32 AM, Jeff Zhang <[email protected]> wrote:
>
> Hi Basil,
>
> Thanks for your interest in Zeppelin. Here are my comments on the tickets
> you are interested in.
>
> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>    This involves work on two sides, frontend and backend:
>    In the frontend, we should use Arrow JS to handle the table data,
> including displaying it and processing it (such as aggregation).
>    In the backend, we should use Arrow for each language and allow them to
> exchange data within the same process, and use Arrow IPC to exchange data
> across processes.
>    Overall, this is a pretty large task. If you really want to do it, I
> would suggest that you take on just part of it.
>
> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>    Regarding model serving, I don't have a clear picture of this. Others
> can comment on it.
>
> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>    Job scheduling is pretty important for Zeppelin; I would make this
> the highest priority among these tickets. Airflow is one
> option, but I am open to other solutions. First we need to figure out how
> users schedule jobs in Zeppelin, then choose the right framework. It would
> also involve some frontend work.
>
> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>    Spark 2.4.0 support is already there, but Scala 2.12 is not
> supported yet. It won't be a big project for GSOC, IMO.
>
> 5. OLAP.
>    Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
> Zeppelin can support it very well. But we could create a specific
> interpreter for an OLAP engine if its native API performs better than JDBC.
> Another way I can think of to improve OLAP is visualization: although
> Zeppelin already supports some built-in visualizations, some are still
> missing. We could provide more.
>
> 6. Auto-completions.
>    We already support IPython[1] in Zeppelin, which provides almost the
> same auto-completion as Jupyter. But it lacks access to the Python API
> docs. This is also pretty important for Python users, IMO. SQL is another
> popular language in Zeppelin, but it also doesn't provide a good
> code-completion experience; we can do better there as well.
>
> 7. Notifications.
>    I think notifications can be integrated into job scheduling. A
> notification can be sent when a job fails or succeeds.
>
>
> Let us know which JIRA you are most interested in, and also please consider
> how much time you can spend on this. Again, we very much appreciate your
> interest in Zeppelin and look forward to your contribution.
>
>
> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>
>
>
> On Wed, 6 Mar 2019 at 7:41 AM, Морковкин, Василий Владимирович
> <[email protected]> wrote:
>
> Thank you for your replies! I've checked the existing set of issues and
> found several interesting ones:
> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very
> nice way to increase analytical processing performance using the Arrow
> project;
> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
> independently of ZeppelinServer sounds quite intriguing too, although there
> is much to think about;
> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance,
> https://airflow.apache.org/ seems to be useful for implementing complex
> execution workflows.
> Those tasks are broad and intriguing, and they require complex architectural
> solutions.
> I've also probably found a ticket that is suitable for me to get
> involved in the project:
> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
> Are there any "low-hanging fruits"?
>
> I also have several ideas of my own. Some of them might not be relevant due
> to the vision of the project or other reasons. Just ideas:
> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
> logical to add more integrations with existing OLAP solutions like Pinot,
> ClickHouse and Druid. So far I've found an integration only with Kylin;
> - Better autocompletion. Jupyter offers not only a list of already
> initialized variables, but also quick access to documentation. It's
> convenient;
> - Notifications. Some colleagues would appreciate a notification
> service that sends you messages (via mail, a Slack bot or something else)
> indicating that your long-running paragraphs have completed.
>
> Feedback is very much appreciated :)
>
> It would be wonderful if someone agreed to sacrifice their time and become a
> mentor in the GSOC program!
>
> ----------------------------------------
> Best regards, Basil Morkovkin.
>
>
> On Tue, 5 Mar 2019 at 11:48, Jongyoul Lee <[email protected]> wrote:
>
> Hello,
>
> I've confirmed that I can add more issues for GSOC. Can you explain what
> you would like to contribute to? I can add more issues.
>
> JL
>
> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <[email protected]> wrote:
>
> Hi, Vasiliy Morkovkin
>
> Welcome to the Zeppelin community! :-)
>
> On 5 March 2019, at 11:49 AM, Jongyoul Lee <[email protected]> wrote:
>
> Thanks for contacting Zeppelin with your interest.
>
> I added FE topics for GSOC because FE is the most urgent issue I have
> thought about. We always encourage contributions to Zeppelin on a range of
> topics, including your own ideas.
>
> Please tell us more about them.
>
> Thanks.
> JL
>
> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <[email protected]> wrote:
>
> Hi,
>
> Great to see your interest in the project. Thanks!
> Looks like we need volunteers for a mentor and some backend subjects for
> GSoC 2019.
> Any ideas?
>
> Best,
> moon
>
>
>
>
> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <[email protected]> wrote:
>
> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of
> Physics and Technology and am eager to contribute to Zeppelin as part of
> GSOC 2019. I've become a real fan of Zeppelin over the past couple of
> months, using it at my job. But I have found only one ticket (a front-end
> task) with the GSOC 2019 label on your Jira. Perhaps you have
> ideas for new features or improvements in Zeppelin but don't have
> enough hands for them. It would be wonderful if anyone agreed to mentor
> these ideas within GSOC :)
> I have been working as a Scala developer (back-end) for 1.5 years.
> I can also write Java or Python without any problems if necessary.
> I'm really fond of databases and high-load systems. I also have experience
> with some other great Apache projects like Cassandra, Kafka and Spark.
>
> Best regards, Basil Morkovkin.
>
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>
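
Editorial sketch (not part of the original messages): the thread describes interdependent steps, such as Spark preprocessing that must finish before model training, and suggests sending a notification when a job fails or succeeds. A minimal illustration of that idea in plain Python might look like the following. It is only a sketch under stated assumptions: `run_workflow`, `notify`, and the step names are hypothetical, and nothing here is Zeppelin, Azkaban, or Airflow API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set, Tuple


def run_workflow(
    dependencies: Dict[str, Set[str]],
    tasks: Dict[str, Callable[[], None]],
    notify: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Run each step after the steps it depends on and report every outcome.

    All names are hypothetical; this is not an existing scheduler API.
    """
    results: List[Tuple[str, str]] = []
    blocked: Set[str] = set()  # steps that failed or were skipped
    # static_order() yields a step only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        if dependencies.get(name, set()) & blocked:
            status = "skipped"  # an upstream step did not succeed
            blocked.add(name)
        else:
            try:
                tasks[name]()
                status = "succeeded"
            except Exception:
                status = "failed"
                blocked.add(name)
        notify(name, status)  # e.g. mail or a chat message in a real system
        results.append((name, status))
    return results


if __name__ == "__main__":
    # Hypothetical three-step chain: preprocess -> train -> report.
    deps = {"preprocess": set(), "train": {"preprocess"}, "report": {"train"}}
    steps = {name: (lambda: None) for name in deps}
    run_workflow(deps, steps, lambda name, status: print(f"{name}: {status}"))
```

Steps downstream of a failure are marked as skipped rather than run, so a single notification channel can report the state of the whole chain.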
