Re: Zeppelin in GSOC 2019

Xun Liu Tue, 05 Mar 2019 18:24:48 -0800

Hi Vasiliy Morkovkin,

I have described my thoughts on workflow in
https://issues.apache.org/jira/browse/ZEPPELIN-4018

Because Zeppelin has more than 20 interpreters, data analysts can use it for
a wide variety of data development work, and much of that work is
interdependent. For example, developing a machine learning algorithm relies
on Spark to preprocess the data, and so on.
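
The kind of interdependence described above can be pictured as a small graph of notes. The following is only a minimal sketch, not Zeppelin's API and not the design proposed in ZEPPELIN-4018. The note names and the run_note callback are hypothetical, and the ordering uses Python's standard graphlib module (Python 3.9+):

```python
from graphlib import TopologicalSorter

# Hypothetical notes: each key depends on the notes listed in its set.
NOTE_DEPENDENCIES = {
    "preprocess_with_spark": set(),
    "train_model": {"preprocess_with_spark"},
    "evaluate_model": {"train_model"},
    "publish_report": {"evaluate_model", "preprocess_with_spark"},
}


def execution_order(dependencies):
    """Return the notes in an order that runs every dependency first."""
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, run_note):
    """Run notes in dependency order and stop at the first failure.

    run_note is a hypothetical callback that returns True on success.
    Returns the list of notes that finished successfully.
    """
    finished = []
    for note in execution_order(dependencies):
        if not run_note(note):
            break  # downstream notes depend on this one, so do not run them
        finished.append(note)
    return finished
```

With this layout, a failed note stops everything scheduled after it, and graphlib raises CycleError if the notes depend on each other in a loop.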

Existing open source workflow software includes Azkaban and Airflow.
Azkaban is relatively simple and covers most scenarios; our company uses it.
Airflow looks more complicated, and I have not used it.
In fact, I have previously implemented a workflow for notes and paragraphs
in Zeppelin via Azkaban:
https://youtu.be/2r6q-2Tq7hk?t=33

However, I think Zeppelin should have built-in workflow capabilities instead
of relying on external software to schedule its notes, for the following
reasons:
1. Now that we have moved from the data processing era to the algorithm era,
a built-in workflow would let Zeppelin form a closed data loop.

2. Zeppelin's powerful interactive processing capabilities help algorithm
engineers improve their productivity.
Zeppelin should give algorithm engineers more direct control, instead of
handing the algorithm over to other teams (or software) to build the workflow.

3. Zeppelin knows more about the processing status of the data than Azkaban
or Airflow do, so a built-in workflow would offer better performance, user
experience and control.

If you are interested in workflow (ZEPPELIN-4018),
I am willing to work with you on the full system design and code
development.

:-)

> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> 
> Hi Basil,
> 
> Thanks for your interest in Zeppelin. Here are my comments on the tickets
> you are interested in.
> 
> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>    This involves two sides of work, frontend and backend.
>    In the frontend, we should use Arrow JS to handle the table data,
> including displaying it and processing it (such as aggregation).
>    In the backend, we should use Arrow for each language and allow them to
> exchange data within the same process, and use Arrow IPC to exchange data
> across processes.
>   Overall, this is a pretty large task. If you really want to do it, I would
> suggest that you take on just part of it.
> 
> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>    Regarding model serving, I don't have a clear picture of this. Others
> can comment on it.
> 
> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>    Job scheduling is pretty important for Zeppelin; I would make this the
> highest priority among these tickets. Airflow is one option, but I am open
> to other solutions. First we need to figure out how users schedule jobs in
> Zeppelin, then choose the right framework. It would also involve some
> frontend work.
> 
> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>    Spark 2.4.0 support is already there, but Scala 2.12 is not
> supported yet. It won't be a big project for GSOC IMO.
> 
> 5. OLAP.
>    Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
> Zeppelin can support it very well. But we could create a specific
> interpreter for an OLAP engine if its native API performs better than JDBC.
> Another thing I can think of for improving OLAP is visualization: although
> Zeppelin already supports some built-in visualizations, some are still
> missing. We could provide more.
> 
> 6. Auto-completions.
>   We already support IPython [1] in Zeppelin, which provides almost the
> same auto-completion as Jupyter. But it lacks access to the Python API
> docs, which is also pretty important for Python users IMO. SQL is another
> popular language in Zeppelin, but it doesn't provide a good
> code-completion experience either; we can do better there as well.
> 
> 7. Notifications.
>   I think notifications can be integrated into job scheduling. A
> notification can be sent when a job fails or succeeds.
> 
> 
> Let us know which JIRA you are most interested in, and please also consider
> how much time you can spend on it. Again, we really appreciate your
> interest in Zeppelin and look forward to your contribution.
> 
> 
> [1]
> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
> 
> 
> 
> Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote on
> Wed, Mar 6, 2019 at 7:41 AM:
> 
>> Thank you for your replies! I've checked existing set of issues and found
>> several curious ones:
>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very
>> nice way to increase analytical processing performance using the Arrow
>> project;
>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>> regardless of ZeppelinServer sounds quite intriguing too. Although there is
>> much to think about;
>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
>> https://airflow.apache.org/ seems to be useful in implementing complex
>> execution workflows.
>> Those tasks are global and intriguing, requiring complex architectural
>> solutions.
>> Also, I've probably found a ticket that would be suitable for me to get
>> involved in the project:
>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>> Are there any "low-hanging fruits"?
>> 
>> And I have several ideas of my own. Some of them might not be relevant due
>> to the vision of the project or other reasons. Just ideas:
>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems to be quite
>> logical to add more integrations with existing OLAP solutions like Pinot,
>> ClickHouse and Druid. Currently I've found integration only with Kylin;
>> - Better autocompletion. Jupyter offers not only a list of already
>> initialized variables, but also quick access to documentation. It's
>> convenient;
>> - Notifications. Some colleagues would appreciate a notification
>> service that sends you messages (via mail, Slack bot or something else)
>> indicating that your long-running paragraphs have completed.
>> 
>> Feedback is very appreciated :)
>> 
>> It would be wonderful if someone agreed to sacrifice his time and become a
>> mentor in GSOC program!
>> 
>> ----------------------------------------
>> Best regards, Basil Morkovkin.
>> 
>> 
>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> I've confirmed I could add more issues for GSOC. Can you explain what you
>>> would like to contribute to? I can add more issues
>>> 
>>> JL
>>> 
>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>>> 
>>>> Hi, Vasiliy Morkovkin
>>>> 
>>>> Welcome to the zeppelin community! :-)
>>>> 
>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>> 
>>>>> Thanks for contacting Zeppelin with your interest.
>>>>> 
>>>>> I added FE topics for GSOC because FE is the most urgent issue I have
>>>>> thought about. We always encourage contributions to Zeppelin on a range
>>>>> of topics, including your own ideas.
>>>>> 
>>>>> Please describe your ideas in more detail.
>>>>> 
>>>>> Thanks.
>>>>> JL
>>>>> 
>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Great to see your interest in the project. Thanks!
>>>>>> Looks like we need volunteers for a mentor and some backend subjects
>>>>>> for GSoC 2019.
>>>>>> Any ideas?
>>>>>> 
>>>>>> Best,
>>>>>> moon
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
>>>>>> morkovkin...@phystech.edu>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute
>>>>>>> of Physics and Technology and am eager to contribute to Zeppelin in
>>>>>>> the context of GSOC 2019. I've become a real fan of Zeppelin over the
>>>>>>> past couple of months, using it at my job. But I have found only one
>>>>>>> ticket (a front-end task) with the GSOC 2019 label on your Jira.
>>>>>>> Perhaps you have ideas for new features or improvements in Zeppelin
>>>>>>> but don't have enough hands for them. It would be wonderful if anyone
>>>>>>> agreed to mentor these ideas within GSOC :)
>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years.
>>>>>>> I can also write Java or Python without any problems if necessary.
>>>>>>> I am really fond of databases and high-load systems. I also have
>>>>>>> experience with some other great Apache projects like Cassandra,
>>>>>>> Kafka and Spark.
>>>>>>> 
>>>>>>> Best regards, Basil Morkovkin.
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>> http://madeng.net
>>>> 
>>>> 
>>> 
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>> 
>> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang
