RE: [DISCUSS] Update Roadmap

Joel Van Veluwen Wed, 23 Mar 2016 16:11:22 -0700

Hi Nikolay,

I raised this with MapR and there don't appear to be any plans to add Zeppelin 
to 5.1:


https://community.mapr.com/message/40332

We are deploying it manually and everything is pretty stable – but it will vary 
depending on your environment.

Cheers,


Joel Van Veluwen
QUANTIUM
Level 25, 8 Chifley
8-12 Chifley Square
Sydney NSW 2000

T: +61 2 8224 8981
M: +61 403 153 265
F: +61 2 9292 6444

W: quantium.com.au<http://www.quantium.com.au>


From: Nikolay Voronchikhin [mailto:nvoronchik...@gmail.com]
Sent: Tuesday, 22 March 2016 11:39 AM
To: users@zeppelin.incubator.apache.org
Subject: Re: [DISCUSS] Update Roadmap

Hi Zeppelin Users and Developers,

Do you know if MapR will be adding Zeppelin to its roadmap for the next version 
after MapR 5.1?

We see that Hue 3.9 provides notebooks for R Shell, Python Shell, PySpark, 
SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL notebook.
We are looking for an Apache Project that focuses on a Drill Notebook UI that 
performs better than the Drill Web Console UI itself.

Sincerely,
Nikolay Voronchikhin
Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco
https://www.linkedin.com/in/nvoronchikhin
E-mail: nvoronchik...@gmail.com<mailto:nvoronchik...@gmail.com>
Mobile: 951-288-2778


On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary 
<rconl...@gmail.com<mailto:rconl...@gmail.com>> wrote:
Dear All,

I think direction setting is important for Enterprise readiness. I have a 
basic familiarity with Ambari Views, which are very similar in nature to 
Zeppelin. Let me explain:

Hive View - interacts with Hive
Pig View - interacts with Pig
Workflow Designer - interacts with Oozie

We have a very similar architecture in Zeppelin, where we interact with these 
systems through Interpreters. The usage will also be similar, as both will 
interact with Hadoop clusters or, in some cases, Spark with YARN on HDFS. Our 
priorities should include:

- Design & implement for multi-tenancy
- Auditability from Data/State and Lineage perspective
- Ability to share Notebooks/Data/State across users, preferably through 
SparkContext sharing
- Security between Zeppelin and the other systems, not limited to Spark through 
Kerberos. (@Rick +1)

I will share an initial draft of the thoughts I have in mind, in the next 
couple of days.

Thanks,
Rohit.



On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee 
<m...@apache.org<mailto:m...@apache.org>> wrote:
Shabeel, thanks for the feedback about the REST API and custom IDs. That might 
help avoid multiple REST API calls.

Thanks, everyone, for the valuable feedback. It looks like we're all heading in 
the same direction. I have updated the wiki.
https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
Please take a look.

I'm sure there are many missing details in this roadmap. I must say that 
something not being on this roadmap doesn't mean the community is not working 
on it or that it can't be included in Zeppelin. The roadmap represents 
community interest and overall direction more than anything else.
We're not changing the roadmap every day, but that doesn't mean it is set in 
stone and can never be changed. We can improve it continuously.

Please feel free to fork this mail thread for any further discussion on a 
specific subject (e.g. job scheduling).

Thanks,
moon

On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed 
<shabeels...@gmail.com<mailto:shabeels...@gmail.com>> wrote:
Also, we need better REST API support for creating and fetching notebooks 
and paragraphs.
For example, if I could set a custom notebookid and paragraphid, we could 
avoid multiple REST API calls.

http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
should return an error if the notebook or paragraph does not exist.

And while creating a notebook or paragraph, I should be able to specify my 
custom IDs.

Regards
Shabeel

On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang 
<wangzhong....@gmail.com<mailto:wangzhong....@gmail.com>> wrote:
+1 on @rick. Quality is really important; I am still consistently encountering 
bugs.

On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV 
<tejasrivas...@gmail.com<mailto:tejasrivas...@gmail.com>> wrote:
+1 on @rick

On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim 
<bbuil...@gmail.com<mailto:bbuil...@gmail.com>> wrote:
I see in the Enterprise section that multi-tenancy will be included. Will this 
include user impersonation too? That way, the user executing a paragraph would 
be the user owning the process.

On Mar 1, 2016, at 12:51 AM, Shabeel Syed 
<shabeels...@gmail.com<mailto:shabeels...@gmail.com>> wrote:

+1

Hi Tamas,
   Pluggable external visualization is really a GREAT feature to have. I'm 
looking forward to this :)

Regards
Shabeel

On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi 
<tamas.szur...@odigeo.com<mailto:tamas.szur...@odigeo.com>> wrote:
Hey,

Really promising roadmap.

The only thing I'd push for is more visualization options. I agree that built-in 
visualization with limited charting options is needed, but I think we also need 
some way to 'inject' external JS visualizations.


For scheduling Zeppelin notebooks, we use 
https://github.com/airbnb/airflow through the job REST API. It's an 
enterprise-ready and very robust solution right now.

Tamas

On 1 March 2016 at 09:12, Eran Witkon 
<eranwit...@gmail.com<mailto:eranwit...@gmail.com>> wrote:
One point to clarify: I don't want to suggest Oozie specifically. I want us to 
think about which features we develop ourselves and which ones we integrate 
from external (preferably Apache) technology. We don't think about building our 
own storage services, so why build our own scheduler?
Eran
On Tue, 1 Mar 2016 at 09:49 moon soo Lee 
<m...@apache.org<mailto:m...@apache.org>> wrote:
@Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
Now I can see a lot of demand around enterprise-level job scheduling. Whether 
external or built-in, I completely agree with having enterprise-level job 
scheduling support on the roadmap.
ZEPPELIN-137<https://issues.apache.org/jira/browse/ZEPPELIN-137>, 
ZEPPELIN-531<https://issues.apache.org/jira/browse/ZEPPELIN-531> are the related 
issues I can find in our JIRA.

@Vinayak
Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook 
storage layer (see the related 
package<https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>), 
so GitHub notebook sync can be implemented easily.

@Shabeel
Right, we need better memory management to prevent such OOMs.
And I think tables are one of the most frequently used ways of displaying data, 
so we'll definitely need more features like filtering, sorting, etc.
After this roadmap discussion, a discussion of the next release will follow. 
Then we'll have an idea of when those features will be available.

@Prasad
Thanks for mentioning HA and DR. They're really important subjects for 
enterprise use, and Zeppelin will definitely need to address them.
And displaying notebook metadata on the top-level page is a good idea.

It's really great to hear so many opinions and ideas.
And thanks, @Rick, for sharing your valuable view of the Zeppelin project.

Thanks,
moon


On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz 
<rah...@gmail.com<mailto:rah...@gmail.com>> wrote:
Hi,
For one, I know that there is rudimentary scheduling built into Zeppelin 
already (at least I fixed a bug in the test for a scheduling feature a few 
months ago).
Another point is that Zeppelin should also focus on quality, 
reproducibility and portability.
Although this doesn't offer exciting new features, it would make development 
much easier.
Cross-platform testability, tests that pass when run sequentially, 
compatibility with Firefox, and many more open issues that make it so much 
harder to enhance Zeppelin and add features should be addressed soon, 
preferably before more features are added. In my opinion, Zeppelin is already 
suffering from quite a lot of feature creep, and we should avoid throwing in 
the kitchen sink at the cost of quality and maintainability. Instead, 
modularity (ZEPPELIN-533 in particular) should be the target.
Oozie, in my opinion, is a dead end: it may de facto still be in use on many 
clusters, but it's not getting the love it needs, and I wouldn't bet on it 
when it comes to integrating scheduling. Instead, any external tool should be 
able to use the REST API to trigger executions, if you want external scheduling.
So, in conclusion, if we take Moon's list as a list of descending priorities, I 
fully agree, under the condition that code quality is included as a subset of 
enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is what 
we really want), with user and group rights assignment at the notebook level. We 
probably also need Knox integration (ODP members looking at integrating 
Zeppelin should consider contributing this), and integration of something like 
Spree (https://github.com/hammerlab/spree) to be able to profile jobs.
I'm hopeful that soon I can resume contributing some quality-oriented code, to 
drive this "necessary evil" forward ;)

On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder 
<sourav.mazumde...@gmail.com<mailto:sourav.mazumde...@gmail.com>> wrote:
I do agree with Vinayak. It need not be coupled with Oozie.
Rather, one should be able to call it from any scheduler typically used at the 
enterprise level. Maybe support for BPML.
I believe the existing ability to call/execute a Zeppelin Notebook or a 
specific paragraph within a notebook using REST API should take care of this 
requirement to some extent.
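A minimal sketch of that REST-based approach, in Python using only the standard library. It assumes the notebook job endpoints documented for the Zeppelin Notebook REST API around this release (`POST /api/notebook/job/{noteId}` to run every paragraph, `POST /api/notebook/job/{noteId}/{paragraphId}` to run one paragraph); please verify the paths against your Zeppelin version. The server URL and IDs are placeholders, and `job_url` and `trigger_run` are hypothetical helper names, not part of Zeppelin.

```python
"""Trigger a Zeppelin notebook or paragraph run from any external scheduler.

Assumption: endpoint paths follow the Zeppelin Notebook REST API of the
0.6 era (POST /api/notebook/job/{noteId}[/{paragraphId}]); check your version.
"""
import urllib.parse
import urllib.request
from typing import Optional


def job_url(base_url: str, note_id: str, paragraph_id: Optional[str] = None) -> str:
    """Build the (assumed) job endpoint URL for a whole note or one paragraph."""
    url = f"{base_url.rstrip('/')}/api/notebook/job/{urllib.parse.quote(note_id, safe='')}"
    if paragraph_id is not None:
        url += "/" + urllib.parse.quote(paragraph_id, safe="")
    return url


def trigger_run(base_url: str, note_id: str,
                paragraph_id: Optional[str] = None, timeout: float = 30.0) -> int:
    """POST to the job endpoint and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so a scheduler wrapping
    this call sees a failed trigger as an exception.
    """
    request = urllib.request.Request(
        job_url(base_url, note_id, paragraph_id), data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, a cron job or an Airflow task could call `trigger_run("http://localhost:8080", "<noteId>")`. Keeping the trigger a plain HTTP call is what lets the scheduler stay external to Zeppelin.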
Regards,
Sourav

On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal 
<vinayakagrawa...@gmail.com<mailto:vinayakagrawa...@gmail.com>> wrote:
@Eran Witkon,
Thanks for the suggestion Eran. I concur with your thought.
If Zeppelin can be integrated with Oozie, that would be wonderful. Users will 
also be able to leverage their Oozie skills.
This would be promising for now.
However, in the future, Hadoop might not necessarily be installed on a Spark 
cluster, and Oozie (since it is installed with the Hadoop distribution) might 
not be available.
So perhaps we should give some thought to this feature for the future. Should 
it depend on Oozie, or should Zeppelin have its own scheduling?
As Benjamin mentioned, the Databricks notebook has this as a core notebook 
feature.

Also, does anybody have suggestions regarding a "sync with GitHub" feature?
- Exporting notebooks to GitHub
- Importing notebooks from GitHub

Thanks
Vinayak


On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon 
<eranwit...@gmail.com<mailto:eranwit...@gmail.com>> wrote:
@Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to 
existing scheduling/workflow tools such as https://oozie.apache.org/. 
This requires better hooks and status reporting, but doesn't make Zeppelin an 
ETL/scheduler tool by itself.


On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal 
<vinayakagrawa...@gmail.com<mailto:vinayakagrawa...@gmail.com>> wrote:
Moon,
The new roadmap looks very promising. I am very happy to see security in the 
list.
I have some suggestions regarding Enterprise Ready features:

1. Job Scheduler - Can this be improved?
Currently the scheduler can be used with a cron expression or a preset time. 
But in an enterprise solution, a notebook might be one piece of a workflow. Can 
we look into scheduling notebooks based on other notebooks finishing their 
jobs successfully?
This requirement would arise in any ETL workflow, where all the downstream 
users wait for the ETL notebook to finish successfully. Only after that, other 
business oriented notebooks can be executed.
2. Importing a notebook - Is there a current requirement or future plan to 
implement a feature that allows import-notebook-from-github? This would allow 
users to share notebooks seamlessly.
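The dependency-driven scheduling in item 1 amounts to running notebooks in topological order of a dependency graph and skipping anything downstream of a failure. A minimal sketch using Python's standard `graphlib` module (3.9+); the notebook names and the `run` callback are hypothetical (in practice `run` might wrap a REST trigger and wait for completion), and this is not an existing Zeppelin feature.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_dependency_order(deps: Dict[str, List[str]],
                            run: Callable[[str], bool]) -> List[str]:
    """Run each notebook only after all of its upstream notebooks succeeded.

    deps maps a notebook to the notebooks it depends on. run(name) executes
    one notebook and returns True on success. Returns the notebooks that ran
    successfully, in execution order. Raises graphlib.CycleError on cycles.
    """
    succeeded: List[str] = []
    blocked: Set[str] = set()  # notebooks that failed or were skipped
    for notebook in TopologicalSorter(deps).static_order():
        upstream = set(deps.get(notebook, []))
        if upstream & blocked:
            blocked.add(notebook)  # skip: an upstream notebook did not succeed
        elif run(notebook):
            succeeded.append(notebook)
        else:
            blocked.add(notebook)
    return succeeded
```

With `deps = {"report": ["etl"], "dashboard": ["etl"], "ml": ["report"]}`, the `etl` notebook always runs first, and if it fails, none of the business-oriented notebooks downstream of it run.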
Thanks
Vinayak

On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee 
<m...@apache.org<mailto:m...@apache.org>> wrote:
Zhong Wang,
Right, folder support would be quite useful. Thanks for the opinion.
I hope I can finish the work in 
pr-190<https://github.com/apache/incubator-zeppelin/pull/190>.

Sourav,
Regarding concurrent runs, Zeppelin itself doesn't prevent paragraphs/queries 
from running concurrently. Each interpreter can implement its own scheduling 
policy. For example, the SparkSQL interpreter and ShellInterpreter can already 
run paragraphs/queries concurrently.

SparkInterpreter is implemented with a FIFO scheduler because of the nature of 
the Scala compiler. That's why users cannot run multiple paragraphs 
concurrently when they work with SparkInterpreter.
But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala 
compiler, so paragraphs can run concurrently as long as they're in different 
notebooks.
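The per-notebook behavior described above can be modeled as one FIFO queue per notebook: paragraphs in the same notebook run strictly in submission order, while different notebooks proceed in parallel. A minimal Python sketch of that policy, using a single-worker `ThreadPoolExecutor` per notebook; `PerNotebookFifoScheduler` is a hypothetical name, and this is an illustration of the idea, not Zeppelin's actual (Java) scheduler.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class PerNotebookFifoScheduler:
    """Hypothetical sketch: FIFO within a notebook, parallel across notebooks."""

    def __init__(self) -> None:
        self._executors: Dict[str, ThreadPoolExecutor] = {}
        self._lock = threading.Lock()  # guards lazy creation of executors

    def submit(self, notebook_id: str, paragraph: Callable[[], T]) -> "Future[T]":
        with self._lock:
            executor = self._executors.get(notebook_id)
            if executor is None:
                # A single worker thread drains its queue in submission order (FIFO).
                executor = ThreadPoolExecutor(max_workers=1,
                                              thread_name_prefix=f"nb-{notebook_id}")
                self._executors[notebook_id] = executor
        return executor.submit(paragraph)

    def shutdown(self) -> None:
        with self._lock:
            executors = list(self._executors.values())
            self._executors.clear()
        for executor in executors:
            executor.shutdown(wait=True)
```

The trade-off is one worker thread per active notebook; a fuller version would also need to retire executors for idle notebooks.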
Thanks for the feedback!

Best,
moon
On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang 
<wangzhong....@gmail.com<mailto:wangzhong....@gmail.com>> wrote:
Sourav: I think this newly merged PR can help you 
https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537

On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder 
<sourav.mazumde...@gmail.com<mailto:sourav.mazumde...@gmail.com>> wrote:
Hi Moon,
This looks great.
My only suggestion would be to include a PR/feature: support for running 
concurrent paragraphs/queries in Zeppelin.

Right now, if more than one user tries to run paragraphs in multiple notebooks 
concurrently through a single Zeppelin instance (and single interpreter 
instance), the performance is very slow. A queue evidently builds up within the 
Zeppelin process and the interpreter process in that scenario, as the time 
taken to move the status from start to pending and from pending to running is 
very high compared to the actual running time of a paragraph.
Without this, multi-tenancy support would be meaningless, as no one could 
practically use it in a situation where multiple users are trying to connect to 
the same instance of Zeppelin (and the related interpreter). A possible 
solution would be to spawn a separate instance of the same interpreter for 
every notebook/user.
Regards,
Sourav
On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee 
<m...@apache.org<mailto:m...@apache.org>> wrote:
Hi Zeppelin users and developers,

The roadmap we have published at
https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
is almost 9 months old and no longer reflects where the community is going. 
It's time to update it.

Based on the mailing list, JIRA issues, pull requests, user feedback, 
conferences and meetings, I could summarize the major interests of users and 
developers in 7 categories: Enterprise ready, Usability improvement, 
Pluggability, Documentation, Backend integration, Notebook storage, and 
Visualization.

And I could list related subjects under each category.

  *   Enterprise ready

     *   Authentication

        *   Shiro authentication 
ZEPPELIN-548<https://issues.apache.org/jira/browse/ZEPPELIN-548>

     *   Authorization

        *   Notebook authorization 
PR-681<https://github.com/apache/incubator-zeppelin/pull/681>

     *   Security
     *   Multi-tenancy
     *   Stability

  *   Usability Improvement

     *   UX improvement
     *   Better Table data support

        *   Download data as csv, etc 
PR-725<https://github.com/apache/incubator-zeppelin/pull/725>, 
PR-714<https://github.com/apache/incubator-zeppelin/pull/714>, 
PR-6<https://github.com/apache/incubator-zeppelin/pull/6>, 
PR-89<https://github.com/apache/incubator-zeppelin/pull/89>

        *   Featureful table data display (pagination, etc)

  *   Pluggability 
ZEPPELIN-533<https://issues.apache.org/jira/browse/ZEPPELIN-533>

     *   Pluggable visualization

     *   Dynamic Interpreter, notebook, visualization loading

     *   Repository and registry for pluggable components

  *   Improve documentation

     *   Improve contents and readability
     *   more tutorials, examples

  *   Interpreter

     *   Generic JDBC Interpreter
     *   (spark)R Interpreter
     *   Cluster manager for interpreter 
(Proposal<https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
     *   more interpreters

  *   Notebook storage

     *   Versioning 
ZEPPELIN-540<http://issues.apache.org/jira/browse/ZEPPELIN-540>
     *   more notebook storages

  *   Visualization

     *   More visualizations 
PR-152<https://github.com/apache/incubator-zeppelin/pull/152>, 
PR-728<https://github.com/apache/incubator-zeppelin/pull/728>, 
PR-336<https://github.com/apache/incubator-zeppelin/pull/336>, 
PR-321<https://github.com/apache/incubator-zeppelin/pull/321>

     *   Customize graph (show/hide label, color, etc)

It will help anyone quickly grasp the overall interests and direction of the 
project. Based on this roadmap, we can discuss and redefine the scope and 
schedule of the next release, 0.6.0.

What do you think? Any feedback would be appreciated.

Thanks,
moon




--
Vinayak Agrawal

"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson



--
Vinayak Agrawal
Big Data Analytics
IBM
"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson
