Hi Luciano,

I've also gotten a lot of questions about "Productize the notebook" every
time I meet users who use Zeppelin in their work.

I think it's actually about two different problems that Zeppelin needs to
address.

*1) Provide a way for an interactive notebook to become part of a production
data pipeline.*

Although Zeppelin does have a quite convenient cron-like scheduler for each
Note, the built-in cron scheduler is not ready for serious production use,
because it lacks features such as actions after success/failure,
fault tolerance, history, and so on. I think the community is working on
improving it, but that is going to take some time.
Meanwhile, any external enterprise-level job scheduler can run a Note or
Paragraph via the REST API. But we don't have any guide or examples for it:
which REST APIs users can use for this purpose, and how to use them
in various cases (e.g. with authentication on, dynamic form parameters,
etc.). I think a lot of things need to be improved to make Zeppelin easier
to use as part of a production pipeline.
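
As a rough sketch of what such a guide could show, an external scheduler
might build its calls like this. The endpoint paths and the {"params": ...}
body follow the notebook REST API as I understand it, so please verify them
against your Zeppelin version. The helper names, the base URL and the
session-cookie handling are assumptions made only for illustration.

```python
import json
import urllib.request


def run_note_request(base_url, note_id, session_cookie=None):
    """Build a POST that asks Zeppelin to run every paragraph in a Note."""
    url = f"{base_url}/api/notebook/job/{note_id}"
    request = urllib.request.Request(url, data=b"", method="POST")
    if session_cookie:
        # Assumption: with authentication on, the scheduler logs in first
        # and replays the session cookie it received on every call.
        request.add_header("Cookie", session_cookie)
    return request


def run_paragraph_request(base_url, note_id, paragraph_id, params=None,
                          session_cookie=None):
    """Build a POST that runs one Paragraph with dynamic form values."""
    url = f"{base_url}/api/notebook/job/{note_id}/{paragraph_id}"
    body = json.dumps({"params": params or {}}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    if session_cookie:
        request.add_header("Cookie", session_cookie)
    return request


def send(request, timeout=30):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A scheduler task would then call something like
send(run_paragraph_request("http://localhost:8080", note_id, paragraph_id,
{"date": "2016-09-16"})), where the ids and the form name "date" are
placeholders.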

*2) Provide a stable way to run Spark paragraphs.*

Another barrier to using a notebook in a production pipeline is the Scala
REPL in SparkInterpreter. SparkInterpreter uses the Scala REPL to provide an
interactive Scala session, and the Scala REPL will eventually hit an
OutOfMemoryError (OOME) as it keeps compiling and running statements. The
current workaround in Zeppelin is that the cron scheduler inside the notebook
has a checkbox that restarts the interpreter after the scheduler runs the
Note. Of course, that option does not apply when an external scheduler runs
the job through the REST API.
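
An external scheduler could approximate that checkbox by restarting the
interpreter itself once the run has finished. A minimal sketch, assuming the
interpreter REST API exposes a restart call at the path below (please check
it against your Zeppelin version) and that the scheduler already knows the
interpreter setting id; the function name is hypothetical.

```python
import urllib.request


def restart_interpreter_request(base_url, setting_id):
    """Build a PUT that asks Zeppelin to restart one interpreter setting."""
    url = f"{base_url}/api/interpreter/setting/restart/{setting_id}"
    return urllib.request.Request(url, data=b"", method="PUT")
```

Running a Note through REST is asynchronous, so this request should only be
sent after the scheduler has seen the job finish. Restarting earlier would
kill the running paragraphs.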

I think what Luciano is suggesting, "Export Spark Paragraph as Spark
application", is interesting. If Spark paragraphs can be easily packaged
into a jar (a Spark application), that can be one way to address both 1) and
2), provided the user already has a stable way to schedule Spark application
jars.

Actually, as far as I know, the Flink interactive shell works in a similar
way internally, i.e. it packages the compiled classes into a jar and submits
it.

One idea for prototyping:
how about making an interpreter inside the Spark interpreter group, say
%spark.build or some better name?

And if the user runs a command like

%spark.build
package

then it builds a Spark application jar based on the Spark paragraphs in the
Note. I think it can be the simplest user interface for the prototype.
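
To make the idea a bit more concrete, here is a rough sketch of only the
first step such an interpreter would need: collecting the %spark paragraphs
of a Note and wrapping them into one Scala source file. Everything here is
hypothetical (function names, the NoteApp object, the generated layout). A
real prototype would live inside the interpreter and would also have to
compile and package the result. Paragraphs that rely on the Note's default
interpreter (no % directive) are ignored for simplicity.

```python
import textwrap


def spark_paragraphs(paragraphs):
    """Keep only Scala Spark paragraphs and strip the %spark directive."""
    bodies = []
    for text in paragraphs:
        parts = text.strip().split(None, 1)
        # Exact match, so %spark.sql, %md, etc. are skipped.
        if len(parts) == 2 and parts[0] == "%spark":
            bodies.append(parts[1].strip())
    return bodies


def to_spark_application(paragraphs, object_name="NoteApp"):
    """Wrap the Spark paragraphs into a standalone Scala main object."""
    body = "\n\n".join(spark_paragraphs(paragraphs))
    return (
        "import org.apache.spark.{SparkConf, SparkContext}\n\n"
        f"object {object_name} {{\n"
        "  def main(args: Array[String]): Unit = {\n"
        "    // Zeppelin injects `sc`; a standalone app has to create it.\n"
        "    val sc = new SparkContext(\n"
        f"      new SparkConf().setAppName(\"{object_name}\"))\n"
        f"{textwrap.indent(body, '    ')}\n"
        "    sc.stop()\n"
        "  }\n"
        "}\n"
    )
```

The generated file could then go through whatever build the user already has
for Spark application jars.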

Thanks,
moon

On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <jer...@objectadjective.com>
wrote:

> Luciano, I think this would be a terrific feature. I've heard the exact
> same workflow you've described in all of the research we've done.
>
> ...........................
>
> Jeremy Anderson
> Founder, Object Adjective
> 415.493.8489
> jer...@objectadjective.com
> objectadjective.com <http://about.me/jeremyanderson>
>
> On 16 September 2016 at 12:19, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
> > While talking with a few different users, I have been seeing the use case
> > of using iterative development in Notebooks or Spark Shell and then
> > copying and pasting the final solution to a formal application repeating
> > itself very often.
> >
> > I was wondering if an "Export Spark Paragraphs as a Spark Application
> > (jar)" would be a feature that the Zeppelin community would find useful.
> > Keep in mind there are some limitations here: we would be constrained
> > to Spark-related paragraphs, etc. But even so, I think there are
> > multiple scenarios where the ability to have an application that runs
> > directly on Spark would be very useful.
> >
> > If the community is interested, let's use this thread to discuss any
> > specific requirements or suggestions that others might have, and after a
> > few days I would like to start prototyping this functionality.
> >
> > Thoughts ?
> >
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
>
