Alexander, thanks for your interest in implementing this feature.

I think there are a few alternatives for enabling/disabling this feature:

1) Run all interpreters in separate processes.
2) Let the user select which interpreters run in a separate process.
3) Let the interpreter choose whether it runs in a separate process or not.
4) Let the user select, but have the interpreter provide a default selection.

What do you guys think? To me, 4) looks best, since it gives the user
flexibility as well as simplicity.
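
To make 4) concrete, here is a minimal sketch of the decision logic. It is
only an illustration: the interpreter names, their defaults, and the
function name are hypothetical and not part of Zeppelin.

```python
from typing import Optional

# Hypothetical per-interpreter defaults (assumptions, not Zeppelin settings).
INTERPRETER_DEFAULTS = {
    "spark": True,   # assumed: affected by the classloader issue, so isolate
    "md": False,     # assumed: lightweight, fine to run in-process
}


def runs_in_separate_process(name: str, user_choice: Optional[bool] = None) -> bool:
    """Option 4: an explicit user choice wins; otherwise the interpreter's default applies."""
    if user_choice is not None:
        return user_choice
    # Interpreters that declare no default stay in-process in this sketch.
    return INTERPRETER_DEFAULTS.get(name, False)
```

With this shape, a user who never touches the setting gets whatever the
interpreter recommends, and a user who does set it always overrides the
default.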


And I can easily think of some possible improvements after the first step:

a) Run the interpreter process not only on the local machine but also on a
remote machine (would that help with anything that works with Yarn?)
b) Option to keep the separate process running when Zeppelin terminates, so
Zeppelin can reconnect when it is restarted.
c) Implement remote interpreters in different languages (e.g., pyspark).

So I think it will be better for the future if the IPC implementation leaves
room for RPC and for supporting various languages.
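
As a rough illustration of what "language-neutral" could mean, the sketch
below encodes a call to an interpreter method as a JSON message and
dispatches it on the receiving side. It is not Thrift and not Zeppelin's
API: the class, the method names, and the message layout are all
assumptions, and the transport (socket, pipe, etc.) is left out entirely.

```python
import json


class EchoInterpreter:
    """Stand-in interpreter; the method names are illustrative assumptions."""

    def open(self):
        return "opened"

    def interpret(self, code):
        return "result of: " + code

    def close(self):
        return "closed"


# Only these methods may be invoked remotely in this sketch.
ALLOWED_METHODS = {"open", "interpret", "close"}


def encode_call(method, *args):
    """Caller side: turn a method call into a language-neutral message."""
    return json.dumps({"method": method, "args": list(args)})


def handle_call(interpreter, message):
    """Interpreter-process side: decode the message and invoke the method."""
    request = json.loads(message)
    method = request["method"]
    if method not in ALLOWED_METHODS:
        return json.dumps({"error": "unknown method: " + method})
    result = getattr(interpreter, method)(*request["args"])
    return json.dumps({"result": result})
```

Because both sides only exchange plain messages, the interpreter process
could in principle be written in another language, which is the property
that matters for c).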


Best,
moon



On Thu, Jan 15, 2015 at 8:58 PM, Alex B <[email protected]> wrote:

> I think I'd like to volunteer to implement this feature.
>
> My perspective is: we solve 2 immediate problems and, in the end,
> have an interpreter API mature enough to be able to add pyspark
> support.
>
> The immediate problems we solve are:
>  - Multiple interpreters running right now mix stdout/stderr
>  - In the case of JVMs there is also a classloader collision problem,
> which does not allow SparkSQL to work with Spark 1.2
>
> Suggested solution:
> Separate each interpreter into its own process.
>
> This means bringing to the codebase things like:
>  - API for managing the runtime state of that process
>  - the IPC implementation itself (thrift?)
>  - basic ClassLoading for JVM based interpreters
>
> Please, let me know if there is something I have missed here!
>
> --
> Kind regards,
> Alexander
>
> > On 13 Jan 2015, at 20:26, moon soo Lee <[email protected]> wrote:
> >
> > Hi guys,
> >
> > I'm bringing an issue https://github.com/NFLabs/zeppelin/issues/278 to this
> > mailing list for discussion.
> >
> > Zeppelin creates each interpreter instance with a separate classloader to
> > avoid interference (dependency conflicts, singletons, static members) with
> > other interpreter instances. It has worked well until now, but I can see
> > some limitations.
> >
> > a) When multiple interpreter instances run concurrently, they cannot
> > avoid interfering with each other's stdin/stdout/stderr.
> > b) When one of an interpreter's dependencies is designed (== hardcoded) to
> > use the Application classloader, it won't work within Zeppelin, because
> > Zeppelin loads the interpreter's dependency jars in its thread context
> > classloader, not the Application classloader.
> >
> > Running the interpreter in a separate process is the solution I can think
> > of. In detail, because the interpreter is abstracted by its public methods,
> > everything can be done simply if we can call those methods remotely through
> > some sort of RPC mechanism.
> >
> > Therefore
> >
> > a) Main entry point and run script to run the interpreter in a separate
> > process
> > b) RPC mechanism between Zeppelin and the separate interpreter process
> > c) Option to enable/disable this capability
> >
> > are the major tasks I have in mind.
> >
> > What do you guys think? Please share any ideas you have.
> >
> > Best,
> > moon
>
