Quick update: I've started hacking on a PoC for this feature here: https://github.com/bzz/zeppelin-multiprocess-interpreter-poc
It's at a very early stage, but please feel free to check it out and provide any feedback.
On enabling/disabling this feature: if it turns out to be easy to provide such a choice (which we are not sure of yet), then there is no reason not to implement 4.

On Sun, Jan 18, 2015 at 9:46 AM, moon soo Lee wrote:
> Alexander, thanks for your interest in implementing this feature.
>
> I think there are some alternatives for enabling/disabling this feature:
>
> 1) Run all interpreters in a separate process
> 2) Let the user select which interpreters will run in a separate process
> 3) Let the interpreter choose whether it is going to run in a separate process or not
> 4) Let the user select, but have the interpreter provide the default selection
>
> What do you guys think? To me, 4) seems best, since it gives the user
> flexibility as well as simplicity.
>
> And I can easily think of some possible improvements after the first step:
>
> a) Run the interpreter process not only on the local machine but on a remote
> machine (would it help with anything that works with YARN?)
> b) An option to keep the separate process running when Zeppelin terminates,
> so Zeppelin can reconnect when it restarts.
> c) Implement a remote interpreter in a different language (e.g. pyspark).
>
> So I think it'll be better for the future if the IPC implementation leaves
> room for RPC and for supporting various languages.
>
> Best,
> moon
>
> On Thu, Jan 15, 2015 at 8:58 PM, Alex B wrote:
>
> > I think I'd like to volunteer to implement this feature.
> >
> > My perspective is: we solve 2 immediate problems and, at the end, have
> > a mature enough interpreter API to be able to add pyspark support.
> >
> > The immediate problems we solve are:
> > - Multiple interpreters running right now mix their stdout/stderr
> > - In the case of JVMs, there is also a classloader collision problem,
> > which does not allow SparkSQL to work with Spark 1.2
> >
> > Suggested solution:
> > Separate each interpreter into its own process.
> >
> > This means bringing things like these into the codebase:
> > - an API for managing the runtime state of that process
> > - the IPC implementation itself (Thrift?)
> > - basic classloading for JVM-based interpreters
> >
> > Please let me know if there is something I have missed here!
> >
> > --
> > Kind regards,
> > Alexander
> >
> > > On 13 Jan 2015, at 20:26, moon soo Lee wrote:
> > >
> > > Hi guys,
> > >
> > > I'm bringing issue https://github.com/NFLabs/zeppelin/issues/278 to this
> > > mailing list for discussion.
> > >
> > > Zeppelin creates each interpreter instance with its own separate
> > > classloader to avoid interference (dependency conflicts, singletons,
> > > static members) with other interpreter instances. It has worked well
> > > until now, but I can see some limitations:
> > >
> > > a) When multiple interpreter instances are running concurrently, they
> > > cannot avoid interfering with each other's stdin/stdout/stderr.
> > > b) When one of an interpreter's dependencies is designed (== hardcoded)
> > > to use the application classloader, it won't work within Zeppelin,
> > > because Zeppelin loads the interpreter's dependency jars in its thread
> > > context classloader, not the application classloader.
> > >
> > > Running the interpreter in a separate process is the solution I can
> > > think of. In detail, because the interpreter is abstracted by its public
> > > methods, everything can be done simply if we can call those methods
> > > remotely through some sort of RPC mechanism.
> > >
> > > Therefore, the major tasks I'm thinking of are:
> > >
> > > a) a main entry point and run script to run the interpreter in a
> > > separate process
> > > b) an RPC mechanism between Zeppelin and the separate interpreter process
> > > c) an option for enabling/disabling this capability
> > >
> > > What do you guys think? Please share if you have any ideas.
> > >
> > > Best,
> > > moon
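The core idea in this thread (run each interpreter in its own process and call its public methods remotely), plus option 4 for choosing when to do so, can be illustrated with a small sketch. This is not Zeppelin's API or the PoC's code: it assumes a newline-delimited JSON protocol over the child's stdin/stdout in place of Thrift, and `RemoteInterpreter`, `CHILD_SOURCE` and `should_run_remotely` are hypothetical names. The child's `interpret` just evaluates a Python expression as a stand-in for a real interpreter.

```python
import json
import subprocess
import sys

# Hypothetical child-side program standing in for a real interpreter:
# "interpret" evaluates a Python expression and returns its repr.
# Caveat: for brevity this reuses the child's stdout as the RPC channel; a real
# design would use a dedicated channel (e.g. a socket / Thrift) so the
# interpreter's own stdout/stderr can be captured separately.
CHILD_SOURCE = r'''
import json, sys
while True:
    line = sys.stdin.readline()
    if not line:  # parent closed stdin: shut down
        break
    req = json.loads(line)
    if req.get("method") == "interpret":
        try:
            resp = {"ok": True, "value": repr(eval(req["code"], {}))}
        except Exception as exc:
            resp = {"ok": False, "error": repr(exc)}
    else:
        resp = {"ok": False, "error": "unknown method"}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
'''


def should_run_remotely(interpreter_default, user_choice=None):
    """Option 4 from the thread: an explicit user choice wins,
    otherwise fall back to the interpreter's default."""
    return interpreter_default if user_choice is None else user_choice


class RemoteInterpreter:
    """Parent-side proxy that forwards interpreter calls to a child process."""

    def __init__(self):
        # One OS process per interpreter, so loaded libraries and
        # process-level state are not shared with other interpreters.
        self._proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def _call(self, method, **params):
        # One request line out, one response line back.
        self._proc.stdin.write(json.dumps({"method": method, **params}) + "\n")
        self._proc.stdin.flush()
        line = self._proc.stdout.readline()
        if not line:
            raise RuntimeError("interpreter process exited unexpectedly")
        return json.loads(line)

    def interpret(self, code):
        return self._call("interpret", code=code)

    def close(self):
        self._proc.stdin.close()  # child sees EOF and leaves its loop
        self._proc.wait()
        self._proc.stdout.close()
```

For example, `RemoteInterpreter().interpret("1 + 2")` returns `{"ok": True, "value": "3"}`. Keeping the proxy's methods identical to the interpreter's public methods is what makes the "call those methods remotely" approach transparent to the caller.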
