Hi David & All,

'spark-kernel/torii' is a "good to have" tool. Pardon me, I am not the judge in any way. After going through this thread and the referenced links, it seems that giving it decent publicity within Apache Spark (maybe by providing a link, etc.) would be enough for its survival and evolution, instead of going through the entire Apache incubation process. I am not undermining incubation in any way. But consider the prep work needed (license/trademark, project rename, package rename) to make 'spark-kernel' eligible for incubation; and once in incubation, it would need to keep up with the progress of Apache Zeppelin (which is already incubating).
Oh! That also makes me ask: can Apache Zeppelin and spark-kernel/torii be combined into one?! Either way, count me in for any help required with 'spark-kernel/torii'.

Thanking you.

With Regards,
Sree

On Monday, November 30, 2015 4:13 PM, Julien Le Dem <jul...@dremio.com> wrote:

Sorry for the late reply.
FYI, there is already an open source project called torii: https://vestorly.github.io/torii/
Whether there is a trademark or not, I'd recommend a name that does not collide with another project.

On Wed, Nov 25, 2015 at 9:00 PM, Luciano Resende <luckbr1...@gmail.com> wrote:

> Thanks for all your feedback; we have updated the proposal with the
> following:
>
> - Renamed the project to Torii
> - Added new mentors who volunteered during the discussion
>
> Below is an updated proposal, which I will be calling for a vote shortly.
>
> = Torii =
>
> == Abstract ==
> Torii provides applications with a mechanism to interactively and remotely
> access Apache Spark.
>
> == Proposal ==
> Torii enables interactive applications to access Apache Spark clusters.
> More specifically:
> * Applications can send code snippets and libraries for execution by Spark
> * Applications can be deployed separately from Spark clusters and
>   communicate with Torii using the provided Torii client
> * Execution results and streaming data can be sent back to calling
>   applications
> * Applications no longer have to be network-connected to the workers on a
>   Spark cluster because Torii acts as each application's proxy
> * Work has started on enabling Torii to support languages in addition to
>   Scala, namely Python (with PySpark), R (with SparkR), and SQL (with
>   SparkSQL)
>
> == Background & Rationale ==
> Apache Spark provides applications with a fast and general-purpose
> distributed computing engine that supports static and streaming data,
> tabular and graph representations of data, and an extensive library of
> machine learning algorithms.
> Consequently, a wide variety of applications
> will be written for Spark: interactive applications that require
> relatively frequent function evaluations, and batch-oriented
> applications that require one-shot or only occasional evaluation.
>
> Apache Spark provides two mechanisms for applications to connect with
> Spark. The primary mechanism launches applications on Spark clusters using
> spark-submit (http://spark.apache.org/docs/latest/submitting-applications.html);
> this requires developers to bundle their application code plus any
> dependencies into JAR files and then submit them to Spark. A second
> mechanism is an ODBC/JDBC API
> (http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine),
> which enables applications to issue SQL queries against SparkSQL.
>
> Our experience when developing interactive applications to run against
> Spark, such as analytic applications integrated with Notebooks, was that the
> spark-submit mechanism was overly cumbersome and slow (requiring JAR
> creation and forking processes to run spark-submit), and the SQL interface
> was too limiting and did not offer easy access to components other than
> SparkSQL, such as streaming. The most promising mechanism provided by
> Apache Spark was the command-line shell
> (http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell),
> which enabled us to execute code snippets and dynamically control the tasks
> submitted to a Spark cluster. Spark does not provide the command-line
> shell as a consumable service, but it provided us with the starting point
> from which we developed Torii.
>
> == Current Status ==
> Torii was first developed by a small team working on an internal IBM
> Spark-related project in July 2014. In recognition of its likely general
> utility to Spark users and developers, in November 2014 the Torii project
> was moved to GitHub and made available under the Apache License V2.
>
> == Meritocracy ==
> The current developers are familiar with the meritocratic open source
> development process at Apache. As the project has gathered interest on
> GitHub, the developers have actively started a process to invite additional
> developers into the project, and we have at least one new developer who is
> ready to contribute code to the project.
>
> == Community ==
> We started building a community around the Torii project when we moved it to
> GitHub about one year ago. Since then we have grown to about 70 people, and
> there are regular requests and suggestions from the community. We believe
> that providing Apache Spark application developers with a general-purpose
> and interactive API holds a lot of community potential, especially
> considering possible tie-ins with Notebooks and the data science community.
>
> == Core Developers ==
> The core developers of the project are currently all from IBM, from the IBM
> Emerging Technology team and from IBM's recently formed Spark Technology
> Center.
>
> == Alignment ==
> Apache, as the home of Apache Spark, is the most natural home for the Torii
> project because it was designed to work with Apache Spark and to provide
> capabilities for interactive applications and data science tools not
> provided by Spark itself.
>
> Torii also has an affinity with Jupyter (jupyter.org) because it uses
> the Jupyter protocol for communications, so Jupyter Notebooks can
> directly use Torii as a kernel for communicating with Apache Spark.
> However, we believe that Torii provides a general-purpose mechanism
> enabling a wider variety of applications than just Notebooks to access
> Spark, and so Torii's greatest affinity is with Apache and Apache
> Spark.
>
> == Known Risks ==
>
> === Orphaned products ===
> We believe the Torii project has a low risk of abandonment due to interest
> in its continuing existence from several parties.
> More specifically,
> Torii provides a capability that is not provided by Apache Spark today and
> that enables a wider range of applications to leverage Spark. For example,
> IBM uses (or is considering using) Torii in several offerings, including its
> IBM Analytics for Apache Spark product in the Bluemix Cloud. There are also
> a couple of other commercial users who are using or considering using it in
> their offerings. Furthermore, Jupyter Notebooks are used by data scientists,
> and Spark is gaining popularity as an analytic engine for them. Jupyter
> Notebooks are very easily enabled with Torii, so there is another
> constituency for it.
>
> === Inexperience with Open Source ===
> The Torii project has been running as an open source project (albeit with
> only IBM committers) for the past several months. The project has an active
> issue tracker, and due to the interest indicated by the nature and volume of
> requests and comments, the team has publicly stated it is beginning to
> build a process so they can accept third-party contributions to the
> project.
>
> === Relationships with Other Apache Products ===
> Torii has a clear affinity with the Apache Spark project because it is
> designed to provide capabilities for interactive applications and data
> science tools not provided by Spark itself. Torii can be a back-end for
> the Zeppelin project currently incubating at Apache. There is interest from
> the Torii community in developing this capability, and an experimental
> branch has been started.
>
> === Homogeneous Developers ===
> The current group of developers working on Torii are all from IBM, although
> the group is in the process of expanding its membership to include members
> of the GitHub community who are not from IBM and who have been active in
> the Torii community on GitHub.
>
> === Reliance on Salaried Developers ===
> The initial committers are full-time employees at IBM, although not all work
> on the project full-time.
>
> === Excessive Fascination with the Apache Brand ===
> We believe Torii benefits Apache Spark application developers, and we
> are interested in an Apache Torii project to benefit these developers by
> engaging a larger community, facilitating closer ties with the existing
> Spark project, and yes, gaining more visibility for Torii as a
> solution.
>
> === Documentation ===
> Comprehensive documentation, including "Getting Started", API specifications
> and a Roadmap, is available from the GitHub project; see
> https://github.com/ibm-et/Torii/wiki.
>
> === Initial Source ===
> The source code resides at https://github.com/ibm-et/Torii.
>
> === External Dependencies ===
> Torii depends upon a number of Apache projects:
> * Spark
> * Hadoop
> * Ivy
> * Commons
>
> Torii also depends upon a number of other open source projects:
> * ZeroMQ (LGPL with Static Linking Exception,
>   http://zeromq.org/area:licensing)
> * Akka (MIT)
> * JOpt Simple (MIT)
> * Spring Framework Core (Apache v2)
> * Play (Apache v2)
> * SLF4J (MIT)
> * Scala
> * Scalatest (Apache v2)
> * Scalactic (Apache v2)
> * Mockito (MIT)
>
> == Required Resources ==
>
> === Mailing lists ===
>
> * priv...@torii.incubator.apache.org (with moderated subscriptions)
> * comm...@torii.incubator.apache.org
> * d...@torii.incubator.apache.org
>
> === Git Repository ===
>
> * https://git-wip-us.apache.org/repos/asf/incubator-torii.git
>
> === Issue Tracking ===
>
> * A JIRA issue tracker: https://issues.apache.org/jira/browse/TORII
>
> == Initial Committers ==
>
> * Leugim Bustelo (lbustelo AT us DOT ibm DOT com)
> * Jakob Odersky (odersky AT us DOT ibm DOT com)
> * Luciano Resende (lresende AT apache DOT org)
> * Robert Senkbeil (rcsenkbe AT us DOT ibm DOT com)
> * Corey Stubbs (cstubbs AT us DOT ibm DOT com)
> * Miao Wang (wangmiao AT us DOT ibm DOT com)
> * Sean Welleck (swelleck AT us DOT ibm DOT com)
>
> === Affiliations ===
> All of the initial committers are employed by IBM.
>
> == Sponsors ==
>
> === Champion ===
> * Sam Ruby (rubys AT apache DOT org)
>
> === Nominated Mentors ===
> * Luciano Resende (lresende AT apache DOT org)
> * Reynold Xin (rxin AT apache DOT org)
> * Hitesh Shah (hitesh AT apache DOT org)
> * Julien Le Dem (julien AT apache DOT org)
>
> === Sponsoring Entity ===
>
> The Apache Incubator.
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

--
Julien