Hi,

It looks pretty interesting, especially the part about integration with
Zeppelin as another Scala interpreter implementation.

AFAIK there was a discussion about including the Spark-Kernel in Spark core
(https://issues.apache.org/jira/browse/SPARK-4605), but I'm not sure about
the possibility of it becoming a sub-project.

It would be interesting to know, as it indeed looks very aligned with
Apache Spark.

--
Alex

On Fri, Nov 13, 2015 at 10:05 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:

> Just a quick (or maybe not :) ) question...
>
> Given the tight coupling to the Apache Spark project, were there any
> considerations or discussions with the Spark community regarding including
> the Spark-Kernel functionality outright in Spark, or the possibility of
> becoming a subproject?
>
> I'm just curious. I don't think an answer one way or another would
> necessarily block incubation.
>
> -Taylor
>
> > On Nov 12, 2015, at 7:17 PM, da...@fallside.com wrote:
> >
> > Hello, we would like to start a discussion on accepting the Spark-Kernel,
> > a mechanism for applications to interactively and remotely access Apache
> > Spark, into the Apache Incubator.
> >
> > The proposal is available online at
> > https://wiki.apache.org/incubator/SparkKernelProposal, and it is
> > appended to this email.
> >
> > We are looking for additional mentors to help with this project, and we
> > would much appreciate your guidance and advice.
> >
> > Thank you in advance,
> > David Fallside
> >
> >
> >
> > = Spark-Kernel Proposal =
> >
> > == Abstract ==
> > Spark-Kernel provides applications with a mechanism to interactively and
> > remotely access Apache Spark.
> >
> > == Proposal ==
> > The Spark-Kernel enables interactive applications to access Apache Spark
> > clusters. More specifically:
> > * Applications can send code snippets and libraries for execution by Spark
> > * Applications can be deployed separately from Spark clusters and
> > communicate with the Spark-Kernel using the provided Spark-Kernel client
> > * Execution results and streaming data can be sent back to calling
> > applications
> > * Applications no longer have to be network-connected to the workers on a
> > Spark cluster because the Spark-Kernel acts as each application’s proxy
> > * Work has started on enabling Spark-Kernel to support languages in
> > addition to Scala, namely Python (with PySpark), R (with SparkR), and SQL
> > (with SparkSQL)
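> >
> > As a rough sketch of this client model (the names below, `KernelClient`
> > and `execute`, are hypothetical and for illustration only; the actual
> > client API is described in the project documentation):
> >
> > ```scala
> > // Hypothetical client usage: connect to a remote kernel and submit a
> > // Scala snippet for execution on the Spark cluster. KernelClient and
> > // execute are illustrative names, not the real Spark-Kernel API.
> > val client = KernelClient.connect("kernel-host", 8888)
> > val result = client.execute("sc.parallelize(1 to 100).sum()")
> > println(result) // execution results are sent back to the application
> > ```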
> >
> > == Background & Rationale ==
> > Apache Spark provides applications with a fast and general purpose
> > distributed computing engine that supports static and streaming data,
> > tabular and graph representations of data, and an extensive set of
> > machine learning libraries. Consequently, a wide variety of applications
> > will be written for Spark and there will be interactive applications that
> > require relatively frequent function evaluations, and batch-oriented
> > applications that require one-shot or only occasional evaluation.
> >
> > Apache Spark provides two mechanisms for applications to connect with
> > Spark. The primary mechanism launches applications on Spark clusters
> > using spark-submit
> > (http://spark.apache.org/docs/latest/submitting-applications.html); this
> > requires developers to bundle their application code plus any dependencies
> > into JAR files, and then submit them to Spark. A second mechanism is an
> > ODBC/JDBC API
> > (http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
> > which enables applications to issue SQL queries against SparkSQL.
> >
> > Our experience when developing interactive applications, such as analytic
> > applications and Jupyter Notebooks, to run against Spark was that the
> > spark-submit mechanism was overly cumbersome and slow (requiring JAR
> > creation and forking processes to run spark-submit), and the SQL interface
> > was too limiting and did not offer easy access to components other than
> > SparkSQL, such as streaming. The most promising mechanism provided by
> > Apache Spark was the command-line shell
> > (http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell)
> > which enabled us to execute code snippets and dynamically control the
> > tasks submitted to a Spark cluster. Spark does not provide the
> > command-line shell as a consumable service but it provided us with the
> > starting point from which we developed the Spark-Kernel.
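> >
> > For illustration, a snippet of the kind the shell (and hence the
> > Spark-Kernel) can execute interactively, assuming a session with a
> > SparkContext bound to `sc`:
> >
> > ```scala
> > // Word count executed interactively; `sc` is the session's SparkContext.
> > val counts = sc.textFile("hdfs:///data/input.txt")
> >   .flatMap(line => line.split("\\s+"))
> >   .map(word => (word, 1))
> >   .reduceByKey(_ + _)
> > counts.take(10).foreach(println)
> > ```
> >
> > Each snippet runs against the same long-lived SparkContext, which is what
> > makes this model faster than repeated spark-submit invocations.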
> >
> > == Current Status ==
> > Spark-Kernel was first developed by a small team working on an
> > internal IBM Spark-related project in July 2014. In recognition of its
> > likely general utility to Spark users and developers, in November 2014 the
> > Spark-Kernel project was moved to GitHub and made available under the
> > Apache License V2.
> >
> > == Meritocracy ==
> > The current developers are familiar with the meritocratic open source
> > development process at Apache. As the project has gathered interest on
> > GitHub, the developers have actively started a process to invite
> > additional developers into the project, and we have at least one new
> > developer who is ready to contribute code to the project.
> >
> > == Community ==
> > We started building a community around the Spark-Kernel project when we
> > moved it to GitHub about one year ago. Since then we have grown to about
> > 70 people, and there are regular requests and suggestions from the
> > community. We believe that providing Apache Spark application developers
> > with a general-purpose and interactive API holds a lot of community
> > potential, especially considering possible tie-ins with the Jupyter and
> > data science community.
> >
> > == Core Developers ==
> > The core developers of the project are currently all from IBM, from the
> > IBM Emerging Technology team and from IBM’s recently formed Spark
> > Technology Center.
> >
> > == Alignment ==
> > Apache, as the home of Apache Spark, is the most natural home for the
> > Spark-Kernel project because it was designed to work with Apache Spark and
> > to provide capabilities for interactive applications and data science
> > tools not provided by Spark itself.
> >
> > The Spark-Kernel also has an affinity with Jupyter (jupyter.org) because
> > it uses the Jupyter protocol for communications, and so Jupyter Notebooks
> > can directly use the Spark-Kernel as a kernel for communicating with
> > Apache Spark. However, we believe that the Spark-Kernel provides a
> > general-purpose mechanism enabling a wider variety of applications than
> > just Notebooks to access Spark, and so the Spark-Kernel’s greatest
> > affinity is with Apache and Apache Spark.
> >
> > == Known Risks ==
> > === Orphaned products ===
> > We believe the Spark-Kernel project has a low risk of abandonment due to
> > interest in its continuing existence from several parties. More
> > specifically, the Spark-Kernel provides a capability that is not provided
> > by Apache Spark today and that enables a wider range of applications to
> > leverage Spark. For example, IBM uses, or is considering using, the
> > Spark-Kernel in several offerings, including its IBM Analytics for Apache
> > Spark product in the Bluemix Cloud. There are also a couple of other
> > commercial users who are using or considering its use in their offerings.
> > Furthermore, Jupyter Notebooks are used by data scientists and Spark is
> > gaining popularity as an analytic engine for them. Jupyter Notebooks are
> > very easily enabled with the Spark-Kernel and so there is another
> > constituency for it.
> >
> > === Inexperience with Open Source ===
> > The Spark-Kernel project has been running as an open-source project
> > (albeit with only IBM committers) for the past several months. The
> > project has an active issue tracker, and due to the interest indicated by
> > the nature and volume of requests and comments, the team has publicly
> > stated it is beginning to build a process so it can accept third-party
> > contributions to the project.
> >
> > === Relationships with Other Apache Products ===
> > The Spark-Kernel has a clear affinity with the Apache Spark project
> > because it is designed to provide capabilities for interactive
> > applications and data science tools not provided by Spark itself. The
> > Spark-Kernel can be a back-end for the Zeppelin project currently
> > incubating at Apache. There is interest from the Spark-Kernel community
> > in developing this capability, and an experimental branch has been started.
> >
> > === Homogeneous Developers ===
> > The current group of developers working on the Spark-Kernel are all from
> > IBM, although the group is in the process of expanding its membership to
> > include non-IBM members of the GitHub community who have been active in
> > the Spark-Kernel community there.
> >
> > === Reliance on Salaried Developers ===
> > The initial committers are full-time employees at IBM, although not all
> > work on the project full-time.
> >
> > === Excessive Fascination with the Apache Brand ===
> > We believe the Spark-Kernel benefits Apache Spark application developers,
> > and we are interested in an Apache Spark-Kernel project to benefit these
> > developers by engaging a larger community, facilitating closer ties with
> > the existing Spark project, and yes, gaining more visibility for the
> > Spark-Kernel as a solution.
> >
> > We have recently become aware that the project name “Spark-Kernel” may be
> > interpreted as having an association with an Apache project. If the
> > project is accepted by Apache, we suggest the project name remain the
> > same; otherwise we will change it to one that does not imply any
> > Apache association.
> >
> > === Documentation ===
> > Comprehensive documentation, including a “Getting Started” guide, API
> > specifications, and a roadmap, is available from the GitHub project; see
> > https://github.com/ibm-et/spark-kernel/wiki.
> >
> > === Initial Source ===
> > The source code resides at https://github.com/ibm-et/spark-kernel.
> >
> > === External Dependencies ===
> > The Spark-Kernel depends upon a number of Apache projects:
> > * Spark
> > * Hadoop
> > * Ivy
> > * Commons
> >
> > The Spark-Kernel also depends upon a number of other open source projects:
> > * JeroMQ (LGPL with Static Linking Exception,
> > http://zeromq.org/area:licensing)
> > * Akka (Apache v2)
> > * JOpt Simple (MIT)
> > * Spring Framework Core (Apache v2)
> > * Play (Apache v2)
> > * SLF4J (MIT)
> > * Scala
> > * Scalatest (Apache v2)
> > * Scalactic (Apache v2)
> > * Mockito (MIT)
> >
> > == Required Resources ==
> > Developer and user mailing lists
> > * priv...@spark-kernel.incubator.apache.org (with moderated subscriptions)
> > * comm...@spark-kernel.incubator.apache.org
> > * d...@spark-kernel.incubator.apache.org
> > * us...@spark-kernel.incubator.apache.org
> >
> > A git repository:
> > https://git-wip-us.apache.org/repos/asf/incubator-spark-kernel.git
> >
> > A JIRA issue tracker: https://issues.apache.org/jira/browse/SPARK-KERNEL
> >
> > == Initial Committers ==
> > The initial list of committers is:
> > * Leugim Bustelo (g...@bustelos.com)
> > * Jakob Odersky (joder...@gmail.com)
> > * Luciano Resende (lrese...@apache.org)
> > * Robert Senkbeil (chip.senkb...@gmail.com)
> > * Corey Stubbs (cas5...@gmail.com)
> > * Miao Wang (wm...@hotmail.com)
> > * Sean Welleck (welle...@gmail.com)
> >
> > === Affiliations ===
> > All of the initial committers are employed by IBM.
> >
> > == Sponsors ==
> > === Champion ===
> > * Sam Ruby (IBM)
> >
> > === Nominated Mentors ===
> > * Luciano Resende
> >
> > We wish to recruit additional mentors during incubation.
> >
> > === Sponsoring Entity ===
> > The Apache Incubator.
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
>
>
>


-- 
Kind regards,
Alexander.
