Re: [VOTE] Superset Proposal for Apache Incubator

moon soo Lee Tue, 25 Apr 2017 12:49:10 -0700

+1 (non-binding)

On Tue, Apr 25, 2017 at 11:49 AM Ashutosh Chauhan <hashut...@apache.org>
wrote:


> +1 (binding)
>
> Thanks,
> Ashutosh
>
> On Mon, Apr 24, 2017 at 5:45 AM, Luke Han <luke...@gmail.com> wrote:
>
> > +1 binding
> >
> > Love to see Superset become a new incubator project.
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Sun, Apr 23, 2017 at 10:53 PM, Jeff Feng <jeff.f...@gmail.com> wrote:
> >
> >> Dear Apache Incubator Community,
> >>
> >> We have updated the Superset proposal
> >> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for
> >> Apache Incubation with an additional mentor (Luke Han -
> >> luke....@apache.org),
> >> and would like to start a vote thread for acceptance into the incubator.
> >>
> >> Our team is excited to share Superset with the Apache community and we
> >> hope
> >> for your continued support!
> >>
> >> Cheers,
> >> Jeff & the Superset Team
> >>
> >>
> >>
> >>
> >> = Superset =
> >>
> >> == Abstract ==
> >> Superset is an enterprise-ready web application for data exploration,
> data
> >> visualization and dashboarding.
> >>
> >> == Proposal ==
> >> Superset is business intelligence (BI) software that helps modern
> >> organizations visualize and interact with their data. Superset enables
> >> users to explore data from a variety of databases, assemble beautiful
> >> dashboards and share their findings.  Superset works neatly with all
> >> modern
> >> SQL-speaking databases, and integrates with Druid.io to provide
> real-time,
> >> interactive, blazing fast data access to large datasets.
> >>
> >> == Background ==
> >> Data is mission critical. To succeed in this era, organizations need to
> >> provide low-friction, intuitive and interactive access to data. It is
> >> paramount for knowledge workers to be capable of answering their own
> >> questions by querying, exploring and visualizing data.
> >>
> >> The entire business intelligence industry has pivoted from a model of
> >> centralized top-down platforms driven by IT organizations to
> self-service
> >> analytics and agile workflows by any user.  This shift unblocks
> >> centralized
> >> service bottlenecks for creating data visualizations while also creating
> >> an
> >> environment that is iterative and fast-moving.  This means that business
> >> intelligence software must also be easy and delightful to use.
> >> Self-service analytics doesn’t mean that admin and governance features
> are
> >> not needed.
> >> Modern BI tools provide fine-grained access controls and auditing
> >> capabilities to understand how data is being used.  Superset is a
> solution
> >> that delivers on all of these vectors.
> >>
> >> The technology stack is also constantly morphing - vendors are
> struggling
> >> to provide cheap, quick and easy solutions to access data.  Business
> >> intelligence users are finding existing solutions lacking as these
> >> software
> >> products either disregard or react slowly to recent game-changing
> >> technologies like Druid.io, PrestoDB, Apache Drill, Apache Kylin, d3.js,
> >> React.js and iPython’s Jupyter for instance.
> >>
> >> == Rationale ==
> >> Business intelligence is more relevant today than at any other point in
> >> history.  Organizations are currently very limited in options for open
> >> source data visualization solutions, especially solutions that are both
> >> self-service and enterprise-ready.  Every company informing their
> >> decisions
> >> with data needs a BI tool.
> >>
> >> We believe that Superset will be a strong complement to existing Apache
> >> Software Foundation technologies by offering scalable user interactions
> to
> >> distributed storage and computation solutions.  Users will often find
> that
> >> Superset can act as a catalyst for tooling that can visualize the
> >> byproduct
> >> of data and computation infrastructure.
> >>
> >> Superset has many key design elements that help fill a gap in current
> >> solutions for organizations:
> >>  * Easy, low friction access to data through a simple, web-based data
> >> exploration interface.  Composing charts and dashboards is intuitive.
> >> Eliminating the need to write code or SQL empowers anyone to use it.
> >>  * Access to a wide array of rich, interactive data visualization types.
> >>  * Enterprise-ready: Integration with different authentication
> mechanisms
> >> and granular permissions centered around actions and data access.
> >>  * Realtime & fast: Superset provides realtime analytics at the speed of
> >> thought on very large datasets when integrated with Druid.io.
> >>  * Broad data access: Consume data out of any SQL-speaking relational
> >> database.
> >>  * Extensible: Can be extended to talk to many NoSQL databases like
> >> Apache Drill, Elasticsearch, and other popular database engines.
> >>  * Fast loading dashboards with configurable web-scale caching.
> >>  * Plug-in framework that enables organizations to build custom
> analytical
> >> applications with new UI/UX interfaces.
> >>  * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
> >> with more flexibility.  SQL Lab integrates with the visualization engine
> >> seamlessly.
> >>
> >> == Initial Goals ==
> >> The initial goals of the Superset project are several-fold:
> >>  * Move the existing codebase to Apache and integrate with the Apache
> >> development process.
> >>  * Redesign the user interface and interaction model for creating
> >> visualizations/dashboards and connecting to data sources
> >>  * Build robust support for security and governance of the tool
> including
> >> popular authorization modules (including Apache Ranger and Apache
> Sentry)
> >> and a more sophisticated permissions system
> >>  * Grow the extensibility of the project both in terms of enhanced
> >> connectivity to NoSQL-based data sources and creating a plug-in
> framework
> >> that enables organizations to build custom analytical applications which
> >> require a new UI/UX
> >>
> >> == Current Status ==
> >> By many standards, Superset is already a successful open source project.
> >> As
> >> of March 2017, Superset is officially used in production at about a
> >> dozen companies, has received contributions from over one hundred
> >> contributors on GitHub, and has 1500+ forks and 12k+ stars.
> >>
> >> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
> >> significant contributions, and expressed their commitment to the
> project.
> >> The product is feature complete and has been viable for months. It
> already
> >> serves as the main interface for consuming data at many companies of
> >> different sizes.
> >>
> >> While the product is usable, there’s room for improvement across the
> >> board,
> >> starting with providing a smoother user experience around content
> >> creation,
> >> making sure all features work out-of-the-box on more platforms and
> >> databases, providing better user training guides and videos, having a
> >> predictable release process, and increasing the overall quality of the
> >> Superset releases.
> >>
> >> === Meritocracy ===
> >> We plan to invest in supporting a meritocracy. We will discuss the
> >> requirements in an open forum. Several companies have expressed interest
> >> in
> >> this project, and we intend to invite additional developers to
> >> participate.
> >> We will encourage and monitor community participation so that privileges
> >> can be extended to those that contribute.
> >>
> >> === Community ===
> >> The need for an enterprise-ready data visualization and exploration
> >> platform in the open source community is tremendous.  While Superset is
> >> fairly well known, recognized and used within the Druid.io community,
> >> adoption is currently limited outside of that niche. There is a huge
> >> opportunity to grow the community to hundreds if not thousands of
> >> organizations, and we are hoping that embracing “the Apache way” will
> >> accelerate the growth of our community.
> >>
> >> We have already been active in seeking and inviting contributions, and
> are
> >> planning to scale the project by investing time and growing the support
> >> structure to grow the community.
> >>
> >> === Core Developers ===
> >> The initial committers for Superset include experienced full stack,
> >> front-end and data engineers:
> >>  * Maxime Beauchemin (Airbnb)
> >>  * Alanna Scott (Airbnb)
> >>  * Bogdan Kyryliuk (Airbnb)
> >>  * Vera Liu  (Airbnb)
> >>  * Jeff Feng (Airbnb)
> >>  * Ashutosh Chauhan (Hortonworks)
> >>  * Nishant Bangarwa (Hortonworks)
> >>  * Slim Bouguerra (Hortonworks)
> >>  * Priyank Shah (Hortonworks)
> >>  * Sriharsha Chintalapani (Hortonworks)
> >>  * Daniel Dai (Hortonworks)
> >>
> >> We realize that additional employer diversity is needed, and we will
> work
> >> aggressively to recruit developers from additional companies.
> >>
> >> === Alignment ===
> >> The initial committers strongly believe that a system for interactive
> >> visualization of data will gain broader adoption as an open source,
> >> community driven project, where the community can contribute not only to
> >> the core components, but also to a growing collection of connectors,
> >> visualizations and improved integration with all potential data sources.
> >> Superset already integrates closely with Apache Hive, the Hive
> metastore,
> >> as well as most SQL-speaking databases found in modern data ecosystems.
> >>
> >> == Known Risks ==
> >>
> >> === Orphaned Products ===
> >> Superset is a vital component for visualizing, accessing and
> >> democratizing data at Airbnb.  Also at Hortonworks, Superset is a core
> >> component of the DataFlow product offering.  Thus, the risk of the
> project
> >> being orphaned is relatively low.  The project could be at risk if
> Airbnb
> >> changes their approach for democratizing data or if Hortonworks changes
> >> their strategy in the market.  In such an event, the committers plan to
> >> continue working on the project on their own time, though progress
> >> will likely be slower.  We plan to mitigate this risk by recruiting
> >> additional committers.
> >>
> >> === Inexperience with Open Source ===
> >> The initial committers include veteran Apache members (committers and
> PPMC
> >> members) and other developers who have varying degrees of experience
> with
> >> open source projects. All have been involved with source code that has
> >> been
> >> released under an open source license, and several also have experience
> >> developing code with an open source development process.
> >>
> >> === Homogenous Developers ===
> >> The initial committers are employed by Airbnb Inc. and Hortonworks. We
> are
> >> committed to recruiting additional committers from other companies.
> >>
> >> === Reliance on Salaried Developers ===
> >> It is expected that Superset development will occur on both salaried
> time
> >> and on volunteer time, after hours. The majority of initial committers
> are
> >> paid by their employer to contribute to this project. However, they are
> >> all
> >> passionate about the project, and we are confident that the project will
> >> continue even if no salaried developers contribute to the project. We
> are
> >> committed to recruiting additional committers including non-salaried
> >> developers.
> >>
> >> === Relationships with Other Apache Products ===
> >> To the knowledge of the Initial Committers, there are no direct
> >> competitors
> >> to Superset within the Apache Software Foundation.  That said, Apache
> >> Zeppelin is an indirect competitor, but it solves a different use case.
> >>
> >> Apache Zeppelin is a web-based notebook that enables interactive data
> >> analytics. It enables the creation of beautiful data-driven, interactive
> >> and collaborative documents with SQL, Scala and more.  Although a user
> can
> >> create data visualizations using this project, it leverages a
> >> notebook-style user interface and is geared towards the Spark community,
> >> where Scala and SQL co-exist.
> >>
> >> We look forward to collaborating with those communities, as well as
> other
> >> Apache communities.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> Superset is solving two huge challenges:
> >>  * The challenge of enabling every knowledge worker to make data-informed
> >> decisions, particularly those who are not deeply skilled at writing SQL.
> >>  * The challenge of visualizing huge amounts of data interactively and in
> >> real-time.
> >>
> >> Superset was first developed as a data visualization solution for
> Druid.io
> >> as a way to visualize billions of rows of data.  Since then, usage of
> >> Superset has expanded to address data visualization use cases across
> >> SQL-speaking data sources as well.
> >>
> >> Our rationale for developing Superset as an Apache project is detailed
> in
> >> the Rationale Section.  We believe that the Apache brand and community
> >> process will help us attract more contributors to this project, and help
> >> grow the footprint of the project through usage at other organizations
> and
> >> within other applications.  Establishing consensus among users and
> >> developers will result in a more valuable tool for everyone.
> >>
> >> == Documentation ==
> >> References to further reading material:
> >>  * [[http://airbnb.io/superset/|Superset Documentation]]
> >>  * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: Airbnb’s Data Exploration Platform]]
> >>  * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post: Superset: Scaling Data Access & Visual Insights at Airbnb]]
> >>
> >> == Initial Source ==
> >> The origin of the proposed code base can be found at
> >> https://github.com/airbnb/superset.  The code base is primarily in
> >> Python.
> >>
> >> == Source and Intellectual Property Submission Plan ==
> >> We do not expect any complications for the submission of the Superset
> code
> >> base.  Our code is already on GitHub and there is only a single code
> base.
> >>
> >> == External Dependencies ==
> >> List of Python packages, from the Python Package Index (PyPI):
> >>
> >>  * boto3
> >>  * celery
> >>  * cryptography
> >>  * flask-appbuilder
> >>  * flask-cache
> >>  * flask-migrate
> >>  * flask-script
> >>  * flask-sqlalchemy
> >>  * flask-testing
> >>  * humanize
> >>  * gunicorn
> >>  * markdown
> >>  * pandas
> >>  * parsedatetime
> >>  * pydruid
> >>  * PyHive
> >>  * python-dateutil
> >>  * requests
> >>  * simplejson
> >>  * six
> >>  * sqlalchemy
> >>  * sqlalchemy-utils
> >>  * sqlparse
> >>  * thrift
> >>  * thrift-sasl
> >>  * werkzeug
> >>
> >> List of JavaScript packages, from npm:
> >>  * autobind-decorator
> >>  * bootstrap
> >>  * bootstrap-datepicker
> >>  * brace
> >>  * brfs
> >>  * cal-heatmap
> >>  * classnames
> >>  * d3
> >>  * d3-cloud
> >>  * d3-sankey
> >>  * d3-scale
> >>  * d3-tip
> >>  * datamaps
> >>  * datatables-bootstrap3-plugin
> >>  * datatables.net-bs
> >>  * font-awesome
> >>  * gridster
> >>  * immutability-helper
> >>  * immutable
> >>  * jquery
> >>  * lodash.throttle
> >>  * mapbox-gl
> >>  * moment
> >>  * moments
> >>  * mustache
> >>  * nvd3
> >>  * react
> >>  * react-ace
> >>  * react-bootstrap
> >>  * react-bootstrap-table
> >>  * react-dom
> >>  * react-draggable
> >>  * react-gravatar
> >>  * react-grid-layout
> >>  * react-map-gl
> >>  * react-redux
> >>  * react-resizable
> >>  * react-select
> >>  * react-syntax-highlighter
> >>  * reactable
> >>  * redux
> >>  * redux-localstorage
> >>  * redux-thunk
> >>  * shortid
> >>  * style-loader
> >>  * supercluster
> >>  * topojson
> >>  * victory
> >>  * viewport-mercator-project
> >>
> >> == Cryptography ==
> >> The proposal does not include cryptographic code.
> >>
> >> == Required Resources ==
> >>
> >> === Mailing List ===
> >> There is currently a mailing list, the Google Group “airbnb_superset”,
> >> that we plan to deprecate once the Apache.org lists are ready to serve
> >> our community.
> >>
> >>  * superset-private
> >>  * superset-dev
> >>  * superset-user
> >>
> >> === Subversion Directory ===
> >> Git is the preferred source control system.
> >> http://svn.apache.org/repos/asf/incubator/superset
> >>
> >> == Git Repository ==
> >> Git is the preferred source control system; we’re assuming
> >> https://github.com/apache/incubator-superset based on the naming scheme.
> >>
> >> == Issue Tracking ==
> >> JIRA Superset (SUPERSET). We’d like to use GitHub issues & PRs to
> >> manage our project as much as possible. It’s been said that there are
> >> ways to keep GitHub’s issues in sync with JIRA, allowing us to get the
> >> best of both worlds. If that is not possible, we will comply and use
> >> JIRA.
> >>
> >> == Other Resources ==
> >> We currently use a set of GitHub-integrated services that are free to
> >> the open source community, like Travis CI, Code Climate, Coveralls,
> >> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep
> >> using these services as they allow us to scale contributions and
> >> optimize our development flows. These services require some elevated
> >> rights on the GitHub repository in order to set up or tune, and we
> >> would like the committers to have the required rights.
> >>
> >>
> >> == Initial Committers ==
> >>
> >>  * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer
> >>  * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer
> >>  * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer
> >>  * Vera Liu <vera....@airbnb.com> - Committer
> >>  * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer
> >>  * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer
> >>  * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer
> >>  * Slim Bouguerra <sbougue...@hortonworks.com> - Committer
> >>  * Priyank Shah <ps...@hortonworks.com> - Committer
> >>  * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer
> >>  * Daniel Dai <da...@apache.org> - Champion & Committer
> >>  * Luke Han <luke....@apache.org> - Mentor
> >>
> >> == Affiliations ==
> >> The initial committers are employees of Airbnb Inc. and Hortonworks.
> >>
> >> == Sponsors ==
> >>
> >> === Champion ===
> >> Daniel Dai <da...@apache.org>
> >>
> >> === Nominated Mentors ===
> >>  * Ashutosh Chauhan <hashut...@apache.org>
> >>  * Luke Han <luke....@apache.org>
> >>
> >> === Sponsoring Entity ===
> >> Incubator PMC
> >>
> >
> >
>
