Re: [VOTE] Superset Proposal for Apache Incubator

Jitendra Pandey Tue, 25 Apr 2017 13:53:33 -0700

+1 (binding)

On 4/25/17, 1:27 PM, "Julian Hyde" <jh...@apache.org> wrote:


    +1 binding
    
    > On Apr 25, 2017, at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
    > 
    > +1 (non-binding)
    > 
    > On Tue, Apr 25, 2017 at 11:49 AM Ashutosh Chauhan <hashut...@apache.org>
    > wrote:
    > 
    >> +1 (binding)
    >> 
    >> Thanks,
    >> Ashutosh
    >> 
    >> On Mon, Apr 24, 2017 at 5:45 AM, Luke Han <luke...@gmail.com> wrote:
    >> 
    >>> +1 binding
    >>> 
    >>> Love to see Superset become a new incubator project.
    >>> 
    >>> 
    >>> Best Regards!
    >>> ---------------------
    >>> 
    >>> Luke Han
    >>> 
    >>> On Sun, Apr 23, 2017 at 10:53 PM, Jeff Feng <jeff.f...@gmail.com> wrote:
    >>> 
    >>>> Dear Apache Incubator Community,
    >>>> 
    >>>> We have updated the Superset proposal
    >>>> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for
    >>>> Apache Incubation with an additional mentor (Luke Han -
    >>>> luke....@apache.org),
    >>>> and would like to start a vote thread for acceptance into the
    >>>> incubator.
    >>>> 
    >>>> Our team is excited to share Superset with the Apache community and we
    >>>> hope for your continued support!
    >>>> 
    >>>> Cheers,
    >>>> Jeff & the Superset Team
    >>>> 
    >>>> 
    >>>> 
    >>>> 
    >>>> = Superset =
    >>>> 
    >>>> == Abstract ==
    >>>> Superset is an enterprise-ready web application for data exploration,
    >>>> data visualization and dashboarding.
    >>>> 
    >>>> == Proposal ==
    >>>> Superset is business intelligence (BI) software that helps modern
    >>>> organizations visualize and interact with their data. Superset enables
    >>>> users to explore data from a variety of databases, assemble beautiful
    >>>> dashboards and share their findings.  Superset works neatly with all
    >>>> modern SQL-speaking databases, and integrates with Druid.io to provide
    >>>> real-time, interactive, blazing fast data access to large datasets.
    >>>> 
    >>>> == Background ==
    >>>> Data is mission critical. To succeed in this era, organizations need to
    >>>> provide low-friction, intuitive and interactive access to data. It is
    >>>> paramount for knowledge workers to be capable of answering their own
    >>>> questions by querying, exploring and visualizing data.
    >>>> 
    >>>> The entire business intelligence industry has pivoted from a model of
    >>>> centralized top-down platforms driven by IT organizations to
    >>>> self-service analytics and agile workflows by any user.  This shift
    >>>> unblocks centralized service bottlenecks for creating data
    >>>> visualizations while also creating an environment that is iterative
    >>>> and fast-moving.  This means that business intelligence software must
    >>>> also be easy and delightful to use.
    >>>> Self-service analytics doesn’t mean that admin and governance features
    >>>> are not needed.
    >>>> Modern BI tools provide fine-grained access controls and auditing
    >>>> capabilities to understand how data is being used.  Superset is a
    >>>> solution that delivers on all of these vectors.
    >>>> 
    >>>> The technology stack is also constantly morphing - vendors are
    >>>> struggling to provide cheap, quick and easy solutions to access data.
    >>>> Business intelligence users are finding existing solutions lacking as
    >>>> these software products either disregard or react slowly to recent
    >>>> game-changing technologies like Druid.io, PrestoDB, Apache Drill,
    >>>> Apache Kylin, d3.js, React.js and iPython’s Jupyter, for instance.
    >>>> 
    >>>> == Rationale ==
    >>>> Business intelligence is more relevant today than at any other point in
    >>>> history.  Organizations are currently very limited in options for open
    >>>> source data visualization solutions, especially solutions that are both
    >>>> self-service and enterprise-ready.  Every company informing their
    >>>> decisions with data needs a BI tool.
    >>>> 
    >>>> We believe that Superset will be a strong complement to existing Apache
    >>>> Software Foundation technologies by offering scalable user interactions
    >>>> to distributed storage and computation solutions.  Users will often
    >>>> find that Superset can act as a catalyst for tooling that can visualize
    >>>> the byproduct of data and computation infrastructure.
    >>>> 
    >>>> Superset has many key design elements that help fill a gap in current
    >>>> solutions for organizations:
    >>>> * Easy, low friction access to data through a simple, web-based data
    >>>> exploration interface.  Composing charts and dashboards is intuitive.
    >>>> Eliminating the need to write code or SQL empowers anyone to use it.
    >>>> * Access to a wide array of rich, interactive data visualization types.
    >>>> * Enterprise-ready: Integration with different authentication
    >>>> mechanisms and granular permissions centered around actions and data
    >>>> access.
    >>>> * Realtime & fast: Superset provides realtime analytics at the speed of
    >>>> thought on very large datasets when integrated with Druid.io.
    >>>> * Broad data access: Consume data out of any SQL-speaking relational
    >>>> database.
    >>>> * Extensible: Can be extended to talk to many NoSQL databases like
    >>>> Apache Drill, Elasticsearch, and other popular database engines.
    >>>> * Fast loading dashboards with configurable web-scale caching.
    >>>> * Plug-in framework that enables organizations to build custom
    >>>> analytical applications with new UI/UX interfaces.
    >>>> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
    >>>> with more flexibility.  SQL Lab integrates with the visualization
    >>>> engine seamlessly.
    >>>> 
    >>>> == Initial Goals ==
    >>>> The initial goals of the Superset project are several-fold:
    >>>> * Move the existing codebase to Apache and integrate with the Apache
    >>>> development process.
    >>>> * Redesign the user interface and interaction model for creating
    >>>> visualizations/dashboards and connecting to data sources
    >>>> * Build robust support for security and governance of the tool
    >>>> including popular authorization modules (including Apache Ranger and
    >>>> Apache Sentry) and a more sophisticated permissions system
    >>>> * Grow the extensibility of the project both in terms of enhanced
    >>>> connectivity to NoSQL-based data sources and creating a plug-in
    >>>> framework that enables organizations to build custom analytical
    >>>> applications which require a new UI/UX
    >>>> 
    >>>> == Current Status ==
    >>>> By many standards, Superset is already a successful open source
    >>>> project.  As of March 2017, Superset is officially used in production
    >>>> at about a dozen companies, and has received contributions from over
    >>>> one hundred contributors on Github, with 1500+ forks and 12k+ stars.
    >>>> 
    >>>> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
    >>>> significant contributions, and expressed their commitment to the
    >>>> project.
    >>>> The product is feature complete and has been viable for months. It
    >>>> already serves as the main interface for consuming data at many
    >>>> companies of different sizes.
    >>>> 
    >>>> While the product is usable, there’s room for improvement across the
    >>>> board, starting with providing a smoother user experience around
    >>>> content creation, making sure all features work out-of-the-box on more
    >>>> platforms and databases, providing better user training guides and
    >>>> videos, having a predictable release process, and increasing the
    >>>> overall quality of the Superset releases.
    >>>> 
    >>>> === Meritocracy ===
    >>>> We plan to invest in supporting a meritocracy. We will discuss the
    >>>> requirements in an open forum. Several companies have expressed
    >>>> interest in this project, and we intend to invite additional developers
    >>>> to participate.
    >>>> We will encourage and monitor community participation so that
    >>>> privileges can be extended to those that contribute.
    >>>> 
    >>>> === Community ===
    >>>> The need for an enterprise-ready data visualization and exploration
    >>>> platform in the open source community is tremendous.  While Superset is
    >>>> fairly well known, recognized and used within the Druid.io community,
    >>>> adoption is currently limited outside of that niche. There is a huge
    >>>> opportunity to grow the community to hundreds if not thousands of
    >>>> organizations, and we are hoping that embracing “the Apache way” will
    >>>> accelerate the growth of our community.
    >>>> 
    >>>> We have already been active in seeking and inviting contributions, and
    >>>> are planning to scale the project by investing time and growing the
    >>>> support structure to grow the community.
    >>>> 
    >>>> === Core Developers ===
    >>>> The initial committers for Superset include experienced full stack,
    >>>> front-end and data engineers:
    >>>> * Maxime Beauchemin (Airbnb)
    >>>> * Alanna Scott (Airbnb)
    >>>> * Bogdan Kyryliuk (Airbnb)
    >>>> * Vera Liu (Airbnb)
    >>>> * Jeff Feng (Airbnb)
    >>>> * Ashutosh Chauhan (Hortonworks)
    >>>> * Nishant Bangarwa (Hortonworks)
    >>>> * Slim Bouguerra (Hortonworks)
    >>>> * Priyank Shah (Hortonworks)
    >>>> * Sriharsha Chintalapani (Hortonworks)
    >>>> * Daniel Dai (Hortonworks)
    >>>> 
    >>>> We realize that additional employer diversity is needed, and we will
    >>>> work aggressively to recruit developers from additional companies.
    >>>> 
    >>>> === Alignment ===
    >>>> The initial committers strongly believe that a system for interactive
    >>>> visualization of data will gain broader adoption as an open source,
    >>>> community driven project, where the community can contribute not only
    >>>> to the core components, but also to a growing collection of connectors
    >>>> and visualizations, and to improving integration with all potential
    >>>> data sources.
    >>>> Superset already integrates closely with Apache Hive, the Hive
    >>>> metastore, as well as most SQL-speaking databases found in modern data
    >>>> ecosystems.
    >>>> 
    >>>> == Known Risks ==
    >>>> 
    >>>> === Orphaned Products ===
    >>>> Superset is a vital component for visualizing, accessing and
    >>>> democratizing data at Airbnb.  Also at Hortonworks, Superset is a core
    >>>> component of the DataFlow product offering.  Thus, the risk of the
    >>>> project being orphaned is relatively low.  The project could be at risk
    >>>> if Airbnb changes their approach for democratizing data or if
    >>>> Hortonworks changes their strategy in the market.  In such an event,
    >>>> the committers plan to continue working on the project on their own
    >>>> time, though the progress will likely be slower.  We plan to mitigate
    >>>> this risk by recruiting additional committers.
    >>>> 
    >>>> === Inexperience with Open Source ===
    >>>> The initial committers include veteran Apache members (committers and
    >>>> PPMC members) and other developers who have varying degrees of
    >>>> experience with open source projects. All have been involved with
    >>>> source code that has been released under an open source license, and
    >>>> several also have experience developing code with an open source
    >>>> development process.
    >>>> 
    >>>> === Homogenous Developers ===
    >>>> The initial committers are employed by Airbnb Inc. and Hortonworks. We
    >>>> are committed to recruiting additional committers from other companies.
    >>>> 
    >>>> === Reliance on Salaried Developers ===
    >>>> It is expected that Superset development will occur on both salaried
    >>>> time and on volunteer time, after hours. The majority of initial
    >>>> committers are paid by their employer to contribute to this project.
    >>>> However, they are all passionate about the project, and we are
    >>>> confident that the project will continue even if no salaried developers
    >>>> contribute to the project. We are committed to recruiting additional
    >>>> committers including non-salaried developers.
    >>>> 
    >>>> === Relationships with Other Apache Products ===
    >>>> To the knowledge of the Initial Committers, there are no direct
    >>>> competitors to Superset within the Apache Software Foundation.  That
    >>>> said, Apache Zeppelin is an indirect competitor, but it solves a
    >>>> different use case.
    >>>> 
    >>>> Apache Zeppelin is a web-based notebook that enables interactive data
    >>>> analytics. It enables the creation of beautiful data-driven,
    >>>> interactive and collaborative documents with SQL, Scala and more.
    >>>> Although a user can create data visualizations using this project, it
    >>>> leverages a notebook-style user interface and it is geared towards the
    >>>> Spark community where Scala and SQL co-exist.
    >>>> 
    >>>> We look forward to collaborating with those communities, as well as
    >>>> other Apache communities.
    >>>> 
    >>>> === An Excessive Fascination with the Apache Brand ===
    >>>> Superset is solving two huge challenges:
    >>>> * The challenge of enabling every knowledge worker to make data informed
    >>>> decisions, particularly those who are not deeply skilled at writing SQL.
    >>>> * The challenge of visualizing huge amounts of data interactively and in
    >>>> real-time.
    >>>> 
    >>>> Superset was first developed as a data visualization solution for
    >>>> Druid.io as a way to visualize billions of rows of data.  Since then,
    >>>> usage of Superset has expanded to address data visualization use cases
    >>>> across SQL-speaking data sources as well.
    >>>> 
    >>>> Our rationale for developing Superset as an Apache project is detailed
    >>>> in the Rationale Section.  We believe that the Apache brand and
    >>>> community process will help us attract more contributors to this
    >>>> project, and help grow the footprint of the project through usage at
    >>>> other organizations and within other applications.  Establishing
    >>>> consensus among users and developers will result in a more valuable
    >>>> tool for everyone.
    >>>> 
    >>>> == Documentation ==
    >>>> References to further reading material:
    >>>> * [[http://airbnb.io/superset/|Superset Documentation]]
    >>>> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post:  Superset: Airbnb’s Data Exploration Platform]]
    >>>> * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post:  Superset: Scaling Data Access & Visual Insights at Airbnb]]
    >>>> 
    >>>> == Initial Source ==
    >>>> The origin of the proposed code base can be found at
    >>>> https://github.com/airbnb/superset.  The code base is primarily in
    >>>> Python.
    >>>> 
    >>>> == Source and Intellectual Property Submission Plan ==
    >>>> We do not expect any complications for the submission of the Superset
    >>>> code base.  Our code is already in Github and there is only a single
    >>>> code base.
    >>>> 
    >>>> == External Dependencies ==
    >>>> List of Python packages, from the Python Package Index (PyPI):
    >>>> 
    >>>> * boto3
    >>>> * celery
    >>>> * cryptography
    >>>> * flask-appbuilder
    >>>> * flask-cache
    >>>> * flask-migrate
    >>>> * flask-script
    >>>> * flask-sqlalchemy
    >>>> * flask-testing
    >>>> * humanize
    >>>> * gunicorn
    >>>> * markdown
    >>>> * pandas
    >>>> * parsedatetime
    >>>> * pydruid
    >>>> * PyHive
    >>>> * python-dateutil
    >>>> * requests
    >>>> * simplejson
    >>>> * six
    >>>> * sqlalchemy
    >>>> * sqlalchemy-utils
    >>>> * sqlparse
    >>>> * thrift
    >>>> * thrift-sasl
    >>>> * werkzeug
    >>>> 
    >>>> List of JavaScript packages, from npm:
    >>>> * autobind-decorator
    >>>> * bootstrap
    >>>> * bootstrap-datepicker
    >>>> * brace
    >>>> * brfs
    >>>> * cal-heatmap
    >>>> * classnames
    >>>> * d3
    >>>> * d3-cloud
    >>>> * d3-sankey
    >>>> * d3-scale
    >>>> * d3-tip
    >>>> * datamaps
    >>>> * datatables-bootstrap3-plugin
    >>>> * datatables.net-bs
    >>>> * font-awesome
    >>>> * gridster
    >>>> * immutability-helper
    >>>> * immutable
    >>>> * jquery
    >>>> * lodash.throttle
    >>>> * mapbox-gl
    >>>> * moment
    >>>> * moments
    >>>> * mustache
    >>>> * nvd3
    >>>> * react
    >>>> * react-ace
    >>>> * react-bootstrap
    >>>> * react-bootstrap-table
    >>>> * react-dom
    >>>> * react-draggable
    >>>> * react-gravatar
    >>>> * react-grid-layout
    >>>> * react-map-gl
    >>>> * react-redux
    >>>> * react-resizable
    >>>> * react-select
    >>>> * react-syntax-highlighter
    >>>> * reactable
    >>>> * redux
    >>>> * redux-localstorage
    >>>> * redux-thunk
    >>>> * shortid
    >>>> * style-loader
    >>>> * supercluster
    >>>> * topojson
    >>>> * victory
    >>>> * viewport-mercator-project
    >>>> 
    >>>> == Cryptography ==
    >>>> The proposal does not include cryptographic code.
    >>>> 
    >>>> == Required Resources ==
    >>>> 
    >>>> === Mailing List ===
    >>>> There is a current mailing list as a Google Group “airbnb_superset”
    >>>> that we are planning on deprecating as the Apache.org lists become
    >>>> ready to serve our community.
    >>>> 
    >>>> * superset-private
    >>>> * superset-dev
    >>>> * superset-user
    >>>> 
    >>>> === Subversion Directory ===
    >>>> Git is the preferred source control system.
    >>>> http://svn.apache.org/repos/asf/incubator/superset
    >>>> 
    >>>> === Git Repository ===
    >>>> Git is the preferred source control system. We’re assuming
    >>>> https://github.com/apache/incubator-superset based on the naming scheme.
    >>>> 
    >>>> == Issue Tracking ==
    >>>> JIRA Superset (SUPERSET). If possible, we’d like to use Github issues &
    >>>> PRs to manage our project as much as possible. It’s been said that
    >>>> there are ways to keep Github’s issues in sync with Jira, allowing us
    >>>> to get the best of both worlds. If that is not possible, we will comply
    >>>> and use Jira.
    >>>> 
    >>>> == Other Resources ==
    >>>> We currently use a set of Github integrated services that are free to
    >>>> the open source community, like Travis-ci, Code Climate, Coveralls,
    >>>> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep
    >>>> using these services as they allow us to scale contributions and
    >>>> optimize our development flows. These services require some elevated
    >>>> rights on the Github repository in order to set up or tune and we
    >>>> would like for the committers to have the required rights.
    >>>> 
    >>>> 
    >>>> == Initial Committers ==
    >>>> 
    >>>> * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer
    >>>> * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer
    >>>> * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer
    >>>> * Vera Liu <vera....@airbnb.com> - Committer
    >>>> * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer
    >>>> * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer
    >>>> * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer
    >>>> * Slim Bouguerra <sbougue...@hortonworks.com> - Committer
    >>>> * Priyank Shah <ps...@hortonworks.com> - Committer
    >>>> * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer
    >>>> * Daniel Dai <da...@apache.org> - Champion & Committer
    >>>> * Luke Han <luke....@apache.org> - Mentor
    >>>> 
    >>>> == Affiliations ==
    >>>> The initial committers are employees of Airbnb Inc. and Hortonworks.
    >>>> 
    >>>> == Sponsors ==
    >>>> 
    >>>> === Champion ===
    >>>> Daniel Dai <da...@apache.org>
    >>>> 
    >>>> === Nominated Mentors ===
    >>>> * Ashutosh Chauhan <hashut...@apache.org>
    >>>> * Luke Han <luke....@apache.org>
    >>>> 
    >>>> === Sponsoring Entity ===
    >>>> Incubator PMC
    >>>> 
    >>> 
    >>> 
    >> 
    
    
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
    For additional commands, e-mail: general-h...@incubator.apache.org
