Re: [VOTE] Superset Proposal for Apache Incubator

Joe Witt Tue, 25 Apr 2017 13:58:31 -0700

+1 (binding)

On Tue, Apr 25, 2017 at 4:52 PM, Jitendra Pandey
<jiten...@hortonworks.com> wrote:
> +1 (binding)
>
> On 4/25/17, 1:27 PM, "Julian Hyde" <jh...@apache.org> wrote:
>
>     +1 binding
>
>     > On Apr 25, 2017, at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>     >
>     > +1 (non-binding)
>     >
>     > On Tue, Apr 25, 2017 at 11:49 AM Ashutosh Chauhan <hashut...@apache.org>
>     > wrote:
>     >
>     >> +1 (binding)
>     >>
>     >> Thanks,
>     >> Ashutosh
>     >>
>     >> On Mon, Apr 24, 2017 at 5:45 AM, Luke Han <luke...@gmail.com> wrote:
>     >>
>     >>> +1 binding
>     >>>
>     >>> Love to see Superset to be new incubator project.
>     >>>
>     >>>
>     >>> Best Regards!
>     >>> ---------------------
>     >>>
>     >>> Luke Han
>     >>>
>     >>> On Sun, Apr 23, 2017 at 10:53 PM, Jeff Feng <jeff.f...@gmail.com>
>     >>> wrote:
>     >>>
>     >>>> Dear Apache Incubator Community,
>     >>>>
>     >>>> We have updated the Superset proposal
>     >>>> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for
>     >>>> Apache Incubation with an additional mentor (Luke Han -
>     >>>> luke....@apache.org), and would like to start a vote thread for
>     >>>> acceptance into the incubator.
>     >>>>
>     >>>> Our team is excited to share Superset with the Apache community, and we
>     >>>> hope for your continued support!
>     >>>>
>     >>>> Cheers,
>     >>>> Jeff & the Superset Team
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>> = Superset =
>     >>>>
>     >>>> == Abstract ==
>     >>>> Superset is an enterprise-ready web application for data exploration,
>     >>>> data visualization and dashboarding.
>     >>>>
>     >>>> == Proposal ==
>     >>>> Superset is business intelligence (BI) software that helps modern
>     >>>> organizations visualize and interact with their data. Superset enables
>     >>>> users to explore data from a variety of databases, assemble beautiful
>     >>>> dashboards and share their findings.  Superset works neatly with all
>     >>>> modern SQL-speaking databases, and integrates with Druid.io to provide
>     >>>> real-time, interactive, blazing fast data access to large datasets.
>     >>>>
>     >>>> == Background ==
>     >>>> Data is mission critical. To succeed in this era, organizations need to
>     >>>> provide low-friction, intuitive and interactive access to data. It is
>     >>>> paramount for knowledge workers to be capable of answering their own
>     >>>> questions by querying, exploring and visualizing data.
>     >>>>
>     >>>> The entire business intelligence industry has pivoted from a model of
>     >>>> centralized top-down platforms driven by IT organizations to
>     >>>> self-service analytics and agile workflows by any user.  This shift
>     >>>> removes centralized service bottlenecks for creating data
>     >>>> visualizations while also creating an environment that is iterative
>     >>>> and fast-moving.  This means that business intelligence software must
>     >>>> also be easy and delightful to use.
>     >>>> Self-service analytics doesn’t mean that admin and governance features
>     >>>> are not needed. Modern BI tools provide fine-grained access controls
>     >>>> and auditing capabilities to understand how data is being used.
>     >>>> Superset is a solution that delivers on all of these vectors.
>     >>>>
>     >>>> The technology stack is also constantly morphing: vendors are
>     >>>> struggling to provide cheap, quick and easy solutions to access data.
>     >>>> Business intelligence users are finding existing solutions lacking, as
>     >>>> these software products either disregard or react slowly to recent
>     >>>> game-changing technologies like Druid.io, PrestoDB, Apache Drill,
>     >>>> Apache Kylin, d3.js, React.js and IPython’s Jupyter, for instance.
>     >>>>
>     >>>> == Rationale ==
>     >>>> Business intelligence is more relevant today than at any other point in
>     >>>> history.  Organizations are currently very limited in options for open
>     >>>> source data visualization solutions, especially solutions that are both
>     >>>> self-service and enterprise-ready.  Every company informing their
>     >>>> decisions with data needs a BI tool.
>     >>>>
>     >>>> We believe that Superset will be a strong complement to existing Apache
>     >>>> Software Foundation technologies by offering scalable user interactions
>     >>>> to distributed storage and computation solutions.  Users will often
>     >>>> find that Superset can act as a catalyst for tooling that can visualize
>     >>>> the byproduct of data and computation infrastructure.
>     >>>>
>     >>>> Superset has many key design elements that help fill a gap in current
>     >>>> solutions for organizations:
>     >>>> * Easy, low-friction access to data through a simple, web-based data
>     >>>> exploration interface.  Composing charts and dashboards is intuitive.
>     >>>> Eliminating the need to write code or SQL empowers anyone to use it.
>     >>>> * Access to a wide array of rich, interactive data visualization types.
>     >>>> * Enterprise-ready: Integration with different authentication
>     >>>> mechanisms and granular permissions centered around actions and data
>     >>>> access.
>     >>>> * Realtime & fast: Superset provides realtime analytics at the speed of
>     >>>> thought on very large datasets when integrated with Druid.io.
>     >>>> * Broad data access: Consume data out of any SQL-speaking relational
>     >>>> database.
>     >>>> * Extensible: Can be extended to talk to many NoSQL databases like
>     >>>> Apache Drill, Elasticsearch, and other popular database engines.
>     >>>> * Fast loading dashboards with configurable web-scale caching.
>     >>>> * Plug-in framework that enables organizations to build custom
>     >>>> analytical applications with new UI/UX interfaces.
>     >>>> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
>     >>>> with more flexibility.  SQL Lab integrates with the visualization
>     >>>> engine seamlessly.
>     >>>>
>     >>>> == Initial Goals ==
>     >>>> The initial goals of the Superset project are several-fold:
>     >>>> * Move the existing codebase to Apache and integrate with the Apache
>     >>>> development process.
>     >>>> * Redesign the user interface and interaction model for creating
>     >>>> visualizations/dashboards and connecting to data sources.
>     >>>> * Build robust support for security and governance of the tool,
>     >>>> including popular authorization modules (such as Apache Ranger and
>     >>>> Apache Sentry) and a more sophisticated permissions system.
>     >>>> * Grow the extensibility of the project, both in terms of enhanced
>     >>>> connectivity to NoSQL-based data sources and creating a plug-in
>     >>>> framework that enables organizations to build custom analytical
>     >>>> applications which require a new UI/UX.
>     >>>>
>     >>>> == Current Status ==
>     >>>> By many standards, Superset is already a successful open source
>     >>>> project.  As of March 2017, Superset is officially used in production
>     >>>> at about a dozen companies, has received contributions from over one
>     >>>> hundred contributors on Github, and has 1500+ forks and 12k+ stars.
>     >>>>
>     >>>> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
>     >>>> significant contributions and have expressed their commitment to the
>     >>>> project.  The product is feature complete and has been viable for
>     >>>> months. It already serves as the main interface for consuming data at
>     >>>> many companies of different sizes.
>     >>>>
>     >>>> While the product is usable, there’s room for improvement across the
>     >>>> board, starting with providing a smoother user experience around
>     >>>> content creation, making sure all features work out-of-the-box on more
>     >>>> platforms and databases, providing better user training guides and
>     >>>> videos, having a predictable release process, and increasing the
>     >>>> overall quality of the Superset releases.
>     >>>>
>     >>>> === Meritocracy ===
>     >>>> We plan to invest in supporting a meritocracy. We will discuss the
>     >>>> requirements in an open forum. Several companies have expressed
>     >>>> interest in this project, and we intend to invite additional
>     >>>> developers to participate.  We will encourage and monitor community
>     >>>> participation so that privileges can be extended to those that
>     >>>> contribute.
>     >>>>
>     >>>> === Community ===
>     >>>> The need for an enterprise-ready data visualization and exploration
>     >>>> platform in the open source community is tremendous.  While Superset is
>     >>>> fairly well known, recognized and used within the Druid.io community,
>     >>>> adoption is currently limited outside of that niche. There is a huge
>     >>>> opportunity to grow the community to hundreds if not thousands of
>     >>>> organizations, and we are hoping that embracing “the Apache way” will
>     >>>> accelerate the growth of our community.
>     >>>>
>     >>>> We have already been active in seeking and inviting contributions,
>     >>>> and are planning to scale the project by investing time and growing
>     >>>> the support structure to grow the community.
>     >>>>
>     >>>> === Core Developers ===
>     >>>> The initial committers for Superset include experienced full stack,
>     >>>> front-end and data engineers:
>     >>>> * Maxime Beauchemin (Airbnb)
>     >>>> * Alanna Scott (Airbnb)
>     >>>> * Bogdan Kyryliuk (Airbnb)
>     >>>> * Vera Liu (Airbnb)
>     >>>> * Jeff Feng (Airbnb)
>     >>>> * Ashutosh Chauhan (Hortonworks)
>     >>>> * Nishant Bangarwa (Hortonworks)
>     >>>> * Slim Bouguerra (Hortonworks)
>     >>>> * Priyank Shah (Hortonworks)
>     >>>> * Sriharsha Chintalapani (Hortonworks)
>     >>>> * Daniel Dai (Hortonworks)
>     >>>>
>     >>>> We realize that additional employer diversity is needed, and we will
>     >>>> work aggressively to recruit developers from additional companies.
>     >>>>
>     >>>> === Alignment ===
>     >>>> The initial committers strongly believe that a system for interactive
>     >>>> visualization of data will gain broader adoption as an open source,
>     >>>> community-driven project, where the community can contribute not only
>     >>>> to the core components, but also to a growing collection of
>     >>>> connectors and visualizations, and to improving integration with all
>     >>>> potential data sources.  Superset already integrates closely with
>     >>>> Apache Hive and the Hive metastore, as well as with most SQL-speaking
>     >>>> databases found in modern data ecosystems.
>     >>>>
>     >>>> == Known Risks ==
>     >>>>
>     >>>> === Orphaned Products ===
>     >>>> Superset is a vital component for visualizing, accessing and
>     >>>> democratizing data at Airbnb.  At Hortonworks, Superset is also a core
>     >>>> component of the DataFlow product offering.  Thus, the risk of the
>     >>>> project being orphaned is relatively low.  The project could be at risk
>     >>>> if Airbnb changes its approach for democratizing data or if
>     >>>> Hortonworks changes its strategy in the market.  In such an event, the
>     >>>> committers plan to continue working on the project on their own time,
>     >>>> though the progress will likely be slower.  We plan to mitigate this
>     >>>> risk by recruiting additional committers.
>     >>>>
>     >>>> === Inexperience with Open Source ===
>     >>>> The initial committers include veteran Apache members (committers and
>     >>>> PPMC members) and other developers who have varying degrees of
>     >>>> experience with open source projects. All have been involved with
>     >>>> source code that has been released under an open source license, and
>     >>>> several also have experience developing code with an open source
>     >>>> development process.
>     >>>>
>     >>>> === Homogeneous Developers ===
>     >>>> The initial committers are employed by Airbnb Inc. and Hortonworks. We
>     >>>> are committed to recruiting additional committers from other companies.
>     >>>>
>     >>>> === Reliance on Salaried Developers ===
>     >>>> It is expected that Superset development will occur on both salaried
>     >>>> time and on volunteer time, after hours. The majority of initial
>     >>>> committers are paid by their employer to contribute to this project.
>     >>>> However, they are all passionate about the project, and we are
>     >>>> confident that the project will continue even if no salaried
>     >>>> developers contribute to the project. We are committed to recruiting
>     >>>> additional committers including non-salaried developers.
>     >>>>
>     >>>> === Relationships with Other Apache Products ===
>     >>>> To the knowledge of the Initial Committers, there are no direct
>     >>>> competitors to Superset within the Apache Software Foundation.  That
>     >>>> said, Apache Zeppelin is an indirect competitor, but it solves a
>     >>>> different use case.
>     >>>>
>     >>>> Apache Zeppelin is a web-based notebook that enables interactive data
>     >>>> analytics. It enables the creation of beautiful data-driven,
>     >>>> interactive and collaborative documents with SQL, Scala and more.
>     >>>> Although a user can create data visualizations using this project, it
>     >>>> leverages a notebook-style user interface and is geared towards the
>     >>>> Spark community, where Scala and SQL co-exist.
>     >>>>
>     >>>> We look forward to collaborating with those communities, as well as
>     >>>> other Apache communities.
>     >>>>
>     >>>> === An Excessive Fascination with the Apache Brand ===
>     >>>> Superset is solving two huge challenges:
>     >>>> * The challenge of enabling every knowledge worker to make data-informed
>     >>>> decisions, particularly those who are not deeply skilled at writing SQL.
>     >>>> * The challenge of visualizing huge amounts of data interactively and in
>     >>>> real-time.
>     >>>>
>     >>>> Superset was first developed as a data visualization solution for
>     >>>> Druid.io as a way to visualize billions of rows of data.  Since then,
>     >>>> usage of Superset has expanded to address data visualization use cases
>     >>>> across SQL-speaking data sources as well.
>     >>>>
>     >>>> Our rationale for developing Superset as an Apache project is detailed
>     >>>> in the Rationale section.  We believe that the Apache brand and
>     >>>> community process will help us attract more contributors to this
>     >>>> project, and help grow the footprint of the project through usage at
>     >>>> other organizations and within other applications.  Establishing
>     >>>> consensus among users and developers will result in a more valuable
>     >>>> tool for everyone.
>     >>>>
>     >>>> == Documentation ==
>     >>>> References to further reading material:
>     >>>> * [[http://airbnb.io/superset/|Superset Documentation]]
>     >>>> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: Airbnb’s Data Exploration Platform]]
>     >>>> * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post: Superset: Scaling Data Access & Visual Insights at Airbnb]]
>     >>>>
>     >>>> == Initial Source ==
>     >>>> The origin of the proposed code base can be found at
>     >>>> https://github.com/airbnb/superset.  The code base is primarily in
>     >>>> Python.
>     >>>>
>     >>>> == Source and Intellectual Property Submission Plan ==
>     >>>> We do not expect any complications for the submission of the Superset
>     >>>> code base.  Our code is already on Github and there is only a single
>     >>>> code base.
>     >>>>
>     >>>> == External Dependencies ==
>     >>>> List of Python packages, from the Python Package Index (PyPI):
>     >>>>
>     >>>> * boto3
>     >>>> * celery
>     >>>> * cryptography
>     >>>> * flask-appbuilder
>     >>>> * flask-cache
>     >>>> * flask-migrate
>     >>>> * flask-script
>     >>>> * flask-sqlalchemy
>     >>>> * flask-testing
>     >>>> * humanize
>     >>>> * gunicorn
>     >>>> * markdown
>     >>>> * pandas
>     >>>> * parsedatetime
>     >>>> * pydruid
>     >>>> * PyHive
>     >>>> * python-dateutil
>     >>>> * requests
>     >>>> * simplejson
>     >>>> * six
>     >>>> * sqlalchemy
>     >>>> * sqlalchemy-utils
>     >>>> * sqlparse
>     >>>> * thrift
>     >>>> * thrift-sasl
>     >>>> * werkzeug
>     >>>>
>     >>>> List of Javascript packages, from NPM:
>     >>>> * autobind-decorator
>     >>>> * bootstrap
>     >>>> * bootstrap-datepicker
>     >>>> * brace
>     >>>> * brfs
>     >>>> * cal-heatmap
>     >>>> * classnames
>     >>>> * d3
>     >>>> * d3-cloud
>     >>>> * d3-sankey
>     >>>> * d3-scale
>     >>>> * d3-tip
>     >>>> * datamaps
>     >>>> * datatables-bootstrap3-plugin
>     >>>> * datatables.net-bs
>     >>>> * font-awesome
>     >>>> * gridster
>     >>>> * immutability-helper
>     >>>> * immutable
>     >>>> * jquery
>     >>>> * lodash.throttle
>     >>>> * mapbox-gl
>     >>>> * moment
>     >>>> * moments
>     >>>> * mustache
>     >>>> * nvd3
>     >>>> * react
>     >>>> * react-ace
>     >>>> * react-bootstrap
>     >>>> * react-bootstrap-table
>     >>>> * react-dom
>     >>>> * react-draggable
>     >>>> * react-gravatar
>     >>>> * react-grid-layout
>     >>>> * react-map-gl
>     >>>> * react-redux
>     >>>> * react-resizable
>     >>>> * react-select
>     >>>> * react-syntax-highlighter
>     >>>> * reactable
>     >>>> * redux
>     >>>> * redux-localstorage
>     >>>> * redux-thunk
>     >>>> * shortid
>     >>>> * style-loader
>     >>>> * supercluster
>     >>>> * topojson
>     >>>> * victory
>     >>>> * viewport-mercator-project
>     >>>>
>     >>>> == Cryptography ==
>     >>>> The proposal does not include cryptographic code.
>     >>>>
>     >>>> == Required Resources ==
>     >>>>
>     >>>> === Mailing List ===
>     >>>> There is a current mailing list as a Google Group “airbnb_superset”
>     >>>> that we are planning on deprecating once the Apache.org mailing lists
>     >>>> are ready to serve our community.
>     >>>>
>     >>>> * superset-private
>     >>>> * superset-dev
>     >>>> * superset-user
>     >>>>
>     >>>> === Subversion Directory ===
>     >>>> Git is the preferred source control system.
>     >>>> http://svn.apache.org/repos/asf/incubator/superset
>     >>>>
>     >>>> == Git Repository ==
>     >>>> Git is the preferred source control system; we’re assuming
>     >>>> https://github.com/apache/incubator-superset based on the naming scheme.
>     >>>>
>     >>>> == Issue Tracking ==
>     >>>> JIRA Superset (SUPERSET). We’d like to use Github issues & PRs to manage
>     >>>> our project as much as possible. It’s been said that there are ways to
>     >>>> keep Github’s issues in sync with Jira, allowing us to get the best of
>     >>>> both worlds. If that is not possible, we will comply with using Jira.
>     >>>>
>     >>>> == Other Resources ==
>     >>>> We currently use a set of Github-integrated services that are free to
>     >>>> the open source community, like Travis-ci, Code Climate, Coveralls,
>     >>>> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep
>     >>>> using these services as they allow us to scale contributions and
>     >>>> optimize our development flows. These services require some elevated
>     >>>> rights on the Github repository in order to set up or tune, and we
>     >>>> would like for the committers to have the required rights.
>     >>>>
>     >>>>
>     >>>> == Initial Committers ==
>     >>>>
>     >>>> * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer
>     >>>> * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer
>     >>>> * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer
>     >>>> * Vera Liu <vera....@airbnb.com> - Committer
>     >>>> * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer
>     >>>> * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer
>     >>>> * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer
>     >>>> * Slim Bouguerra <sbougue...@hortonworks.com> - Committer
>     >>>> * Priyank Shah <ps...@hortonworks.com> - Committer
>     >>>> * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer
>     >>>> * Daniel Dai <da...@apache.org> - Champion & Committer
>     >>>> * Luke Han <luke....@apache.org> - Mentor
>     >>>>
>     >>>> == Affiliations ==
>     >>>> The initial committers are employees of Airbnb Inc. and Hortonworks.
>     >>>>
>     >>>> == Sponsors ==
>     >>>>
>     >>>> === Champion ===
>     >>>> Daniel Dai <da...@apache.org>
>     >>>>
>     >>>> === Nominated Mentors ===
>     >>>> * Ashutosh Chauhan <hashut...@apache.org>
>     >>>> * Luke Han <luke....@apache.org>
>     >>>>
>     >>>> === Sponsoring Entity ===
>     >>>> Incubator PMC
>     >>>>
>     >>>
>     >>>
>     >>
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>     For additional commands, e-mail: general-h...@incubator.apache.org
>
>
>
>

