Congrats team! On Tue, May 2, 2017 at 9:54 AM, Jeff Feng <jeff.f...@gmail.com> wrote:
> The vote for Superset passes with 11 +1 binding votes, 3 +1 non-binding > votes and no -1 votes. Below are the overall results: > > *Binding:* > Ashutosh Chauhan +1 > Luke Han + 1 > Julian Hyde +1 > Jitendra Pandey +1 > Joe Witt +1 > Ted Dunning +1 > P. Taylor Goetz +1 > Edward Yoon +1 > Jacques Nadeau +1 > Julian Le Dem +1 > Jim Jagielski +1 > > *Non-Binding:* > Moon Soo Lee +1 > Naresh Agarwal +1 > Felix Cheng +1 > > Thank you to everyone who participated in the vote. > > Please welcome Superset to the Apache Incubator! > > Jeff > > > > On Sun, Apr 23, 2017 at 7:53 AM, Jeff Feng <jeff.f...@gmail.com> wrote: > >> Dear Apache Incubator Community, >> >> We have updated the Superset proposal >> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for >> Apache Incubation with an additional mentor (Luke Han - >> luke....@apache.org), and would like to start a vote thread for >> acceptance into the incubator. >> >> Our team is excited to share Superset with the Apache community and we >> hope for the your continued support! >> >> Cheers, >> Jeff & the Superset Team >> >> >> >> >> = Superset = >> >> == Abstract == >> Superset is an enterprise-ready web application for data exploration, >> data visualization and dashboarding. >> >> == Proposal == >> Superset is business intelligence (BI) software that helps modern >> organizations visualize and interact with their data. Superset enables >> users explore data from a variety of databases, assemble beautiful >> dashboards and share their findings. Superset works neatly with all modern >> SQL-speaking databases, and integrates with Druid.io to provide real-time, >> interactive, blazing fast data access to large datasets. >> >> == Background == >> Data is mission critical. To succeed in this era, organizations need to >> provide low-friction, intuitive and interactive access to data. It is >> paramount for knowledge workers to be capable of answering their own >> questions by querying, exploring and visualizing data. >> >> The entire business intelligence industry has pivoted from a model of >> centralized top-down platforms driven by IT organizations to self-service >> analytics and agile workflows by any user. This shift unblocks centralized >> service bottlenecks for creating data visualizations while also creating an >> environment that is iterative and fast-moving. This means that business >> intelligence software must also be easy and delightful to use. >> Self-service analytics doesn’t mean that admin and governance features are >> not needed. >> Modern BI tools provide fine-grain access controls and auditing >> capabilities to understand how data is being used. Superset is a solution >> that delivers on all of these vectors. >> >> The technology stack is also constantly morphing - vendors are struggling >> to provide cheap, quick and easy solutions to access data. Business >> intelligence users are finding existing solutions lacking as these software >> products either disregard or react slowly to recent game-changing >> technologies like Druid.io, PrestoDB, Apache Drill, Apache Kylin, d3.js, >> React.js and iPython’s Jupyter for instance. >> >> == Rationale == >> Business intelligence is more relevant today than at any other point in >> history. Organizations are currently very limited in options for open >> source data visualization solutions, especially solutions that are both >> self-service and enterprise-ready. Every company informing their decisions >> with data needs a BI tool. >> >> We believe that Superset will be a strong compliment to existing Apache >> Software Foundation technologies by offering scalable user interactions to >> distributed storage and computation solutions. Users will often find that >> Superset can act as a catalyst for tooling that can visualize the byproduct >> of data and computation infrastructure. >> >> Superset has many key design elements that help fill a gap in current >> solutions for organizations: >> * Easy, low friction access to data through a simple, web-based data >> exploration interface. Composing charts and dashboards are intuitive. >> Eliminating the need to write code or SQL empowers anyone to use it. >> * Access to a wide array of rich, interactive data visualization types. >> * Enterprise-ready: Integration with different authentication mechanisms >> and granular permissions centered around actions and data access. >> * Realtime & fast: Superset provides realtime analytics at the speed of >> thought on very large datasets when integrated with Druid.io. >> * Broad data access: Consume data out of any SQL-speaking relational >> database. >> * Extensible: Can be extended to talk to many noSQL databases like >> Apache Drill, Elastic Search, and other popular database engines. >> * Fast loading dashboards with configurable web-scale caching. >> * Plug-in framework that enables organizations to build custom >> analytical applications with new UI/UX interfaces. >> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users >> with more flexibility. SQL Lab integrates with the visualization engine >> seamlessly. >> >> == Initial Goals == >> The initial goals of the Superset project are several-fold: >> * Move the existing codebase to Apache and integrate with the Apache >> development process. >> * Redesign the user interface and interaction model for creating >> visualizations/dashboards and connecting to data sources >> * Build robust support for security and governance of the tool including >> popular authorization modules (including Apache Ranger and Apache Sentry) >> and a more sophisticated permissions system >> * Grow the extensibility of the project both in terms of enhanced >> connectivity to NoSQL-based data sources and creating a plug-in framework >> that enables organizations to build custom analytical applications which >> require a new UI/UX >> >> == Current Status == >> By many standards, Superset is already a successful open source project. >> As of March 2017, Superset is officially used in production at about a >> dozen companies, has received contributions from over one hundred >> contributors on Github, 1500+ forks, and 12k+ stars. >> >> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made >> significant contributions, and expressed their commitment to the project. >> The product is feature complete and has been viable for months. It already >> serves as the main interface for consuming data at many companies of >> different sizes. >> >> While the product is usable, there’s room for improvement across the >> board, starting with providing a smoother user experience around content >> creation, making sure all features work out-of-the-box on more platforms >> and databases, providing better user training guides and videos, having a >> predictable release process, and increasing the overall quality of the >> Superset releases. >> >> === Meritocracy === >> We plan to invest in supporting a meritocracy. We will discuss the >> requirements in an open forum. Several companies have expressed interest in >> this project, and we intend to invite additional developers to participate. >> We will encourage and monitor community participation so that privileges >> can be extended to those that contribute. >> >> === Community === >> The need for an enterprise-ready data visualization and exploration >> platform in the open source community is tremendous. While Superset is >> fairly well known, recognized and used within the Druid.io community, >> adoption is currently limited outside of that niche. There is a huge >> opportunity to grow the community to hundreds if not thousands of >> organizations, and we are hoping that embracing “the Apache way” will >> accelerate the growth of our community. >> >> We have already been active at seeking and inviting contributions, and >> are planning to scale the project by investing time and growing the support >> structure to grow the community. >> >> === Core Developers === >> The initial committers for Superset include experienced full stack, >> front-end and data engineers: >> * Maxime Beauchemin (Airbnb) >> * Alanna Scott (Airbnb) >> * Bogdan Kyryliuk (Airbnb) >> * Vera Liu (Airbnb) >> * Jeff Feng (Airbnb) >> * Ashutosh Chauhan (Hortonworks) >> * Nishant Bangarwa (Hortonworks) >> * Slim Bouguerra (Hortonworks) >> * Priyank Shah (Hortonworks) >> * Sriharsha Chintalapani (Hortonworks) >> * Daniel Dai (Hortonworks) >> >> We realize that additional employer diversity is needed, and we will work >> aggressively to recruit developers from additional companies. >> >> === Alignment === >> The initial committers strongly believe that a system for interactive >> visualization of data will gain broader adoption as an open source, >> community driven project, where the community can contribute not only to >> the core components, but also to a growing collection of connectors, >> visualizations and improving integration a all potential data sources. >> Superset already integrates closely with Apache Hive, the Hive metastore, >> as well as most SQL-speaking databases found in modern data ecosystems. >> >> == Known Risks == >> >> === Orphaned Products === >> Superset is a vital component for both visualizing, accessing and >> democratizing data at Airbnb. Also at Hortonworks, Superset is a core >> component of the DataFlow product offering. Thus, the risk of the project >> being orphaned is relatively low. The project could be at risk if Airbnb >> changes their approach for democratizing data or if Hortonworks changes >> their strategy in the market. In such an event, the committers plan to >> continue working on the project on their own time, thought the progress >> will likely be slower. We plan to mitigate this risk by recruiting >> additional committers. >> >> === Inexperience with Open Source === >> The initial committers include veteran Apache members (committers and >> PPMC members) and other developers who have varying degrees of experience >> with open source projects. All have been involved with source code that has >> been released under an open source license, and several also have >> experience developing code with an open source development process. >> >> === Homogenous Developers === >> The initial committers are employed by Airbnb Inc. and Hortonworks. We >> are committed to recruiting additional committers from other companies. >> >> === Reliance on Salaried Developers === >> It is expected that Superset development will occur on both salaried time >> and on volunteer time, after hours. The majority of initial committers are >> paid by their employer to contribute to this project. However, they are all >> passionate about the project, and we are confident that the project will >> continue even if no salaried developers contribute to the project. We are >> committed to recruiting additional committers including non-salaried >> developers. >> >> === Relationships with Other Apache Products === >> To the knowledge of the Initial Committers, there are no direct >> competitors to Superset within the Apache Software Foundation. That said, >> Apache Zeppelin is an indirect competitor, but it solves a different use >> case. >> >> Apache Zeppelin is a web-based notebook that enables interactive data >> analytics. It enables the creation of beautiful data-driven, interactive >> and collaborative documents with SQL, Scala and more. Although a user can >> create data visualizations using this project, it leverages a notebook >> style user interfaces and it is geared towards the Spark community where >> Scala and SQL co-exist >> >> We look forward to collaborating with those communities, as well as other >> Apache communities. >> >> === An Excessive Fascination with the Apache Brand === >> Superset is solving two huge challenges: >> The challenge of enabling every knowledge worker to make data informed >> decisions, particularly those who are not deeply skilled at writing SQL. >> The challenge of visualizing huge amounts of data interactively and in >> real-time >> >> Superset was first developed as a data visualization solution for >> Druid.io as a way to visualize billions of rows of data. Since then, usage >> of Superset has expanded to address data visualization use cases across SQL >> speaking data sources as well. >> >> Our rationale for developing Superset as an Apache project is detailed in >> the Rationale Section. We believe that the Apache brand and community >> process will help us attract more contributors to this project, and help >> grow the footprint of the project through usage at other organizations and >> within other applications. Establishing consensus among users and >> developers will result in a more valuable tool for everyone. >> >> == Documentation == >> References to further reading material: >> * [[http://airbnb.io/superset/|Superset Documentation]] >> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-dat >> a-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: >> Airbnb’s Data Exploration Platform]] >> * [[https://medium.com/airbnb-engineering/superset-scaling-dat >> a-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog >> Post: Superset: Scaling Data Access & Visual Insights at Airbnb]] >> >> == Initial Source == >> The origin of the proposed code base can be found at >> https://github.com/airbnb/superset. The code base is primarily in >> Python. >> >> == Source and Intellectual Property Submission Plan == >> We do not expect any complications for the submission of the Superset >> code base. Our code is already in Github and there is only a single code >> base. >> >> == External Dependencies == >> List of Python packages, from the Python Package Index (Pypi): >> >> * boto3 >> * celery >> * cryptography >> * flask-appbuilder >> * flask-cache >> * flask-migrate >> * flask-script >> * flask-sqlalchemy >> * flask-testing >> * humanize >> * gunicorn >> * markdown >> * pandas >> * parsedatetime >> * pydruid >> * PyHive >> * python-dateutil >> * requests >> * simplejson >> * six >> * sqlalchemy >> * sqlalchemy-utils >> * sqlparse >> * thrift >> * thrift-sasl >> * werkzeug >> >> List of Javascript packages, from NPM: >> * autobind-decorator >> * bootstrap >> * bootstrap-datepicker >> * brace >> * brfs >> * cal-heatmap >> * classnames >> * d3 >> * d3-cloud >> * d3-sankey >> * d3-scale >> * d3-tip >> * datamaps >> * datatables-bootstrap3-plugin >> * datatables.net-bs >> * font-awesome >> * gridster >> * immutability-helper >> * immutable >> * jquery >> * lodash.throttle >> * mapbox-gl >> * moment >> * moments >> * mustache >> * nvd3 >> * react >> * react-ace >> * react-bootstrap >> * react-bootstrap-table >> * react-dom >> * react-draggable >> * react-gravatar >> * react-grid-layout >> * react-map-gl >> * react-redux >> * react-resizable >> * react-select >> * react-syntax-highlighter >> * reactable >> * redux >> * redux-localstorage >> * redux-thunk >> * shortid >> * style-loader >> * supercluster >> * topojson >> * victory >> * viewport-mercator-project >> >> == Cryptography == >> The proposal does not include cryptographic code. >> >> == Required Resources == >> >> === Mailing List === >> There is a current mailing list as a Google Group “airbnb_superset” that >> we are planning on deprecating as the Apache.org become ready to serve our >> community. >> >> * superset-private >> * superset-dev >> * superset-user >> >> === Subversion Directory === >> Git is the preferred source control system. >> http://svn.apache.org/repos/asf/incubator/superset >> >> == Git Repository == >> Git is the preferred source control system, we’re assuming >> https://github.com/apache/incubator-superset based on the naming scheme >> >> == Issue Tracking == >> JIRA Superset (SUPERSET). If possible, we’d like to use Github issues & >> PRs to manage our project as much as possible. It’s been said that there >> are ways to keep Github’s issues in sync with Jira, allowing us to get best >> of both worlds. If that is not possible, we will comply to using Jira. >> >> == Other Resources == >> We currently use a set of Github integrated services that are free to the >> open source community, like Travis-ci, Code Climate, Coveralls, >> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep using >> these services as they allow us to scale contributions and optimize our >> development flows. These services require some elevated rights on the >> Github repository in order to set up or tune and we would like for the >> committers to have the required rights. >> >> >> == Initial Committers == >> >> * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer >> * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer >> * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer >> * Vera Liu <vera....@airbnb.com> - Committer >> * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer >> * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer >> * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer >> * Slim Bouguerra <sbougue...@hortonworks.com> - Committer >> * Priyank Shah <ps...@hortonworks.com> - Committer >> * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer >> * Daniel Dai <da...@apache.org> - Champion & Committer >> * Luke Han <luke....@apache.org> - Mentor >> >> == Affiliations == >> The initial committers are employees of Airbnb Inc. and Hortonworks. >> >> == Sponsors == >> >> === Champion === >> Daniel Dai <da...@apache.org> >> >> === Nominated Mentors === >> * Ashutosh Chauhan <hashut...@apache.org> >> * Luke Han <luke....@apache.org> >> >> === Sponsoring Entity === >> Incubator PMC >> >> >