+1 (non-binding) On Tue, Apr 25, 2017 at 11:49 AM Ashutosh Chauhan <hashut...@apache.org> wrote:
> +1 (binding) > > Thanks, > Ashutosh > > On Mon, Apr 24, 2017 at 5:45 AM, Luke Han <luke...@gmail.com> wrote: > > > +1 binding > > > > Love to see Superset to be new incubator project. > > > > > > Best Regards! > > --------------------- > > > > Luke Han > > > > On Sun, Apr 23, 2017 at 10:53 PM, Jeff Feng <jeff.f...@gmail.com> wrote: > > > >> Dear Apache Incubator Community, > >> > >> We have updated the Superset proposal > >> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for > >> > >> Apache Incubation with an additional mentor (Luke Han - > >> luke....@apache.org), > >> and would like to start a vote thread for acceptance into the incubator. > >> > >> Our team is excited to share Superset with the Apache community and we > >> hope > >> for the your continued support! > >> > >> Cheers, > >> Jeff & the Superset Team > >> > >> > >> > >> > >> = Superset = > >> > >> == Abstract == > >> Superset is an enterprise-ready web application for data exploration, > data > >> visualization and dashboarding. > >> > >> == Proposal == > >> Superset is business intelligence (BI) software that helps modern > >> organizations visualize and interact with their data. Superset enables > >> users explore data from a variety of databases, assemble beautiful > >> dashboards and share their findings. Superset works neatly with all > >> modern > >> SQL-speaking databases, and integrates with Druid.io to provide > real-time, > >> interactive, blazing fast data access to large datasets. > >> > >> == Background == > >> Data is mission critical. To succeed in this era, organizations need to > >> provide low-friction, intuitive and interactive access to data. It is > >> paramount for knowledge workers to be capable of answering their own > >> questions by querying, exploring and visualizing data. > >> > >> The entire business intelligence industry has pivoted from a model of > >> centralized top-down platforms driven by IT organizations to > self-service > >> analytics and agile workflows by any user. This shift unblocks > >> centralized > >> service bottlenecks for creating data visualizations while also creating > >> an > >> environment that is iterative and fast-moving. This means that business > >> intelligence software must also be easy and delightful to use. > >> Self-service analytics doesn’t mean that admin and governance features > are > >> not needed. > >> Modern BI tools provide fine-grain access controls and auditing > >> capabilities to understand how data is being used. Superset is a > solution > >> that delivers on all of these vectors. > >> > >> The technology stack is also constantly morphing - vendors are > struggling > >> to provide cheap, quick and easy solutions to access data. Business > >> intelligence users are finding existing solutions lacking as these > >> software > >> products either disregard or react slowly to recent game-changing > >> technologies like Druid.io, PrestoDB, Apache Drill, Apache Kylin, d3.js, > >> React.js and iPython’s Jupyter for instance. > >> > >> == Rationale == > >> Business intelligence is more relevant today than at any other point in > >> history. Organizations are currently very limited in options for open > >> source data visualization solutions, especially solutions that are both > >> self-service and enterprise-ready. Every company informing their > >> decisions > >> with data needs a BI tool. > >> > >> We believe that Superset will be a strong compliment to existing Apache > >> Software Foundation technologies by offering scalable user interactions > to > >> distributed storage and computation solutions. Users will often find > that > >> Superset can act as a catalyst for tooling that can visualize the > >> byproduct > >> of data and computation infrastructure. > >> > >> Superset has many key design elements that help fill a gap in current > >> solutions for organizations: > >> * Easy, low friction access to data through a simple, web-based data > >> exploration interface. Composing charts and dashboards are intuitive. > >> Eliminating the need to write code or SQL empowers anyone to use it. > >> * Access to a wide array of rich, interactive data visualization types. > >> * Enterprise-ready: Integration with different authentication > mechanisms > >> and granular permissions centered around actions and data access. > >> * Realtime & fast: Superset provides realtime analytics at the speed of > >> thought on very large datasets when integrated with Druid.io. > >> * Broad data access: Consume data out of any SQL-speaking relational > >> database. > >> * Extensible: Can be extended to talk to many noSQL databases like > Apache > >> Drill, Elastic Search, and other popular database engines. > >> * Fast loading dashboards with configurable web-scale caching. > >> * Plug-in framework that enables organizations to build custom > analytical > >> applications with new UI/UX interfaces. > >> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users > >> with more flexibility. SQL Lab integrates with the visualization engine > >> seamlessly. > >> > >> == Initial Goals == > >> The initial goals of the Superset project are several-fold: > >> * Move the existing codebase to Apache and integrate with the Apache > >> development process. > >> * Redesign the user interface and interaction model for creating > >> visualizations/dashboards and connecting to data sources > >> * Build robust support for security and governance of the tool > including > >> popular authorization modules (including Apache Ranger and Apache > Sentry) > >> and a more sophisticated permissions system > >> * Grow the extensibility of the project both in terms of enhanced > >> connectivity to NoSQL-based data sources and creating a plug-in > framework > >> that enables organizations to build custom analytical applications which > >> require a new UI/UX > >> > >> == Current Status == > >> By many standards, Superset is already a successful open source project. > >> As > >> of March 2017, Superset is officially used in production at about a > dozen > >> companies, has received contributions from over one hundred contributors > >> on > >> Github, 1500+ forks, and 12k+ stars. > >> > >> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made > >> significant contributions, and expressed their commitment to the > project. > >> The product is feature complete and has been viable for months. It > already > >> serves as the main interface for consuming data at many companies of > >> different sizes. > >> > >> While the product is usable, there’s room for improvement across the > >> board, > >> starting with providing a smoother user experience around content > >> creation, > >> making sure all features work out-of-the-box on more platforms and > >> databases, providing better user training guides and videos, having a > >> predictable release process, and increasing the overall quality of the > >> Superset releases. > >> > >> === Meritocracy === > >> We plan to invest in supporting a meritocracy. We will discuss the > >> requirements in an open forum. Several companies have expressed interest > >> in > >> this project, and we intend to invite additional developers to > >> participate. > >> We will encourage and monitor community participation so that privileges > >> can be extended to those that contribute. > >> > >> === Community === > >> The need for an enterprise-ready data visualization and exploration > >> platform in the open source community is tremendous. While Superset is > >> fairly well known, recognized and used within the Druid.io community, > >> adoption is currently limited outside of that niche. There is a huge > >> opportunity to grow the community to hundreds if not thousands of > >> organizations, and we are hoping that embracing “the Apache way” will > >> accelerate the growth of our community. > >> > >> We have already been active at seeking and inviting contributions, and > are > >> planning to scale the project by investing time and growing the support > >> structure to grow the community. > >> > >> === Core Developers === > >> The initial committers for Superset include experienced full stack, > >> front-end and data engineers: > >> * Maxime Beauchemin (Airbnb) > >> * Alanna Scott (Airbnb) > >> * Bogdan Kyryliuk (Airbnb) > >> * Vera Liu (Airbnb) > >> * Jeff Feng (Airbnb) > >> * Ashutosh Chauhan (Hortonworks) > >> * Nishant Bangarwa (Hortonworks) > >> * Slim Bouguerra (Hortonworks) > >> * Priyank Shah (Hortonworks) > >> * Sriharsha Chintalapani (Hortonworks) > >> * Daniel Dai (Hortonworks) > >> > >> We realize that additional employer diversity is needed, and we will > work > >> aggressively to recruit developers from additional companies. > >> > >> === Alignment === > >> The initial committers strongly believe that a system for interactive > >> visualization of data will gain broader adoption as an open source, > >> community driven project, where the community can contribute not only to > >> the core components, but also to a growing collection of connectors, > >> visualizations and improving integration a all potential data sources. > >> Superset already integrates closely with Apache Hive, the Hive > metastore, > >> as well as most SQL-speaking databases found in modern data ecosystems. > >> > >> == Known Risks == > >> > >> === Orphaned Products === > >> Superset is a vital component for both visualizing, accessing and > >> democratizing data at Airbnb. Also at Hortonworks, Superset is a core > >> component of the DataFlow product offering. Thus, the risk of the > project > >> being orphaned is relatively low. The project could be at risk if > Airbnb > >> changes their approach for democratizing data or if Hortonworks changes > >> their strategy in the market. In such an event, the committers plan to > >> continue working on the project on their own time, thought the progress > >> will likely be slower. We plan to mitigate this risk by recruiting > >> additional committers. > >> > >> === Inexperience with Open Source === > >> The initial committers include veteran Apache members (committers and > PPMC > >> members) and other developers who have varying degrees of experience > with > >> open source projects. All have been involved with source code that has > >> been > >> released under an open source license, and several also have experience > >> developing code with an open source development process. > >> > >> === Homogenous Developers === > >> The initial committers are employed by Airbnb Inc. and Hortonworks. We > are > >> committed to recruiting additional committers from other companies. > >> > >> === Reliance on Salaried Developers === > >> It is expected that Superset development will occur on both salaried > time > >> and on volunteer time, after hours. The majority of initial committers > are > >> paid by their employer to contribute to this project. However, they are > >> all > >> passionate about the project, and we are confident that the project will > >> continue even if no salaried developers contribute to the project. We > are > >> committed to recruiting additional committers including non-salaried > >> developers. > >> > >> === Relationships with Other Apache Products === > >> To the knowledge of the Initial Committers, there are no direct > >> competitors > >> to Superset within the Apache Software Foundation. That said, Apache > >> Zeppelin is an indirect competitor, but it solves a different use case. > >> > >> Apache Zeppelin is a web-based notebook that enables interactive data > >> analytics. It enables the creation of beautiful data-driven, interactive > >> and collaborative documents with SQL, Scala and more. Although a user > can > >> create data visualizations using this project, it leverages a notebook > >> style user interfaces and it is geared towards the Spark community where > >> Scala and SQL co-exist > >> > >> We look forward to collaborating with those communities, as well as > other > >> Apache communities. > >> > >> === An Excessive Fascination with the Apache Brand === > >> Superset is solving two huge challenges: > >> The challenge of enabling every knowledge worker to make data informed > >> decisions, particularly those who are not deeply skilled at writing SQL. > >> The challenge of visualizing huge amounts of data interactively and in > >> real-time > >> > >> Superset was first developed as a data visualization solution for > Druid.io > >> as a way to visualize billions of rows of data. Since then, usage of > >> Superset has expanded to address data visualization use cases across SQL > >> speaking data sources as well. > >> > >> Our rationale for developing Superset as an Apache project is detailed > in > >> the Rationale Section. We believe that the Apache brand and community > >> process will help us attract more contributors to this project, and help > >> grow the footprint of the project through usage at other organizations > and > >> within other applications. Establishing consensus among users and > >> developers will result in a more valuable tool for everyone. > >> > >> == Documentation == > >> References to further reading material: > >> * [[http://airbnb.io/superset/|Superset Documentation]] > >> * [[ > >> https://medium.com/airbnb-engineering/caravel-airbnb-s-data- > >> exploration-platform-15a72aa610e5#.npqmmbu25|Blog > >> Post: Superset: Airbnb’s Data Exploration Platform]] > >> * [[ > >> https://medium.com/airbnb-engineering/superset-scaling-data- > >> access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog > >> Post: Superset: Scaling Data Access & Visual Insights at Airbnb]] > >> > >> == Initial Source == > >> The origin of the proposed code base can be found at > >> https://github.com/airbnb/superset. The code base is primarily in > >> Python. > >> > >> == Source and Intellectual Property Submission Plan == > >> We do not expect any complications for the submission of the Superset > code > >> base. Our code is already in Github and there is only a single code > base. > >> > >> == External Dependencies == > >> List of Python packages, from the Python Package Index (Pypi): > >> > >> * boto3 > >> * celery > >> * cryptography > >> * flask-appbuilder > >> * flask-cache > >> * flask-migrate > >> * flask-script > >> * flask-sqlalchemy > >> * flask-testing > >> * humanize > >> * gunicorn > >> * markdown > >> * pandas > >> * parsedatetime > >> * pydruid > >> * PyHive > >> * python-dateutil > >> * requests > >> * simplejson > >> * six > >> * sqlalchemy > >> * sqlalchemy-utils > >> * sqlparse > >> * thrift > >> * thrift-sasl > >> * werkzeug > >> > >> List of Javascript packages, from NPM: > >> * autobind-decorator > >> * bootstrap > >> * bootstrap-datepicker > >> * brace > >> * brfs > >> * cal-heatmap > >> * classnames > >> * d3 > >> * d3-cloud > >> * d3-sankey > >> * d3-scale > >> * d3-tip > >> * datamaps > >> * datatables-bootstrap3-plugin > >> * datatables.net-bs > >> * font-awesome > >> * gridster > >> * immutability-helper > >> * immutable > >> * jquery > >> * lodash.throttle > >> * mapbox-gl > >> * moment > >> * moments > >> * mustache > >> * nvd3 > >> * react > >> * react-ace > >> * react-bootstrap > >> * react-bootstrap-table > >> * react-dom > >> * react-draggable > >> * react-gravatar > >> * react-grid-layout > >> * react-map-gl > >> * react-redux > >> * react-resizable > >> * react-select > >> * react-syntax-highlighter > >> * reactable > >> * redux > >> * redux-localstorage > >> * redux-thunk > >> * shortid > >> * style-loader > >> * supercluster > >> * topojson > >> * victory > >> * viewport-mercator-project > >> > >> == Cryptography == > >> The proposal does not include cryptographic code. > >> > >> == Required Resources == > >> > >> === Mailing List === > >> There is a current mailing list as a Google Group “airbnb_superset” that > >> we > >> are planning on deprecating as the Apache.org become ready to serve our > >> community. > >> > >> * superset-private > >> * superset-dev > >> * superset-user > >> > >> === Subversion Directory === > >> Git is the preferred source control system. > >> http://svn.apache.org/repos/asf/incubator/superset > >> > >> == Git Repository == > >> Git is the preferred source control system, we’re assuming > >> https://github.com/apache/incubator-superset based on the naming scheme > >> > >> == Issue Tracking == > >> JIRA Superset (SUPERSET). If possible, we’d like to use Github issues & > >> PRs > >> to manage our project as much as possible. It’s been said that there are > >> ways to keep Github’s issues in sync with Jira, allowing us to get best > of > >> both worlds. If that is not possible, we will comply to using Jira. > >> > >> == Other Resources == > >> We currently use a set of Github integrated services that are free to > the > >> open source community, like Travis-ci, Code Climate, Coveralls, > >> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep > >> using > >> these services as they allow us to scale contributions and optimize our > >> development flows. These services require some elevated rights on the > >> Github repository in order to set up or tune and we would like for the > >> committers to have the required rights. > >> > >> > >> == Initial Committers == > >> > >> * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer > >> * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer > >> * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer > >> * Vera Liu <vera....@airbnb.com> - Committer > >> * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer > >> * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer > >> * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer > >> * Slim Bouguerra <sbougue...@hortonworks.com> - Committer > >> * Priyank Shah <ps...@hortonworks.com> - Committer > >> * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer > >> * Daniel Dai <da...@apache.org> - Champion & Committer > >> * Luke Han <luke....@apache.org> - Mentor > >> > >> == Affiliations == > >> The initial committers are employees of Airbnb Inc. and Hortonworks. > >> > >> == Sponsors == > >> > >> === Champion === > >> Daniel Dai <da...@apache.org> > >> > >> === Nominated Mentors === > >> * Ashutosh Chauhan <hashut...@apache.org> > >> * Luke Han <luke....@apache.org> > >> > >> === Sponsoring Entity === > >> Incubator PMC > >> > > > > >