I missed this discussion. In your IP section, you list out: == Source and Intellectual Property Submission Plan == We do not expect any complications for the submission of the Superset code base. Our code is already in Github and there is only a single code base.
This IMHO not clear. Does Airbnb plan to submit a SGA for Superset, or expect that no SGA is required because its Apache licensed? John On Sun, Apr 2, 2017 at 4:09 PM Jeff Feng <jeff.f...@airbnb.com.invalid> wrote: > Dear Apache Incubator Community, > > We are excited to share our proposal for discussion and feedback for > entering Apache Incubation. Superset is an enterprise-ready web > application for data exploration, data visualization and dashboarding. > > Our Incubation proposal is at the following Wiki as well as copied in the > email below: > > https://wiki.apache.org/incubator/SupersetProposal > > We have an active Superset community including 400+ members and nearly 200 > topics. The Google Group can be found below. We plan to move the > discussion to the ASF: > > https://groups.google.com/forum/#!forum/airbnb_superset > > Thank you and look forward to the discussion! > > Jeff, Max & Alanna > > > > > = Superset = > > == Abstract == > > Superset is an enterprise-ready web application for data exploration, data > visualization and dashboarding. > > == Proposal == > > Superset is business intelligence (BI) software that helps modern > organizations visualize and interact with their data. Superset enables > users explore data from a variety of databases, assemble beautiful > dashboards and share their findings. Superset works neatly with all modern > SQL-speaking databases, and integrates with Druid.io to provide real-time, > interactive, blazing fast data access to large datasets. > > == Background == > > Data is mission critical. To succeed in this era, organizations need to > provide low-friction, intuitive and interactive access to data. It is > paramount for knowledge workers to be capable of answering their own > questions by querying, exploring and visualizing data. > > The entire business intelligence industry has pivoted from a model of > centralized top-down platforms driven by IT organizations to self-service > analytics and agile workflows by any user. This shift unblocks centralized > service bottlenecks for creating data visualizations while also creating an > environment that is iterative and fast-moving. This means that business > intelligence software must also be easy and delightful to use. > Self-service analytics doesn’t mean that admin and governance features are > not needed. > > Modern BI tools provide fine-grain access controls and auditing > capabilities to understand how data is being used. Superset is a solution > that delivers on all of these vectors. > > The technology stack is also constantly morphing - vendors are struggling > to provide cheap, quick and easy solutions to access data. Business > intelligence users are finding existing solutions lacking as these software > products either disregard or react slowly to recent game-changing > technologies like Druid.io, PrestoDB, Apache Drill, Apache Kylin, d3.js, > React.js and iPython’s Jupyter for instance. > > == Rationale == > > Business intelligence is more relevant today than at any other point in > history. Organizations are currently very limited in options for open > source data visualization solutions, especially solutions that are both > self-service and enterprise-ready. Every company informing their decisions > with data needs a BI tool. > > We believe that Superset will be a strong compliment to existing Apache > Software Foundation technologies by offering scalable user interactions to > distributed storage and computation solutions. Users will often find that > Superset can act as a catalyst for tooling that can visualize the byproduct > of data and computation infrastructure. > > Superset has many key design elements that help fill a gap in current > solutions for organizations: > > * Easy, low friction access to data through a simple, web-based data > exploration interface. Composing charts and dashboards are intuitive. > Eliminating the need to write code or SQL empowers anyone to use it. > > * Access to a wide array of rich, interactive data visualization types. > > * Enterprise-ready: Integration with different authentication mechanisms > and granular permissions centered around actions and data access. > > * Realtime & fast: Superset provides realtime analytics at the speed of > thought on very large datasets when integrated with Druid.io. > > * Broad data access: Consume data out of any SQL-speaking relational > database. > > * Extensible: Can be extended to talk to many noSQL databases like Apache > Drill, Elastic Search, and other popular database engines. > > * Fast loading dashboards with configurable web-scale caching. > > * Plug-in framework that enables organizations to build custom analytical > applications with new UI/UX interfaces. > > * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users with > more flexibility. SQL Lab integrates with the visualization engine > seamlessly. > > == Initial Goals == > > The initial goals of the Superset project are several-fold: > > Move the existing codebase to Apache and integrate with the Apache > development process. > > Redesign the user interface and interaction model for creating > visualizations/dashboards and connecting to data sources > > Build robust support for security and governance of the tool including > popular authorization modules (including Apache Ranger and Apache Sentry) > and a more sophisticated permissions system > > Grow the extensibility of the project both in terms of enhanced > connectivity to NoSQL-based data sources and creating a plug-in framework > that enables organizations to build custom analytical applications which > require a new UI/UX > > == Current Status == > > By many standards, Superset is already a successful open source project. As > of March 2017, Superset is officially used in production at about a dozen > companies, has received contributions from over one hundred contributors on > Github, 1500+ forks, and 12k+ stars. > > Sizeable companies like Airbnb, Yahoo! and Hortonworks have made > significant contributions, and expressed their commitment to the project. > The product is feature complete and has been viable for months. It already > serves as the main interface for consuming data at many companies of > different sizes. > > While the product is usable, there’s room for improvement across the board, > starting with providing a smoother user experience around content creation, > making sure all features work out-of-the-box on more platforms and > databases, providing better user training guides and videos, having a > predictable release process, and increasing the overall quality of the > Superset releases. > > === Meritocracy === > > We plan to invest in supporting a meritocracy. We will discuss the > requirements in an open forum. Several companies have expressed interest in > this project, and we intend to invite additional developers to participate. > We will encourage and monitor community participation so that privileges > can be extended to those that contribute. > > === Community === > > The need for an enterprise-ready data visualization and exploration > platform in the open source community is tremendous. While Superset is > fairly well known, recognized and used within the Druid.io community, > adoption is currently limited outside of that niche. There is a huge > opportunity to grow the community to hundreds if not thousands of > organizations, and we are hoping that embracing “the Apache way” will > accelerate the growth of our community. > > We have already been active at seeking and inviting contributions, and are > planning to scale the project by investing time and growing the support > structure to grow the community. > > === Core Developers === > > The initial committers for Superset include experienced full stack, > front-end and data engineers: > > * Maxime Beauchemin (Airbnb) > > * Alanna Scott (Airbnb) > > * Bogdan Kyryliuk (Airbnb) > > * Vera Liu (Airbnb) > > * Jeff Feng (Airbnb) > > * Ashutosh Chauhan (Hortonworks) > > * Nishant Bangarwa (Hortonworks) > > * Slim Bouguerra (Hortonworks) > > * Priyank Shah (Hortonworks) > > * Sriharsha Chintalapani (Hortonworks) > > * Daniel Dai (Hortonworks) > > We realize that additional employer diversity is needed, and we will work > aggressively to recruit developers from additional companies. > > === Alignment === > > The initial committers strongly believe that a system for interactive > visualization of data will gain broader adoption as an open source, > community driven project, where the community can contribute not only to > the core components, but also to a growing collection of connectors, > visualizations and improving integration a all potential data sources. > Superset already integrates closely with Apache Hive, the Hive metastore, > as well as most SQL-speaking databases found in modern data ecosystems. > > == Known Risks == > > === Orphaned Products === > > Superset is a vital component for both visualizing, accessing and > democratizing data at Airbnb. Also at Hortonworks, Superset is a core > component of the DataFlow product offering. Thus, the risk of the project > being orphaned is relatively low. The project could be at risk if Airbnb > changes their approach for democratizing data or if Hortonworks changes > their strategy in the market. In such an event, the committers plan to > continue working on the project on their own time, thought the progress > will likely be slower. We plan to mitigate this risk by recruiting > additional committers. > > === Inexperience with Open Source === > > The initial committers include veteran Apache members (committers and PMC > members) and other developers who have varying degrees of experience with > open source projects. All have been involved with source code that has been > released under an open source license, and several also have experience > developing code with an open source development process. > > === Homogenous Developers === > > The initial committers are employed by Airbnb Inc., and Hortonworks. We are > committed to recruiting additional committers from other companies. > > === Reliance on Salaried Developers === > > It is expected that Superset development will occur on both salaried time > and on volunteer time, after hours. The majority of initial committers are > paid by their employer to contribute to this project. However, they are all > passionate about the project, and we are confident that the project will > continue even if no salaried developers contribute to the project. We are > committed to recruiting additional committers including non-salaried > developers. > > === Relationships with Other Apache Products === > > To the knowledge of the Initial Committers, there are no direct competitors > to Superset within the Apache Software Foundation. That said, Apache > Zeppelin is an indirect competitor, but it solves a different use case. > > Apache Zeppelin is a web-based notebook that enables interactive data > analytics. It enables the creation of beautiful data-driven, interactive > and collaborative documents with SQL, Scala and more. Although a user can > create data visualizations using this project, it leverages a notebook > style user interfaces and it is geared towards the Spark community where > Scala and SQL co-exist > > We look forward to collaborating with those communities, as well as other > Apache communities. > > === An Excessive Fascination with the Apache Brand === > > Superset is solving two huge challenges: > > The challenge of enabling every knowledge worker to make data informed > decisions, particularly those who are not deeply skilled at writing SQL. > > The challenge of visualizing huge amounts of data interactively and in > real-time > > Superset was first developed as a data visualization solution for Druid.io > as a way to visualize billions of rows of data. Since then, usage of > Superset has expanded to address data visualization use cases across SQL > speaking data sources as well. > > Our rationale for developing Superset as an Apache project is detailed in > the Rationale Section. We believe that the Apache brand and community > process will help us attract more contributors to this project, and help > grow the footprint of the project through usage at other organizations and > within other applications. Establishing consensus among users and > developers will result in a more valuable tool for everyone. > > == Documentation == > > References to further reading material: > > * [[http://airbnb.io/superset/|Superset Documentation]] > > * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-dat > a-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: > Airbnb’s Data Exploration Platform]] > > * [[https://medium.com/airbnb-engineering/superset-scaling-dat > a-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post: > Superset: Scaling Data Access & Visual Insights at Airbnb]] > > == Initial Source == > > The origin of the proposed code base can be found at > https://github.com/airbnb/superset. The code base is primarily in Python. > > == Source and Intellectual Property Submission Plan == > > We do not expect any complications for the submission of the Superset code > base. Our code is already in Github and there is only a single code base. > > == External Dependencies == > > List of Python packages, from the Python Package Index (Pypi): > > * boto3 > > * celery > > * cryptography > > * flask-appbuilder > > * flask-cache > > * flask-migrate > > * flask-script > > * flask-sqlalchemy > > * flask-testing > > * humanize > > * gunicorn > > * markdown > > * pandas > > * parsedatetime > > * pydruid > > * PyHive > > * python-dateutil > > * requests > > * simplejson > > * six > > * sqlalchemy > > * sqlalchemy-utils > > * sqlparse > > * thrift > > * thrift-sasl > > * werkzeug > > List of Javascript packages, from NPM: > > * autobind-decorator > > * bootstrap > > * bootstrap-datepicker > > * brace > > * brfs > > * cal-heatmap > > * classnames > > * d3 > > * d3-cloud > > * d3-sankey > > * d3-scale > > * d3-tip > > * datamaps > > * datatables-bootstrap3-plugin > > * datatables.net-bs > > * font-awesome > > * gridster > > * immutability-helper > > * immutable > > * jquery > > * lodash.throttle > > * mapbox-gl > > * moment > > * moments > > * mustache > > * nvd3 > > * react > > * react-ace > > * react-bootstrap > > * react-bootstrap-table > > * react-dom > > * react-draggable > > * react-gravatar > > * react-grid-layout > > * react-map-gl > > * react-redux > > * react-resizable > > * react-select > > * react-syntax-highlighter > > * reactable > > * redux > > * redux-localstorage > > * redux-thunk > > * shortid > > * style-loader > > * supercluster > > * topojson > > * victory > > * viewport-mercator-project > > == Cryptography == > > The proposal does not include cryptographic code. > > == Required Resources == > > === Mailing List === > > There is a current mailing list as a Google Group “airbnb_superset” that we > are planning on deprecating as the Apache.org become ready to serve our > community. > > * superset-private > > * superset-dev > > * superset-user > > === Subversion Directory === > > Git is the preferred source control system. http://svn.apache.org/repos/as > f/incubator/superset <http://svn.apache.org/repos/asf/incubator/superset> > > == Git Repository == > > Git is the preferred source control system, we’re assuming > https://github.com/apache/incubator-superset based on the naming scheme > > == Issue Tracking == > > JIRA Superset (SUPERSET). If possible, we’d like to use Github issues & PRs > to manage our project as much as possible. It’s been said that there are > ways to keep Github’s issues in sync with Jira, allowing us to get best of > both worlds. If that is not possible, we will comply to using Jira. > > == Other Resources == > > We currently use a set of Github integrated services that are free to the > open source community, like Travis-ci, Code Climate, Coveralls, > Landscape.io, Requires.io, david-dm and Gitter. We would like to keep using > these services as they allow us to scale contributions and optimize our > development flows. These services require some elevated rights on the > Github repository in order to set up or tune and we would like for the > committers to have the required rights. > > > == Initial Committers == > > * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PMC & Committer > > * Alanna Scott <alanna.sc...@airbnb.com> - PMC & Committer > > * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PMC & Committer > > * Vera Liu <vera....@airbnb.com> - Committer > > * Jeff Feng <jeff.f...@airbnb.com> - PMC & Committer > > * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer > > * Nishant Bangarwa <nbanga...@hortonworks.com> - PMC & Committer > > * Slim Bouguerra <sbougue...@hortonworks.com> - Committer > > * Priyank Shah <ps...@hortonworks.com> - Committer > > * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer > > * Daniel Dai <da...@apache.org> - Champion & Committer > > == Affiliations == > > The initial committers are employees of Airbnb Inc. and Hortonworks. > > == Sponsors == > > === Champion === > > Daniel Dai <da...@apache.org> > > === Nominated Mentors === > > Ashutosh Chauhan <hashut...@apache.org> > > === Sponsoring Entity === > > Incubator PMC > > > -- > > *Jeff Feng* > Product Manager > m: (949)-610-5108 <(949)%20610-5108> <(949)%20610-5108> > twitter: @jtfeng >