+1 (binding)

On Tue, Apr 25, 2017 at 4:52 PM, Jitendra Pandey <jiten...@hortonworks.com> wrote:

> +1 (binding)
>
> On 4/25/17, 1:27 PM, "Julian Hyde" <jh...@apache.org> wrote:
>
> +1 binding
>
> > On Apr 25, 2017, at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
> >
> > +1 (non-binding)
> >
> > On Tue, Apr 25, 2017 at 11:49 AM Ashutosh Chauhan <hashut...@apache.org> wrote:
> >
> >> +1 (binding)
> >>
> >> Thanks,
> >> Ashutosh
> >>
> >> On Mon, Apr 24, 2017 at 5:45 AM, Luke Han <luke...@gmail.com> wrote:
> >>
> >>> +1 binding
> >>>
> >>> Love to see Superset become a new incubator project.
> >>>
> >>> Best Regards!
> >>> ---------------------
> >>>
> >>> Luke Han
> >>>
> >>> On Sun, Apr 23, 2017 at 10:53 PM, Jeff Feng <jeff.f...@gmail.com> wrote:
> >>>
> >>>> Dear Apache Incubator Community,
> >>>>
> >>>> We have updated the Superset proposal
> >>>> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for
> >>>> Apache Incubation with an additional mentor (Luke Han -
> >>>> luke....@apache.org), and would like to start a vote thread for
> >>>> acceptance into the incubator.
> >>>>
> >>>> Our team is excited to share Superset with the Apache community and we
> >>>> hope for your continued support!
> >>>>
> >>>> Cheers,
> >>>> Jeff & the Superset Team
> >>>>
> >>>> = Superset =
> >>>>
> >>>> == Abstract ==
> >>>> Superset is an enterprise-ready web application for data exploration,
> >>>> data visualization and dashboarding.
> >>>>
> >>>> == Proposal ==
> >>>> Superset is business intelligence (BI) software that helps modern
> >>>> organizations visualize and interact with their data. Superset enables
> >>>> users to explore data from a variety of databases, assemble beautiful
> >>>> dashboards and share their findings.
> >>>> Superset works neatly with all modern SQL-speaking databases, and
> >>>> integrates with Druid.io to provide real-time, interactive, blazing
> >>>> fast data access to large datasets.
> >>>>
> >>>> == Background ==
> >>>> Data is mission critical. To succeed in this era, organizations need to
> >>>> provide low-friction, intuitive and interactive access to data. It is
> >>>> paramount for knowledge workers to be capable of answering their own
> >>>> questions by querying, exploring and visualizing data.
> >>>>
> >>>> The entire business intelligence industry has pivoted from a model of
> >>>> centralized top-down platforms driven by IT organizations to
> >>>> self-service analytics and agile workflows by any user. This shift
> >>>> unblocks centralized service bottlenecks for creating data
> >>>> visualizations while also creating an environment that is iterative and
> >>>> fast-moving. This means that business intelligence software must also
> >>>> be easy and delightful to use. Self-service analytics doesn’t mean that
> >>>> admin and governance features are not needed. Modern BI tools provide
> >>>> fine-grained access controls and auditing capabilities to understand
> >>>> how data is being used. Superset is a solution that delivers on all of
> >>>> these vectors.
> >>>>
> >>>> The technology stack is also constantly morphing - vendors are
> >>>> struggling to provide cheap, quick and easy solutions to access data.
> >>>> Business intelligence users are finding existing solutions lacking as
> >>>> these software products either disregard or react slowly to recent
> >>>> game-changing technologies like Druid.io, PrestoDB, Apache Drill,
> >>>> Apache Kylin, d3.js, React.js and IPython’s Jupyter.
> >>>>
> >>>> == Rationale ==
> >>>> Business intelligence is more relevant today than at any other point in
> >>>> history.
> >>>> Organizations are currently very limited in options for open source
> >>>> data visualization solutions, especially solutions that are both
> >>>> self-service and enterprise-ready. Every company informing their
> >>>> decisions with data needs a BI tool.
> >>>>
> >>>> We believe that Superset will be a strong complement to existing Apache
> >>>> Software Foundation technologies by offering scalable user interactions
> >>>> to distributed storage and computation solutions. Users will often find
> >>>> that Superset can act as a catalyst for tooling that can visualize the
> >>>> byproduct of data and computation infrastructure.
> >>>>
> >>>> Superset has many key design elements that help fill a gap in current
> >>>> solutions for organizations:
> >>>> * Easy, low friction access to data through a simple, web-based data
> >>>>   exploration interface. Composing charts and dashboards is intuitive.
> >>>>   Eliminating the need to write code or SQL empowers anyone to use it.
> >>>> * Access to a wide array of rich, interactive data visualization types.
> >>>> * Enterprise-ready: Integration with different authentication
> >>>>   mechanisms and granular permissions centered around actions and data
> >>>>   access.
> >>>> * Realtime & fast: Superset provides realtime analytics at the speed of
> >>>>   thought on very large datasets when integrated with Druid.io.
> >>>> * Broad data access: Consume data out of any SQL-speaking relational
> >>>>   database.
> >>>> * Extensible: Can be extended to talk to many NoSQL databases like
> >>>>   Apache Drill, Elasticsearch, and other popular database engines.
> >>>> * Fast loading dashboards with configurable web-scale caching.
> >>>> * Plug-in framework that enables organizations to build custom
> >>>>   analytical applications with new UI/UX interfaces.
> >>>> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
> >>>>   with more flexibility.
> >>>>   SQL Lab integrates with the visualization engine seamlessly.
> >>>>
> >>>> == Initial Goals ==
> >>>> The initial goals of the Superset project are several-fold:
> >>>> * Move the existing codebase to Apache and integrate with the Apache
> >>>>   development process.
> >>>> * Redesign the user interface and interaction model for creating
> >>>>   visualizations/dashboards and connecting to data sources.
> >>>> * Build robust support for security and governance of the tool,
> >>>>   including popular authorization modules (such as Apache Ranger and
> >>>>   Apache Sentry) and a more sophisticated permissions system.
> >>>> * Grow the extensibility of the project both in terms of enhanced
> >>>>   connectivity to NoSQL-based data sources and creating a plug-in
> >>>>   framework that enables organizations to build custom analytical
> >>>>   applications which require a new UI/UX.
> >>>>
> >>>> == Current Status ==
> >>>> By many standards, Superset is already a successful open source
> >>>> project. As of March 2017, Superset is officially used in production at
> >>>> about a dozen companies, and has received contributions from over one
> >>>> hundred contributors on Github, along with 1500+ forks and 12k+ stars.
> >>>>
> >>>> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
> >>>> significant contributions, and expressed their commitment to the
> >>>> project. The product is feature complete and has been viable for
> >>>> months. It already serves as the main interface for consuming data at
> >>>> many companies of different sizes.
> >>>>
> >>>> While the product is usable, there’s room for improvement across the
> >>>> board, starting with providing a smoother user experience around
> >>>> content creation, making sure all features work out-of-the-box on more
> >>>> platforms and databases, providing better user training guides and
> >>>> videos, having a predictable release process, and increasing the
> >>>> overall quality of the Superset releases.
> >>>>
> >>>> === Meritocracy ===
> >>>> We plan to invest in supporting a meritocracy. We will discuss the
> >>>> requirements in an open forum. Several companies have expressed
> >>>> interest in this project, and we intend to invite additional developers
> >>>> to participate. We will encourage and monitor community participation
> >>>> so that privileges can be extended to those that contribute.
> >>>>
> >>>> === Community ===
> >>>> The need for an enterprise-ready data visualization and exploration
> >>>> platform in the open source community is tremendous. While Superset is
> >>>> fairly well known, recognized and used within the Druid.io community,
> >>>> adoption is currently limited outside of that niche. There is a huge
> >>>> opportunity to grow the community to hundreds if not thousands of
> >>>> organizations, and we are hoping that embracing “the Apache way” will
> >>>> accelerate the growth of our community.
> >>>>
> >>>> We have already been active in seeking and inviting contributions, and
> >>>> are planning to scale the project by investing time and growing the
> >>>> support structure to grow the community.
> >>>>
> >>>> === Core Developers ===
> >>>> The initial committers for Superset include experienced full stack,
> >>>> front-end and data engineers:
> >>>> * Maxime Beauchemin (Airbnb)
> >>>> * Alanna Scott (Airbnb)
> >>>> * Bogdan Kyryliuk (Airbnb)
> >>>> * Vera Liu (Airbnb)
> >>>> * Jeff Feng (Airbnb)
> >>>> * Ashutosh Chauhan (Hortonworks)
> >>>> * Nishant Bangarwa (Hortonworks)
> >>>> * Slim Bouguerra (Hortonworks)
> >>>> * Priyank Shah (Hortonworks)
> >>>> * Sriharsha Chintalapani (Hortonworks)
> >>>> * Daniel Dai (Hortonworks)
> >>>>
> >>>> We realize that additional employer diversity is needed, and we will
> >>>> work aggressively to recruit developers from additional companies.
> >>>>
> >>>> === Alignment ===
> >>>> The initial committers strongly believe that a system for interactive
> >>>> visualization of data will gain broader adoption as an open source,
> >>>> community driven project, where the community can contribute not only
> >>>> to the core components, but also to a growing collection of connectors
> >>>> and visualizations, and to improving integration with all potential
> >>>> data sources. Superset already integrates closely with Apache Hive, the
> >>>> Hive metastore, as well as most SQL-speaking databases found in modern
> >>>> data ecosystems.
> >>>>
> >>>> == Known Risks ==
> >>>>
> >>>> === Orphaned Products ===
> >>>> Superset is a vital component for visualizing, accessing and
> >>>> democratizing data at Airbnb. At Hortonworks, Superset is also a core
> >>>> component of the DataFlow product offering. Thus, the risk of the
> >>>> project being orphaned is relatively low. The project could be at risk
> >>>> if Airbnb changes their approach for democratizing data or if
> >>>> Hortonworks changes their strategy in the market. In such an event, the
> >>>> committers plan to continue working on the project on their own time,
> >>>> though the progress will likely be slower.
> >>>> We plan to mitigate this risk by recruiting additional committers.
> >>>>
> >>>> === Inexperience with Open Source ===
> >>>> The initial committers include veteran Apache members (committers and
> >>>> PPMC members) and other developers who have varying degrees of
> >>>> experience with open source projects. All have been involved with
> >>>> source code that has been released under an open source license, and
> >>>> several also have experience developing code with an open source
> >>>> development process.
> >>>>
> >>>> === Homogenous Developers ===
> >>>> The initial committers are employed by Airbnb Inc. and Hortonworks. We
> >>>> are committed to recruiting additional committers from other companies.
> >>>>
> >>>> === Reliance on Salaried Developers ===
> >>>> It is expected that Superset development will occur on both salaried
> >>>> time and on volunteer time, after hours. The majority of initial
> >>>> committers are paid by their employer to contribute to this project.
> >>>> However, they are all passionate about the project, and we are
> >>>> confident that the project will continue even if no salaried developers
> >>>> contribute to the project. We are committed to recruiting additional
> >>>> committers including non-salaried developers.
> >>>>
> >>>> === Relationships with Other Apache Products ===
> >>>> To the knowledge of the Initial Committers, there are no direct
> >>>> competitors to Superset within the Apache Software Foundation. That
> >>>> said, Apache Zeppelin is an indirect competitor, but it solves a
> >>>> different use case.
> >>>>
> >>>> Apache Zeppelin is a web-based notebook that enables interactive data
> >>>> analytics. It enables the creation of beautiful data-driven,
> >>>> interactive and collaborative documents with SQL, Scala and more.
> >>>> Although a user can create data visualizations using this project, it
> >>>> leverages a notebook-style user interface and it is geared towards the
> >>>> Spark community where Scala and SQL co-exist.
> >>>>
> >>>> We look forward to collaborating with those communities, as well as
> >>>> other Apache communities.
> >>>>
> >>>> === An Excessive Fascination with the Apache Brand ===
> >>>> Superset is solving two huge challenges:
> >>>> * The challenge of enabling every knowledge worker to make
> >>>>   data-informed decisions, particularly those who are not deeply
> >>>>   skilled at writing SQL.
> >>>> * The challenge of visualizing huge amounts of data interactively and
> >>>>   in real-time.
> >>>>
> >>>> Superset was first developed as a data visualization solution for
> >>>> Druid.io as a way to visualize billions of rows of data. Since then,
> >>>> usage of Superset has expanded to address data visualization use cases
> >>>> across SQL-speaking data sources as well.
> >>>>
> >>>> Our rationale for developing Superset as an Apache project is detailed
> >>>> in the Rationale Section. We believe that the Apache brand and
> >>>> community process will help us attract more contributors to this
> >>>> project, and help grow the footprint of the project through usage at
> >>>> other organizations and within other applications. Establishing
> >>>> consensus among users and developers will result in a more valuable
> >>>> tool for everyone.
> >>>>
> >>>> == Documentation ==
> >>>> References to further reading material:
> >>>> * [[http://airbnb.io/superset/|Superset Documentation]]
> >>>> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: Airbnb’s Data Exploration Platform]]
> >>>> * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post: Superset: Scaling Data Access & Visual Insights at Airbnb]]
> >>>>
> >>>> == Initial Source ==
> >>>> The origin of the proposed code base can be found at
> >>>> https://github.com/airbnb/superset. The code base is primarily in
> >>>> Python.
> >>>>
> >>>> == Source and Intellectual Property Submission Plan ==
> >>>> We do not expect any complications for the submission of the Superset
> >>>> code base. Our code is already in Github and there is only a single
> >>>> code base.
> >>>>
> >>>> == External Dependencies ==
> >>>> List of Python packages, from the Python Package Index (PyPI):
> >>>>
> >>>> * boto3
> >>>> * celery
> >>>> * cryptography
> >>>> * flask-appbuilder
> >>>> * flask-cache
> >>>> * flask-migrate
> >>>> * flask-script
> >>>> * flask-sqlalchemy
> >>>> * flask-testing
> >>>> * humanize
> >>>> * gunicorn
> >>>> * markdown
> >>>> * pandas
> >>>> * parsedatetime
> >>>> * pydruid
> >>>> * PyHive
> >>>> * python-dateutil
> >>>> * requests
> >>>> * simplejson
> >>>> * six
> >>>> * sqlalchemy
> >>>> * sqlalchemy-utils
> >>>> * sqlparse
> >>>> * thrift
> >>>> * thrift-sasl
> >>>> * werkzeug
> >>>>
> >>>> List of JavaScript packages, from NPM:
> >>>> * autobind-decorator
> >>>> * bootstrap
> >>>> * bootstrap-datepicker
> >>>> * brace
> >>>> * brfs
> >>>> * cal-heatmap
> >>>> * classnames
> >>>> * d3
> >>>> * d3-cloud
> >>>> * d3-sankey
> >>>> * d3-scale
> >>>> * d3-tip
> >>>> * datamaps
> >>>> * datatables-bootstrap3-plugin
> >>>> * datatables.net-bs
> >>>> * font-awesome
> >>>> * gridster
> >>>> * immutability-helper
> >>>> * immutable
> >>>> * jquery
> >>>> * lodash.throttle
> >>>> * mapbox-gl
> >>>> * moment
> >>>> * moments
> >>>> * mustache
> >>>> * nvd3
> >>>> * react
> >>>> * react-ace
> >>>> * react-bootstrap
> >>>> * react-bootstrap-table
> >>>> * react-dom
> >>>> * react-draggable
> >>>> * react-gravatar
> >>>> * react-grid-layout
> >>>> * react-map-gl
> >>>> * react-redux
> >>>> * react-resizable
> >>>> * react-select
> >>>> * react-syntax-highlighter
> >>>> * reactable
> >>>> * redux
> >>>> * redux-localstorage
> >>>> * redux-thunk
> >>>> * shortid
> >>>> * style-loader
> >>>> * supercluster
> >>>> * topojson
> >>>> * victory
> >>>> * viewport-mercator-project
> >>>>
> >>>> == Cryptography ==
> >>>> The proposal does not include cryptographic code.
> >>>>
> >>>> == Required Resources ==
> >>>>
> >>>> === Mailing List ===
> >>>> There is currently a mailing list, the Google Group “airbnb_superset”,
> >>>> that we are planning to deprecate once the Apache.org lists are ready
> >>>> to serve our community.
> >>>>
> >>>> * superset-private
> >>>> * superset-dev
> >>>> * superset-user
> >>>>
> >>>> === Subversion Directory ===
> >>>> Git is the preferred source control system.
> >>>> http://svn.apache.org/repos/asf/incubator/superset
> >>>>
> >>>> == Git Repository ==
> >>>> Git is the preferred source control system; we’re assuming
> >>>> https://github.com/apache/incubator-superset based on the naming
> >>>> scheme.
> >>>>
> >>>> == Issue Tracking ==
> >>>> JIRA Superset (SUPERSET). If possible, we’d like to use Github issues &
> >>>> PRs to manage our project. It’s been said that there are ways to keep
> >>>> Github’s issues in sync with Jira, allowing us to get the best of both
> >>>> worlds. If that is not possible, we will comply by using Jira.
> >>>>
> >>>> == Other Resources ==
> >>>> We currently use a set of Github integrated services that are free to
> >>>> the open source community, like Travis-ci, Code Climate, Coveralls,
> >>>> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep
> >>>> using these services as they allow us to scale contributions and
> >>>> optimize our development flows. These services require some elevated
> >>>> rights on the Github repository in order to set up or tune and we would
> >>>> like for the committers to have the required rights.
> >>>>
> >>>> == Initial Committers ==
> >>>>
> >>>> * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer
> >>>> * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer
> >>>> * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer
> >>>> * Vera Liu <vera....@airbnb.com> - Committer
> >>>> * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer
> >>>> * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer
> >>>> * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer
> >>>> * Slim Bouguerra <sbougue...@hortonworks.com> - Committer
> >>>> * Priyank Shah <ps...@hortonworks.com> - Committer
> >>>> * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer
> >>>> * Daniel Dai <da...@apache.org> - Champion & Committer
> >>>> * Luke Han <luke....@apache.org> - Mentor
> >>>>
> >>>> == Affiliations ==
> >>>> The initial committers are employees of Airbnb Inc. and Hortonworks.
> >>>>
> >>>> == Sponsors ==
> >>>>
> >>>> === Champion ===
> >>>> Daniel Dai <da...@apache.org>
> >>>>
> >>>> === Nominated Mentors ===
> >>>> * Ashutosh Chauhan <hashut...@apache.org>
> >>>> * Luke Han <luke....@apache.org>
> >>>>
> >>>> === Sponsoring Entity ===
> >>>> Incubator PMC
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org