Re: [VOTE] Superset Proposal for Apache Incubator

Julian Hyde Thu, 27 Apr 2017 10:45:53 -0700

Re-affirming my vote:

+1 (binding)

> On Apr 26, 2017, at 11:12 PM, Jeff Feng <jeff.f...@gmail.com> wrote:
> 
> Hello everyone,
> 
> Thank you for checking out our proposal on Superset and for your
> consideration for the Apache Incubator.  So far, I believe we have 8
> binding votes and 2 non-binding votes.
> 
> As Taylor mentioned earlier, we made a minor update to the wording in the
> "Source and Intellectual Property Submission Plan" section based on a
> suggestion by John Ament.  The update was to help confirm the previously
> unstated assumption that we will submit an SGA.  I have copied the updated
> proposal from the wiki to the email below and highlighted (in yellow) the
> new sentence below in the document.
> 
> Folks on the cc line who have already voted, please let us know if the
> change impacts your vote.
> 
> Thank you all,
> Jeff
> 
> 
> 
> = Superset =
> 
> == Abstract ==
> Superset is an enterprise-ready web application for data exploration, data
> visualization and dashboarding.
> 
> == Proposal ==
> Superset is business intelligence (BI) software that helps modern
> organizations visualize and interact with their data. Superset enables
> users to explore data from a variety of databases, assemble beautiful
> dashboards and share their findings.  Superset works neatly with all modern
> SQL-speaking databases, and integrates with Druid.io to provide real-time,
> interactive, blazing fast data access to large datasets.
> 
> == Background ==
> Data is mission-critical. To succeed in this era, organizations need to
> provide low-friction, intuitive and interactive access to data. It is
> paramount for knowledge workers to be capable of answering their own
> questions by querying, exploring and visualizing data.
> 
> The entire business intelligence industry has pivoted from a model of
> centralized top-down platforms driven by IT organizations to self-service
> analytics and agile workflows by any user.  This shift removes centralized
> service bottlenecks for creating data visualizations while also creating an
> environment that is iterative and fast-moving.  This means that business
> intelligence software must also be easy and delightful to use.
> Self-service analytics doesn’t mean that admin and governance features are
> not needed: modern BI tools provide fine-grained access controls and
> auditing capabilities to understand how data is being used.  Superset is a
> solution that delivers on all of these fronts.
> 
> The technology stack is also constantly evolving, and vendors are struggling
> to provide cheap, quick and easy solutions to access data.  Business
> intelligence users are finding existing solutions lacking, as these software
> products either disregard or react slowly to recent game-changing
> technologies such as Druid.io, PrestoDB, Apache Drill, Apache Kylin, d3.js,
> React.js and IPython’s Jupyter.
> 
> == Rationale ==
> Business intelligence is more relevant today than at any other point in
> history.  Organizations are currently very limited in options for open
> source data visualization solutions, especially solutions that are both
> self-service and enterprise-ready.  Every company informing their decisions
> with data needs a BI tool.
> 
> We believe that Superset will be a strong complement to existing Apache
> Software Foundation technologies by offering scalable user interactions with
> distributed storage and computation solutions.  Users will often find that
> Superset can act as the visualization layer for the output of their data and
> computation infrastructure.
> 
> Superset has many key design elements that help fill a gap in current
> solutions for organizations:
> * Easy, low-friction access to data through a simple, web-based data
> exploration interface.  Composing charts and dashboards is intuitive.
> Eliminating the need to write code or SQL empowers anyone to use it.
> * Access to a wide array of rich, interactive data visualization types.
> * Enterprise-ready: Integration with different authentication mechanisms
> and granular permissions centered around actions and data access.
> * Realtime & fast: Superset provides realtime analytics at the speed of
> thought on very large datasets when integrated with Druid.io.
> * Broad data access: Consume data out of any SQL-speaking relational
> database.
> * Extensible: Can be extended to talk to many NoSQL databases like Apache
> Drill, Elasticsearch, and other popular database engines.
> * Fast-loading dashboards with configurable web-scale caching.
> * Plug-in framework that enables organizations to build custom analytical
> applications with a new UI/UX.
> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
> with more flexibility.  SQL Lab integrates with the visualization engine
> seamlessly.
> 
> == Initial Goals ==
> The initial goals of the Superset project are several-fold:
> * Move the existing codebase to Apache and integrate with the Apache
> development process.
> * Redesign the user interface and interaction model for creating
> visualizations/dashboards and connecting to data sources.
> * Build robust support for security and governance of the tool, including
> popular authorization modules (such as Apache Ranger and Apache Sentry)
> and a more sophisticated permissions system.
> * Grow the extensibility of the project, both in terms of enhanced
> connectivity to NoSQL-based data sources and creating a plug-in framework
> that enables organizations to build custom analytical applications which
> require a new UI/UX.
> 
> == Current Status ==
> By many standards, Superset is already a successful open source project. As
> of March 2017, Superset is officially used in production at about a dozen
> companies, has received contributions from over one hundred contributors on
> GitHub, and has 1,500+ forks and 12k+ stars.
> 
> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
> significant contributions, and expressed their commitment to the project.
> The product is feature complete and has been viable for months. It already
> serves as the main interface for consuming data at many companies of
> different sizes.
> 
> While the product is usable, there’s room for improvement across the board,
> starting with providing a smoother user experience around content creation,
> making sure all features work out-of-the-box on more platforms and
> databases, providing better user training guides and videos, having a
> predictable release process, and increasing the overall quality of the
> Superset releases.
> 
> === Meritocracy ===
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have expressed interest in
> this project, and we intend to invite additional developers to participate.
> We will encourage and monitor community participation so that privileges
> can be extended to those that contribute.
> 
> === Community ===
> The need for an enterprise-ready data visualization and exploration
> platform in the open source community is tremendous.  While Superset is
> fairly well known, recognized and used within the Druid.io community,
> adoption is currently limited outside of that niche. There is a huge
> opportunity to grow the community to hundreds if not thousands of
> organizations, and we are hoping that embracing “the Apache way” will
> accelerate the growth of our community.
> 
> We have already been active in seeking and inviting contributions, and are
> planning to scale the project by investing time and expanding the support
> structure to grow the community.
> 
> === Core Developers ===
> The initial committers for Superset include experienced full stack,
> front-end and data engineers:
> * Maxime Beauchemin (Airbnb)
> * Alanna Scott (Airbnb)
> * Bogdan Kyryliuk (Airbnb)
> * Vera Liu (Airbnb)
> * Jeff Feng (Airbnb)
> * Ashutosh Chauhan (Hortonworks)
> * Nishant Bangarwa (Hortonworks)
> * Slim Bouguerra (Hortonworks)
> * Priyank Shah (Hortonworks)
> * Sriharsha Chintalapani (Hortonworks)
> * Daniel Dai (Hortonworks)
> 
> We realize that additional employer diversity is needed, and we will work
> aggressively to recruit developers from additional companies.
> 
> === Alignment ===
> The initial committers strongly believe that a system for interactive
> visualization of data will gain broader adoption as an open source,
> community driven project, where the community can contribute not only to
> the core components, but also to a growing collection of connectors,
> visualizations and improved integration with all potential data sources.
> Superset already integrates closely with Apache Hive, the Hive metastore,
> as well as most SQL-speaking databases found in modern data ecosystems.
> 
> == Known Risks ==
> 
> === Orphaned Products ===
> Superset is a vital component for visualizing, accessing and
> democratizing data at Airbnb.  Also at Hortonworks, Superset is a core
> component of the DataFlow product offering.  Thus, the risk of the project
> being orphaned is relatively low.  The project could be at risk if Airbnb
> changes their approach for democratizing data or if Hortonworks changes
> their strategy in the market.  In such an event, the committers plan to
> continue working on the project on their own time, though progress
> will likely be slower.  We plan to mitigate this risk by recruiting
> additional committers.
> 
> === Inexperience with Open Source ===
> The initial committers include veteran Apache members (committers and PPMC
> members) and other developers who have varying degrees of experience with
> open source projects. All have been involved with source code that has been
> released under an open source license, and several also have experience
> developing code with an open source development process.
> 
> === Homogeneous Developers ===
> The initial committers are employed by Airbnb Inc. and Hortonworks. We are
> committed to recruiting additional committers from other companies.
> 
> === Reliance on Salaried Developers ===
> It is expected that Superset development will occur on both salaried time
> and on volunteer time, after hours. The majority of initial committers are
> paid by their employer to contribute to this project. However, they are all
> passionate about the project, and we are confident that the project will
> continue even if no salaried developers contribute to the project. We are
> committed to recruiting additional committers including non-salaried
> developers.
> 
> === Relationships with Other Apache Products ===
> To the knowledge of the Initial Committers, there are no direct competitors
> to Superset within the Apache Software Foundation.  That said, Apache
> Zeppelin is an indirect competitor, but it solves a different use case.
> 
> Apache Zeppelin is a web-based notebook that enables interactive data
> analytics. It enables the creation of beautiful data-driven, interactive
> and collaborative documents with SQL, Scala and more.  Although a user can
> create data visualizations using this project, it uses a notebook-style
> user interface and is geared towards the Spark community, where Scala and
> SQL co-exist.
> 
> We look forward to collaborating with those communities, as well as other
> Apache communities.
> 
> === An Excessive Fascination with the Apache Brand ===
> Superset is solving two huge challenges:
> * The challenge of enabling every knowledge worker to make data-informed
> decisions, particularly those who are not deeply skilled at writing SQL.
> * The challenge of visualizing huge amounts of data interactively and in
> real-time.
> 
> Superset was first developed as a data visualization solution for Druid.io
> as a way to visualize billions of rows of data.  Since then, usage of
> Superset has expanded to address data visualization use cases across
> SQL-speaking data sources as well.
> 
> Our rationale for developing Superset as an Apache project is detailed in
> the Rationale section.  We believe that the Apache brand and community
> process will help us attract more contributors to this project, and help
> grow the footprint of the project through usage at other organizations and
> within other applications.  Establishing consensus among users and
> developers will result in a more valuable tool for everyone.
> 
> == Documentation ==
> References to further reading material:
> * [[http://airbnb.io/superset/|Superset Documentation]]
> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: Airbnb’s Data Exploration Platform]]
> * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post: Superset: Scaling Data Access & Visual Insights at Airbnb]]
> 
> == Initial Source ==
> The origin of the proposed code base can be found at
> https://github.com/airbnb/superset.  The code base is primarily in Python.
> 
> == Source and Intellectual Property Submission Plan ==
> Airbnb will submit a Software Grant Agreement (SGA) as Superset joins the
> incubator. We do not expect any complications for the submission of the
> Superset code base.  Our code is already on GitHub and there is only a
> single code base.
> 
> == External Dependencies ==
> List of Python packages, from the Python Package Index (PyPI):
> 
> * boto3
> * celery
> * cryptography
> * flask-appbuilder
> * flask-cache
> * flask-migrate
> * flask-script
> * flask-sqlalchemy
> * flask-testing
> * humanize
> * gunicorn
> * markdown
> * pandas
> * parsedatetime
> * pydruid
> * PyHive
> * python-dateutil
> * requests
> * simplejson
> * six
> * sqlalchemy
> * sqlalchemy-utils
> * sqlparse
> * thrift
> * thrift-sasl
> * werkzeug
> 
> List of JavaScript packages, from npm:
> * autobind-decorator
> * bootstrap
> * bootstrap-datepicker
> * brace
> * brfs
> * cal-heatmap
> * classnames
> * d3
> * d3-cloud
> * d3-sankey
> * d3-scale
> * d3-tip
> * datamaps
> * datatables-bootstrap3-plugin
> * datatables.net-bs
> * font-awesome
> * gridster
> * immutability-helper
> * immutable
> * jquery
> * lodash.throttle
> * mapbox-gl
> * moment
> * moments
> * mustache
> * nvd3
> * react
> * react-ace
> * react-bootstrap
> * react-bootstrap-table
> * react-dom
> * react-draggable
> * react-gravatar
> * react-grid-layout
> * react-map-gl
> * react-redux
> * react-resizable
> * react-select
> * react-syntax-highlighter
> * reactable
> * redux
> * redux-localstorage
> * redux-thunk
> * shortid
> * style-loader
> * supercluster
> * topojson
> * victory
> * viewport-mercator-project
> 
> == Cryptography ==
> The proposal does not include cryptographic code.
> 
> == Required Resources ==
> 
> === Mailing List ===
> There is currently a Google Group, “airbnb_superset”, that we plan to
> deprecate once the Apache.org mailing lists are ready to serve our
> community. The proposed Apache mailing lists are:
> 
> * superset-private
> * superset-dev
> * superset-user
> 
> === Subversion Directory ===
> Git is the preferred source control system. http://svn.apache.org/repos/asf/incubator/superset
> 
> === Git Repository ===
> Git is the preferred source control system. Based on the naming scheme, we
> are assuming https://github.com/apache/incubator-superset.
> 
> === Issue Tracking ===
> JIRA Superset (SUPERSET). We would like to use GitHub issues & PRs to
> manage our project as much as possible. We understand that there are ways
> to keep GitHub issues in sync with JIRA, allowing us to get the best of
> both worlds. If that is not possible, we will comply and use JIRA.
> 
> === Other Resources ===
> We currently use a set of GitHub-integrated services that are free to the
> open source community, like Travis CI, Code Climate, Coveralls,
> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep using
> these services as they allow us to scale contributions and optimize our
> development flows. These services require some elevated rights on the
> GitHub repository in order to set up or tune, and we would like the
> committers to have the required rights.
> 
> 
> == Initial Committers ==
> 
> * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer
> * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer
> * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer
> * Vera Liu <vera....@airbnb.com> - Committer
> * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer
> * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer
> * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer
> * Slim Bouguerra <sbougue...@hortonworks.com> - Committer
> * Priyank Shah <ps...@hortonworks.com> - Committer
> * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer
> * Daniel Dai <da...@apache.org> - Champion & Committer
> * Luke Han <luke....@apache.org> - Mentor
> 
> == Affiliations ==
> The initial committers are employees of Airbnb Inc. and Hortonworks.
> 
> == Sponsors ==
> 
> === Champion ===
> Daniel Dai <da...@apache.org>
> 
> === Nominated Mentors ===
> * Ashutosh Chauhan <hashut...@apache.org>
> * Luke Han <luke....@apache.org>
> 
> === Sponsoring Entity ===
> Incubator PMC
> 
> 
> 
> 
> 
> On Wed, Apr 26, 2017 at 6:31 PM, Edward J. Yoon <edwardy...@apache.org> wrote:
> 
>> +1 binding
>> 
>> On Thu, Apr 27, 2017 at 10:29 AM, Naresh Agarwal <naresh.agar...@gmail.com> wrote:
>>> +1 (non-binding).
>>> 
>>> Thanks
>>> Naresh Agarwal
>>> 
>>> On Thu, Apr 27, 2017 at 5:06 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> 
>>>> +1 (binding)
>>>> 
>>>> 
>>>> 
>>>> On Tue, Apr 25, 2017 at 1:58 PM, Joe Witt <joe.w...@gmail.com> wrote:
>>>> 
>>>>> +1 (binding)
>>>>> 
>>>>> On Tue, Apr 25, 2017 at 4:52 PM, Jitendra Pandey <jiten...@hortonworks.com> wrote:
>>>>>> +1 (binding)
>>>>>> 
>>>>>> On 4/25/17, 1:27 PM, "Julian Hyde" <jh...@apache.org> wrote:
>>>>>> 
>>>>>>    +1 binding
>>>>>> 
>>>>>>> On Apr 25, 2017, at 12:48 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>> 
>>>>>>> +1 (non-binding)
>>>>>>> 
>>>>>>> On Tue, Apr 25, 2017 at 11:49 AM Ashutosh Chauhan <hashut...@apache.org> wrote:
>>>>>>> 
>>>>>>>> +1 (binding)
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Ashutosh
>>>>>>>> 
>>>>>>>> On Mon, Apr 24, 2017 at 5:45 AM, Luke Han <luke...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> +1 binding
>>>>>>>>> 
>>>>>>>>> Love to see Superset become a new incubator project.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best Regards!
>>>>>>>>> ---------------------
>>>>>>>>> 
>>>>>>>>> Luke Han
>>>>>>>>> 
>>>>>>>>> On Sun, Apr 23, 2017 at 10:53 PM, Jeff Feng <jeff.f...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Dear Apache Incubator Community,
>>>>>>>>>> 
>>>>>>>>>> We have updated the Superset proposal
>>>>>>>>>> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for
>>>>>>>>>> Apache Incubation with an additional mentor (Luke Han -
>>>>>>>>>> luke....@apache.org), and would like to start a vote thread for
>>>>>>>>>> acceptance into the incubator.
>>>>>>>>>> 
>>>>>>>>>> Our team is excited to share Superset with the Apache community and we
>>>>>>>>>> hope for your continued support!
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Jeff & the Superset Team
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> = Superset =
>>>>>>>>>> 
>>>>>>>>>> == Abstract ==
>>>>>>>>>> Superset is an enterprise-ready web application for data
>>>>> exploration,
>>>>>>>> data
>>>>>>>>>> visualization and dashboarding.
>>>>>>>>>> 
>>>>>>>>>> == Proposal ==
>>>>>>>>>> Superset is business intelligence (BI) software that helps
>>>>> modern
>>>>>>>>>> organizations visualize and interact with their data.
>> Superset
>>>>> enables
>>>>>>>>>> users explore data from a variety of databases, assemble
>>>>> beautiful
>>>>>>>>>> dashboards and share their findings.  Superset works neatly
>>>>> with all
>>>>>>>>>> modern
>>>>>>>>>> SQL-speaking databases, and integrates with Druid.io to
>>>> provide
>>>>>>>> real-time,
>>>>>>>>>> interactive, blazing fast data access to large datasets.
>>>>>>>>>> 
>>>>>>>>>> == Background ==
>>>>>>>>>> Data is mission critical. To succeed in this era,
>>>> organizations
>>>>> need to
>>>>>>>>>> provide low-friction, intuitive and interactive access to
>>>> data.
>>>>> It is
>>>>>>>>>> paramount for knowledge workers to be capable of answering
>>>>> their own
>>>>>>>>>> questions by querying, exploring and visualizing data.
>>>>>>>>>> 
>>>>>>>>>> The entire business intelligence industry has pivoted from
>> a
>>>>> model of
>>>>>>>>>> centralized top-down platforms driven by IT organizations
>> to
>>>>>>>> self-service
>>>>>>>>>> analytics and agile workflows by any user.  This shift
>>>> unblocks
>>>>>>>>>> centralized
>>>>>>>>>> service bottlenecks for creating data visualizations while
>>>> also
>>>>> creating
>>>>>>>>>> an
>>>>>>>>>> environment that is iterative and fast-moving.  This means
>>>> that
>>>>> business
>>>>>>>>>> intelligence software must also be easy and delightful to
>> use.
>>>>>>>>>> Self-service analytics doesn’t mean that admin and
>> governance
>>>>> features
>>>>>>>> are
>>>>>>>>>> not needed.
>>>>>>>>>> Modern BI tools provide fine-grain access controls and
>>>> auditing
>>>>>>>>>> capabilities to understand how data is being used.
>> Superset
>>>> is
>>>>> a
>>>>>>>> solution
>>>>>>>>>> that delivers on all of these vectors.
>>>>>>>>>> 
>>>>>>>>>> The technology stack is also constantly morphing - vendors
>> are
>>>>>>>> struggling
>>>>>>>>>> to provide cheap, quick and easy solutions to access data.
>>>>> Business
>>>>>>>>>> intelligence users are finding existing solutions lacking
>> as
>>>>> these
>>>>>>>>>> software
>>>>>>>>>> products either disregard or react slowly to recent
>>>>> game-changing
>>>>>>>>>> technologies like Druid.io, PrestoDB, Apache Drill, Apache
>>>>> Kylin, d3.js,
>>>>>>>>>> React.js and iPython’s Jupyter for instance.
>>>>>>>>>> 
>>>>>>>>>> == Rationale ==
>>>>>>>>>> Business intelligence is more relevant today than at any
>> other
>>>>> point in
>>>>>>>>>> history.  Organizations are currently very limited in
>> options
>>>>> for open
>>>>>>>>>> source data visualization solutions, especially solutions
>> that
>>>>> are both
>>>>>>>>>> self-service and enterprise-ready.  Every company informing
>>>>> their
>>>>>>>>>> decisions
>>>>>>>>>> with data needs a BI tool.
>>>>>>>>>> 
>>>>>>>>>> We believe that Superset will be a strong compliment to
>>>>> existing Apache
>>>>>>>>>> Software Foundation technologies by offering scalable user
>>>>> interactions
>>>>>>>> to
>>>>>>>>>> distributed storage and computation solutions.  Users will
>>>>> often find
>>>>>>>> that
>>>>>>>>>> Superset can act as a catalyst for tooling that can
>> visualize
>>>>> the
>>>>>>>>>> byproduct
>>>>>>>>>> of data and computation infrastructure.
>>>>>>>>>> 
>>>>>>>>>> Superset has many key design elements that help fill a gap
>> in
>>>>> current
>>>>>>>>>> solutions for organizations:
>>>>>>>>>> * Easy, low friction access to data through a simple,
>>>> web-based
>>>>> data
>>>>>>>>>> exploration interface.  Composing charts and dashboards are
>>>>> intuitive.
>>>>>>>>>> Eliminating the need to write code or SQL empowers anyone
>> to
>>>>> use it.
>>>>>>>>>> * Access to a wide array of rich, interactive data
>>>>> visualization types.
>>>>>>>>>> * Enterprise-ready: Integration with different
>> authentication
>>>>>>>> mechanisms
>>>>>>>>>> and granular permissions centered around actions and data
>>>>> access.
>>>>>>>>>> * Realtime & fast: Superset provides realtime analytics at
>> the
>>>>> speed of
>>>>>>>>>> thought on very large datasets when integrated with
>> Druid.io.
>>>>>>>>>> * Broad data access: Consume data out of any SQL-speaking
>>>>> relational
>>>>>>>>>> database.
>>>>>>>>>> * Extensible: Can be extended to talk to many noSQL
>> databases
>>>>> like
>>>>>>>> Apache
>>>>>>>>>> Drill, Elastic Search, and other popular database engines.
>>>>>>>>>> * Fast loading dashboards with configurable web-scale
>> caching.
>>>>>>>>>> * Plug-in framework that enables organizations to build
>> custom
>>>>>>>> analytical
>>>>>>>>>> applications with new UI/UX interfaces.
>>>>>>>>>> * SQL Lab, a state-of-the-art SQL IDE that empowers
>>>>> SQL-speaking users
>>>>>>>>>> with more flexibility.  SQL Lab integrates with the
>>>>> visualization engine
>>>>>>>>>> seamlessly.
>>>>>>>>>> 
>>>>>>>>>> == Initial Goals ==
>>>>>>>>>> The initial goals of the Superset project are several-fold:
>>>>>>>>>> * Move the existing codebase to Apache and integrate with
>> the
>>>>> Apache
>>>>>>>>>> development process.
>>>>>>>>>> * Redesign the user interface and interaction model for
>>>> creating
>>>>>>>>>> visualizations/dashboards and connecting to data sources
>>>>>>>>>> * Build robust support for security and governance of the
>> tool
>>>>>>>> including
>>>>>>>>>> popular authorization modules (including Apache Ranger and
>>>>> Apache
>>>>>>>> Sentry)
>>>>>>>>>> and a more sophisticated permissions system
>>>>>>>>>> * Grow the extensibility of the project both in terms of
>>>>> enhanced
>>>>>>>>>> connectivity to NoSQL-based data sources and creating a
>>>> plug-in
>>>>>>>> framework
>>>>>>>>>> that enables organizations to build custom analytical
>>>>> applications which
>>>>>>>>>> require a new UI/UX
>>>>>>>>>> 
>>>>>>>>>> == Current Status ==
>>>>>>>>>> By many standards, Superset is already a successful open
>>>> source
>>>>> project.
>>>>>>>>>> As
>>>>>>>>>> of March 2017, Superset is officially used in production at
>>>>> about a
>>>>>>>> dozen
>>>>>>>>>> companies, has received contributions from over one hundred
>>>>> contributors
>>>>>>>>>> on
>>>>>>>>>> Github, 1500+ forks, and 12k+ stars.
>>>>>>>>>> 
>>>>>>>>>> Sizeable companies like Airbnb, Yahoo! and Hortonworks have
>>>> made
>>>>>>>>>> significant contributions, and expressed their commitment
>> to
>>>> the
>>>>>>>> project.
>>>>>>>>>> The product is feature complete and has been viable for
>>>> months.
>>>>> It
>>>>>>>> already
>>>>>>>>>> serves as the main interface for consuming data at many
>>>>> companies of
>>>>>>>>>> different sizes.
>>>>>>>>>> 
>>>>>>>>>> While the product is usable, there’s room for improvement
>>>>> across the
>>>>>>>>>> board,
>>>>>>>>>> starting with providing a smoother user experience around
>>>>> content
>>>>>>>>>> creation,
>>>>>>>>>> making sure all features work out-of-the-box on more
>> platforms
>>>>> and
>>>>>>>>>> databases, providing better user training guides and
>> videos,
>>>>> having a
>>>>>>>>>> predictable release process, and increasing the overall
>>>> quality
>>>>> of the
>>>>>>>>>> Superset releases.
>>>>>>>>>> 
>>>>>>>>>> === Meritocracy ===
>>>>>>>>>> We plan to invest in supporting a meritocracy. We will
>> discuss
>>>>> the
>>>>>>>>>> requirements in an open forum. Several companies have
>>>> expressed
>>>>> interest
>>>>>>>>>> in
>>>>>>>>>> this project, and we intend to invite additional
>> developers to
>>>>>>>>>> participate.
>>>>>>>>>> We will encourage and monitor community participation so
>> that
>>>>> privileges
>>>>>>>>>> can be extended to those that contribute.
>>>>>>>>>> 
>>>>>>>>>> === Community ===
>>>>>>>>>> The need for an enterprise-ready data visualization and
>>>>> exploration
>>>>>>>>>> platform in the open source community is tremendous.  While
>>>>> Superset is
>>>>>>>>>> fairly well known, recognized and used within the Druid.io
>>>>> community,
>>>>>>>>>> adoption is currently limited outside of that niche. There
>> is
>>>> a
>>>>> huge
>>>>>>>>>> opportunity to grow the community to hundreds if not
>> thousands
>>>>> of
>>>>>>>>>> organizations, and we are hoping that embracing “the Apache
>>>>> way” will
>>>>>>>>>> accelerate the growth of our community.
>>>>>>>>>> 
>>>>>>>>>> We have already been active at seeking and inviting
>>>>> contributions, and
>>>>>>>> are
>>>>>>>>>> planning to scale the project by investing time and growing
>>>> the
>>>>> support
>>>>>>>>>> structure to grow the community.
>>>>>>>>>> 
>>>>>>>>>> === Core Developers ===
>>>>>>>>>> The initial committers for Superset include experienced
>> full
>>>>> stack,
>>>>>>>>>> front-end and data engineers:
>>>>>>>>>> * Maxime Beauchemin (Airbnb)
>>>>>>>>>> * Alanna Scott (Airbnb)
>>>>>>>>>> * Bogdan Kyryliuk (Airbnb)
>>>>>>>>>> * Vera Liu  (Airbnb)
>>>>>>>>>> * Jeff Feng (Airbnb)
>>>>>>>>>> * Ashutosh Chauhan (Hortonworks)
>>>>>>>>>> * Nishant Bangarwa (Hortonworks)
>>>>>>>>>> * Slim Bouguerra (Hortonworks)
>>>>>>>>>> * Priyank Shah (Hortonworks)
>>>>>>>>>> * Sriharsha Chintalapani (Hortonworks)
>>>>>>>>>> * Daniel Dai (Hortonworks)
>>>>>>>>>> 
>>>>>>>>>> We realize that additional employer diversity is needed,
>> and
>>>> we
>>>>> will
>>>>>>>> work
>>>>>>>>>> aggressively to recruit developers from additional
>> companies.
>>>>>>>>>> 
>>>>>>>>>> === Alignment ===
>>>>>>>>>> The initial committers strongly believe that a system for
>>>>> interactive
>>>>>>>>>> visualization of data will gain broader adoption as an open
>>>>> source,
>>>>>>>>>> community driven project, where the community can
>> contribute
>>>>> not only to
>>>>>>>>>> the core components, but also to a growing collection of
>>>>> connectors,
>>>>>>>>>> visualizations and improving integration a all potential
>> data
>>>>> sources.
>>>>>>>>>> Superset already integrates closely with Apache Hive, the
>> Hive
>>>>>>>> metastore,
>>>>>>>>>> as well as most SQL-speaking databases found in modern data
>>>>> ecosystems.
>>>>>>>>>> 
>>>>>>>>>> == Known Risks ==
>>>>>>>>>> 
>>>>>>>>>> === Orphaned Products ===
>>>>>>>>>> Superset is a vital component for both visualizing,
>> accessing
>>>>> and
>>>>>>>>>> democratizing data at Airbnb.  Also at Hortonworks,
>> Superset
>>>> is
>>>>> a core
>>>>>>>>>> component of the DataFlow product offering.  Thus, the
>> risk of
>>>>> the
>>>>>>>> project
>>>>>>>>>> being orphaned is relatively low.  The project could be at
>>>> risk
>>>>> if
>>>>>>>> Airbnb
>>>>>>>>>> changes their approach for democratizing data or if
>>>> Hortonworks
>>>>> changes
>>>>>>>>>> their strategy in the market.  In such an event, the
>>>> committers
>>>>> plan to
>>>>>>>>>> continue working on the project on their own time, thought
>> the
>>>>> progress
>>>>>>>>>> will likely be slower.  We plan to mitigate this risk by
>>>>> recruiting
>>>>>>>>>> additional committers.
>>>>>>>>>> 
>>>>>>>>>> === Inexperience with Open Source ===
>>>>>>>>>> The initial committers include veteran Apache members
>>>>> (committers and
>>>>>>>> PPMC
>>>>>>>>>> members) and other developers who have varying degrees of
>>>>> experience
>>>>>>>> with
>>>>>>>>>> open source projects. All have been involved with source
>> code
>>>>> that has
>>>>>>>>>> been
>>>>>>>>>> released under an open source license, and several also
>> have
>>>>> experience
>>>>>>>>>> developing code with an open source development process.
>>>>>>>>>> 
>>>>>>>>>> === Homogenous Developers ===
>>>>>>>>>> The initial committers are employed by Airbnb Inc. and
>>>>> Hortonworks. We
>>>>>>>> are
>>>>>>>>>> committed to recruiting additional committers from other
>>>>> companies.
>>>>>>>>>> 
>>>>>>>>>> === Reliance on Salaried Developers ===
>>>>>>>>>> It is expected that Superset development will occur on both
>>>>> salaried
>>>>>>>> time
>>>>>>>>>> and on volunteer time, after hours. The majority of initial
>>>>> committers
>>>>>>>> are
>>>>>>>>>> paid by their employer to contribute to this project.
>> However,
>>>>> they are
>>>>>>>>>> all
>>>>>>>>>> passionate about the project, and we are confident that the
>>>>> project will
>>>>>>>>>> continue even if no salaried developers contribute to the
>>>>> project. We
>>>>>>>> are
>>>>>>>>>> committed to recruiting additional committers including
>>>>> non-salaried
>>>>>>>>>> developers.
>>>>>>>>>> 
>>>>>>>>>> === Relationships with Other Apache Products ===
>>>>>>>>>> To the knowledge of the Initial Committers, there are no direct
>>>>>>>>>> competitors to Superset within the Apache Software Foundation. That
>>>>>>>>>> said, Apache Zeppelin is an indirect competitor, but it solves a
>>>>>>>>>> different use case.
>>>>>>>>>> 
>>>>>>>>>> Apache Zeppelin is a web-based notebook that enables interactive data
>>>>>>>>>> analytics. It enables the creation of beautiful, data-driven,
>>>>>>>>>> interactive and collaborative documents with SQL, Scala and more.
>>>>>>>>>> Although a user can create data visualizations with Zeppelin, it uses
>>>>>>>>>> a notebook-style user interface and is geared towards the Spark
>>>>>>>>>> community, where Scala and SQL co-exist.
>>>>>>>>>> 
>>>>>>>>>> We look forward to collaborating with those communities, as well as
>>>>>>>>>> other Apache communities.
>>>>>>>>>> 
>>>>>>>>>> === An Excessive Fascination with the Apache Brand ===
>>>>>>>>>> Superset is solving two huge challenges:
>>>>>>>>>> * Enabling every knowledge worker to make data-informed decisions,
>>>>>>>>>>   particularly those who are not deeply skilled at writing SQL.
>>>>>>>>>> * Visualizing huge amounts of data interactively and in real time.
>>>>>>>>>> 
>>>>>>>>>> Superset was first developed as a data visualization solution for
>>>>>>>>>> Druid.io, as a way to visualize billions of rows of data. Since then,
>>>>>>>>>> its usage has expanded to address data visualization use cases across
>>>>>>>>>> SQL-speaking data sources as well.
>>>>>>>>>> 
>>>>>>>>>> Our rationale for developing Superset as an Apache project is detailed
>>>>>>>>>> in the Rationale section. We believe that the Apache brand and
>>>>>>>>>> community process will help us attract more contributors to this
>>>>>>>>>> project, and help grow the footprint of the project through usage at
>>>>>>>>>> other organizations and within other applications. Establishing
>>>>>>>>>> consensus among users and developers will result in a more valuable
>>>>>>>>>> tool for everyone.
>>>>>>>>>> 
>>>>>>>>>> == Documentation ==
>>>>>>>>>> References to further reading material:
>>>>>>>>>> * [[http://airbnb.io/superset/|Superset Documentation]]
>>>>>>>>>> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post: Superset: Airbnb’s Data Exploration Platform]]
>>>>>>>>>> * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post: Superset: Scaling Data Access & Visual Insights at Airbnb]]
>>>>>>>>>> 
>>>>>>>>>> == Initial Source ==
>>>>>>>>>> The origin of the proposed code base can be found at
>>>>>>>>>> https://github.com/airbnb/superset.  The code base is
>>>>> primarily in
>>>>>>>>>> Python.
>>>>>>>>>> 
>>>>>>>>>> == Source and Intellectual Property Submission Plan ==
>>>>>>>>>> We do not expect any complications for the submission of the Superset
>>>>>>>>>> code base. Our code is already on GitHub and there is only a single
>>>>>>>>>> code base.
>>>>>>>>>> 
>>>>>>>>>> == External Dependencies ==
>>>>>>>>>> List of Python packages, from the Python Package Index (PyPI):
>>>>>>>>>> 
>>>>>>>>>> * boto3
>>>>>>>>>> * celery
>>>>>>>>>> * cryptography
>>>>>>>>>> * flask-appbuilder
>>>>>>>>>> * flask-cache
>>>>>>>>>> * flask-migrate
>>>>>>>>>> * flask-script
>>>>>>>>>> * flask-sqlalchemy
>>>>>>>>>> * flask-testing
>>>>>>>>>> * humanize
>>>>>>>>>> * gunicorn
>>>>>>>>>> * markdown
>>>>>>>>>> * pandas
>>>>>>>>>> * parsedatetime
>>>>>>>>>> * pydruid
>>>>>>>>>> * PyHive
>>>>>>>>>> * python-dateutil
>>>>>>>>>> * requests
>>>>>>>>>> * simplejson
>>>>>>>>>> * six
>>>>>>>>>> * sqlalchemy
>>>>>>>>>> * sqlalchemy-utils
>>>>>>>>>> * sqlparse
>>>>>>>>>> * thrift
>>>>>>>>>> * thrift-sasl
>>>>>>>>>> * werkzeug
>>>>>>>>>> 
>>>>>>>>>> List of JavaScript packages, from npm:
>>>>>>>>>> * autobind-decorator
>>>>>>>>>> * bootstrap
>>>>>>>>>> * bootstrap-datepicker
>>>>>>>>>> * brace
>>>>>>>>>> * brfs
>>>>>>>>>> * cal-heatmap
>>>>>>>>>> * classnames
>>>>>>>>>> * d3
>>>>>>>>>> * d3-cloud
>>>>>>>>>> * d3-sankey
>>>>>>>>>> * d3-scale
>>>>>>>>>> * d3-tip
>>>>>>>>>> * datamaps
>>>>>>>>>> * datatables-bootstrap3-plugin
>>>>>>>>>> * datatables.net-bs
>>>>>>>>>> * font-awesome
>>>>>>>>>> * gridster
>>>>>>>>>> * immutability-helper
>>>>>>>>>> * immutable
>>>>>>>>>> * jquery
>>>>>>>>>> * lodash.throttle
>>>>>>>>>> * mapbox-gl
>>>>>>>>>> * moment
>>>>>>>>>> * moments
>>>>>>>>>> * mustache
>>>>>>>>>> * nvd3
>>>>>>>>>> * react
>>>>>>>>>> * react-ace
>>>>>>>>>> * react-bootstrap
>>>>>>>>>> * react-bootstrap-table
>>>>>>>>>> * react-dom
>>>>>>>>>> * react-draggable
>>>>>>>>>> * react-gravatar
>>>>>>>>>> * react-grid-layout
>>>>>>>>>> * react-map-gl
>>>>>>>>>> * react-redux
>>>>>>>>>> * react-resizable
>>>>>>>>>> * react-select
>>>>>>>>>> * react-syntax-highlighter
>>>>>>>>>> * reactable
>>>>>>>>>> * redux
>>>>>>>>>> * redux-localstorage
>>>>>>>>>> * redux-thunk
>>>>>>>>>> * shortid
>>>>>>>>>> * style-loader
>>>>>>>>>> * supercluster
>>>>>>>>>> * topojson
>>>>>>>>>> * victory
>>>>>>>>>> * viewport-mercator-project
>>>>>>>>>> 
>>>>>>>>>> == Cryptography ==
>>>>>>>>>> The proposal does not include cryptographic code.
>>>>>>>>>> 
>>>>>>>>>> == Required Resources ==
>>>>>>>>>> 
>>>>>>>>>> === Mailing List ===
>>>>>>>>>> There is a current mailing list on Google Groups, “airbnb_superset”,
>>>>>>>>>> which we plan to deprecate once the Apache.org mailing lists are
>>>>>>>>>> ready to serve our community.
>>>>>>>>>> 
>>>>>>>>>> * superset-private
>>>>>>>>>> * superset-dev
>>>>>>>>>> * superset-user
>>>>>>>>>> 
>>>>>>>>>> === Subversion Directory ===
>>>>>>>>>> Git is the preferred source control system.
>>>>>>>>>> http://svn.apache.org/repos/asf/incubator/superset
>>>>>>>>>> 
>>>>>>>>>> == Git Repository ==
>>>>>>>>>> Git is the preferred source control system; we are assuming
>>>>>>>>>> https://github.com/apache/incubator-superset based on the naming
>>>>>>>>>> scheme.
>>>>>>>>>> 
>>>>>>>>>> == Issue Tracking ==
>>>>>>>>>> JIRA Superset (SUPERSET). If possible, we would like to use GitHub
>>>>>>>>>> issues and PRs to manage our project. It has been said that there are
>>>>>>>>>> ways to keep GitHub issues in sync with JIRA, allowing us to get the
>>>>>>>>>> best of both worlds. If that is not possible, we will comply and use
>>>>>>>>>> JIRA.
>>>>>>>>>> 
>>>>>>>>>> == Other Resources ==
>>>>>>>>>> We currently use a set of GitHub-integrated services that are free to
>>>>>>>>>> the open source community, such as Travis CI, Code Climate, Coveralls,
>>>>>>>>>> Landscape.io, Requires.io, david-dm and Gitter. We would like to keep
>>>>>>>>>> using these services, as they allow us to scale contributions and
>>>>>>>>>> optimize our development flows. These services require elevated
>>>>>>>>>> rights on the GitHub repository to set up or tune, and we would like
>>>>>>>>>> the committers to have the required rights.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> == Initial Committers ==
>>>>>>>>>> 
>>>>>>>>>> * Maxime Beauchemin <maxime.beauche...@airbnb.com> - PPMC & Committer
>>>>>>>>>> * Alanna Scott <alanna.sc...@airbnb.com> - PPMC & Committer
>>>>>>>>>> * Bogdan Kyryliuk <b.kyryl...@gmail.com> - PPMC & Committer
>>>>>>>>>> * Vera Liu <vera....@airbnb.com> - Committer
>>>>>>>>>> * Jeff Feng <jeff.f...@airbnb.com> - PPMC & Committer
>>>>>>>>>> * Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer
>>>>>>>>>> * Nishant Bangarwa <nbanga...@hortonworks.com> - PPMC & Committer
>>>>>>>>>> * Slim Bouguerra <sbougue...@hortonworks.com> - Committer
>>>>>>>>>> * Priyank Shah <ps...@hortonworks.com> - Committer
>>>>>>>>>> * Harsha Chintalapani <schintalap...@hortonworks.com> - Committer
>>>>>>>>>> * Daniel Dai <da...@apache.org> - Champion & Committer
>>>>>>>>>> * Luke Han <luke....@apache.org> - Mentor
>>>>>>>>>> 
>>>>>>>>>> == Affiliations ==
>>>>>>>>>> The initial committers are employees of Airbnb Inc. and Hortonworks.
>>>>>>>>>> 
>>>>>>>>>> == Sponsors ==
>>>>>>>>>> 
>>>>>>>>>> === Champion ===
>>>>>>>>>> Daniel Dai <da...@apache.org>
>>>>>>>>>> 
>>>>>>>>>> === Nominated Mentors ===
>>>>>>>>>> * Ashutosh Chauhan <hashut...@apache.org>
>>>>>>>>>> * Luke Han <luke....@apache.org>
>>>>>>>>>> 
>>>>>>>>>> === Sponsoring Entity ===
>>>>>>>>>> Incubator PMC
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>    ---------------------------------------------------------------------
>>>>>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>>>>    For additional commands, e-mail: general-help@incubator.apache.org
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 
>> 
>> --
>> Best Regards, Edward J. Yoon
>> 
>> 
>> 



