Hi Henry,

Thanks! It’s great that you’ve seen (and liked) AsterixDB before.

Even if your time is very limited, we would be very happy to have you on board
as a mentor.
I’ll add you to the proposal.

Cheers,
Till

> On Jan 19, 2015, at 10:26 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> 
> +1 This is GREAT News!
> 
> Was watching and trying AsterixDB last year and it looked in awesome shape.
> 
> I have my plate full but would love to help mentor this project to get
> it going to ASF if needed!
> 
> - Henry
> 
> On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> Hi Folks,
>> 
>> I am pleased to bring forth the Apache AsterixDB proposal to the
>> Apache Incubator as Champion, working in collaboration with the
>> team. Please find the wiki proposal here:
>> 
>> https://wiki.apache.org/incubator/AsterixDBProposal
>> 
>> 
>> Full text of the proposal is below. Please discuss and enjoy. I’ll
>> leave the discussion open for a week, and then look to call a VOTE
>> hopefully end of next week if all is well.
>> 
>> Cheers!
>> Chris Mattmann
>> 
>> =============================================================
>> Apache AsterixDB Proposal
>> 
>> Abstract
>> 
>> Apache AsterixDB is a scalable big data management system (BDMS) that
>> provides storage, management, and query capabilities for large
>> collections of semi-structured data.
>> 
>> Proposal
>> 
>> AsterixDB is a big data management system (BDMS) whose feature set
>> makes it well-suited to needs such as web data warehousing and social
>> data storage and analysis. Feature-wise, AsterixDB has:
>> 
>> * A NoSQL style data model (ADM) based on extending JSON with object
>>  database concepts.
>> * An expressive and declarative query language (AQL) for querying
>>  semi-structured data.
>> * A runtime query execution engine, Hyracks, for partitioned-parallel
>>  execution of query plans.
>> * Partitioned LSM-based data storage and indexing for efficient
>>  ingestion of newly arriving data.
>> * Support for querying and indexing external data (e.g., in HDFS) as
>>  well as data stored within AsterixDB.
>> * A rich set of primitive data types, including support for spatial,
>>  temporal, and textual data.
>> * Indexing options that include B+ trees, R trees, and inverted
>>  keyword index support.
>> * Basic transactional (concurrency and recovery) capabilities akin to
>>  those of a NoSQL store.
>> 
>> 
>> Background and Rationale
>> 
>> In the world of relational databases, the need to tackle data volumes
>> that exceed the capabilities of a single server led to the
>> development of “shared-nothing” parallel database systems several
>> decades ago. These systems spread data over a cluster based on a
>> partitioning strategy, such as hash partitioning, and queries are
>> processed by employing partitioned-parallel divide-and-conquer
>> techniques. Since these systems are fronted by a high-level,
>> declarative language (SQL), their users are shielded from the
>> complexities of parallel programming. Parallel database systems have
>> been an extremely successful application of parallel computing, and
>> quite a number of commercial products exist today.
>> 
>> In the distributed systems world, the Web brought a need to index and
>> query its huge content. SQL and relational databases were not the
>> answer, though shared-nothing clusters again emerged as the hardware
>> platform of choice. Google developed the Google File System (GFS) and
>> MapReduce programming model to allow programmers to store and process
>> Big Data by writing a few user-defined functions. The MapReduce
>> framework applies these functions in parallel to data instances in
>> distributed files (map) and to sorted groups of instances sharing a
>> common key (reduce) -- not unlike the partitioned parallelism in
>> parallel database systems. Apache's Hadoop MapReduce platform is the
>> most prominent implementation of this paradigm for the rest of the
>> Big Data community. On top of Hadoop and HDFS sit declarative
>> languages like Pig and Hive that each compile down to Hadoop
>> MapReduce jobs.
>> 
>> The big Web companies were also challenged by extreme user bases
>> (100s of millions of users) and needed fast simple lookups and
>> updates to very large keyed data sets like user profiles. SQL
>> databases were deemed either too expensive or not scalable, so the
>> “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
>> popular key-value stores, in this space. MongoDB and Couchbase are
>> other open source alternatives (document stores).
>> 
>> It is evident from the rapidly growing popularity of "NoSQL" stores,
>> as well as the strong demand for Big Data analytics engines today,
>> that there is a strong (and growing!) need to store, process, *and*
>> query large volumes of semi-structured data in many application
>> areas. Until very recently, developers have had to “choose” between
>> using big data analytics engines like Apache Hive or Apache Spark,
>> which can do complex query processing and analysis over HDFS-resident
>> files, and flexible but low-function data stores like MongoDB or
>> Apache HBase. (The Apache Phoenix project,
>> http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
>> aims to bridge between these choices.)
>> 
>> AsterixDB is a highly scalable data management system that can store,
>> index, and manage semi-structured data, e.g., much like MongoDB, but
>> it also supports a full-power query language with the expressiveness
>> of SQL (and more). Unlike analytics engines like Hive or Spark, it
>> stores and manages data, so AsterixDB can exploit its knowledge of
>> data partitioning and the availability of indexes to avoid always
>> scanning data set(s) to process queries. Somewhat surprisingly, there
>> is no open source parallel database system (relational or otherwise)
>> available to developers today -- AsterixDB aims to fill this need.
>> Since Apache is where the majority of today's most important Big
>> Data technologies live, the ASF seems like the obvious home for a
>> system like AsterixDB.
>> 
>> Current Status
>> 
>> The current version of AsterixDB was co-developed by a team of
>> faculty, staff, and students at UC Irvine and UC Riverside. The
>> project was initiated as a large NSF-sponsored project in 2009, the
>> goal of which was to combine the best ideas from the parallel
>> database world, the then new Hadoop world, and the semi-structured
>> (e.g., XML/JSON) data world in order to create a next-generation
>> BDMS. A first informal open source release was made four years later,
>> in June of 2013, under the Apache Software License 2.0.
>> 
>> 
>> Meritocracy
>> 
>> The current developers are familiar with meritocratic open source
>> development at Apache. Apache was chosen specifically because we want
>> to encourage this style of development for the project.
>> 
>> 
>> Community
>> 
>> While AsterixDB started as a university project, it has developed into
>> a community. A number of the initial committers started contributing
>> in academia and continue to actively participate and contribute after
>> graduation. We seek to further develop both the developer and user
>> communities. One ongoing way of broadening the community is through
>> academic collaborations (currently with IIT Mumbai in India and TU
>> Berlin in Germany). During incubation we will also explicitly seek
>> increased industrial participation.
>> 
>> Some indicators of the effort's development community and history can
>> be found at:
>> https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo,
>> https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
>> 
>> 
>> Core Developers
>> 
>> The core developers of the project are diverse, although initially UC
>> Irvine heavy (roughly 50%) due to the project's origins at UCI. The
>> other 50% are from other academic institutions (UC Riverside and the
>> Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
>> IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
>> 
>> 
>> Alignment
>> 
>> Apache is, by far, the most natural home for taking the AsterixDB
>> project forward. A large fraction of today's top Big Data
>> technologies have their homes in Apache, including Hadoop, YARN, Pig,
>> Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a
>> significant gap -- the parallel data management system gap -- that
>> exists in the Big Data open source world. It is well-aligned with a
>> number of the Apache projects, e.g., it has strong support for
>> accessing and indexing external data in HDFS, and it uses YARN as an
>> answer to basic cluster resource management. AsterixDB also seeks to
>> achieve an Apache-style development model; it is seeking a broader
>> community of contributors and users in order to achieve its full
>> potential and value to the Big Data community.
>> 
>> There are also a number of related Apache projects and dependencies
>> that will be mentioned below in the Relationships with Other Apache
>> Products section.
>> 
>> 
>> Known Risks
>> 
>> Orphaned products
>> 
>> Given the current level of intellectual investment in AsterixDB, the
>> risk of the project being abandoned is very small. The UCI/UCR
>> faculty team leads are highly incentivized to continue development
>> since the database groups at UC Irvine and UC Riverside are both
>> reliant on AsterixDB as a platform for long-term graduate research
>> projects. UC San Diego is also beginning to contribute to the code
>> base, and a collaboration involving public health applications is
>> forming with UCLA. The work on AsterixDB is managed via mailing list
>> discussions, supplemented by weekly project status meetings that are
>> summarized on the mailing list. Typical (local plus Skype-in)
>> attendance at the weekly status meetings runs at about 20 active
>> contributors.
>> 
>> 
>> Inexperience with Open Source
>> 
>> AsterixDB and Hyracks were completely developed in Open Source under
>> the ASL 2.0. The source code repositories, issue tracker, and mailing
>> lists are available on Google Code and discussions and decisions
>> happen on the mailing lists (which is necessary due to the geographic
>> distribution of the current developers).
>> 
>> Also, a few of the initial committers have contributed to Apache
>> projects. Vinayak Borkar is a committer on the Apache Helix and
>> Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF
>> and an IPMC member. Preston Carman and Steven Jacobs are committers
>> on the Apache VXQuery project.
>> 
>> 
>> Relationships with Other Apache Products
>> 
>> Apache VXQuery is based on the Hyracks data-parallel runtime, which
>> is also included in the AsterixDB code base.
>> 
>> AsterixDB is closely related to Apache Hadoop. Included in AsterixDB
>> is support for accessing external data in HDFS (and Hive formats),
>> and resource management and system administration features are in the
>> process of being migrated to YARN.
>> 
>> AsterixDB's AQL query facilities offer comparable query power to
>> Apache's Pig and Hive systems for big data analytics. AsterixDB
>> differs in storing and indexing data and thus being able to quickly
>> answer small and medium queries without large HDFS data scans -
>> thereby targeting a different class of use cases.
>> 
>> AsterixDB's data storage and indexing facilities are similar to those
>> of HBase, but AsterixDB differs in being a much more complete and
>> queryable BDMS (not just a key-value style store).
>> 
>> AsterixDB's target use cases are not in-memory processing or
>> iterative algorithm support, making AsterixDB complementary to the
>> Apache Spark platform. (Spark interoperability is on our longer-term
>> to-do wishlist.)
>> 
>> 
>> Homogeneous Developers
>> 
>> As mentioned before, the current community is already organizationally
>> and geographically distributed - and we would like to increase the
>> heterogeneity.
>> 
>> 
>> Reliance on Salaried Developers
>> 
>> Of the initial committers, only 3 are full-time UCI staff. The other
>> committers are a mix of students, alumni who continue to contribute
>> to the effort, and individuals working with permission part-time (or
>> in spare time) on this project.
>> 
>> 
>> An Excessive Fascination with the Apache Brand
>> 
>> We believe in the processes, systems, and framework Apache has put in
>> place. Apache is also known to foster a great community around their
>> projects and provide exposure. While brand is important, our
>> fascination with it is not excessive. We believe that the ASF is the
>> right home for AsterixDB and that having AsterixDB inside of the ASF
>> will lead to a better long-term outcome for the Big Data community.
>> 
>> 
>> Documentation
>> 
>> Documentation and publications related to AsterixDB can be found at
>> http://asterixdb.ics.uci.edu/.
>> 
>> 
>> Initial Source
>> 
>> Current source resides on Google Code:
>> https://code.google.com/p/asterixdb/ (query language and upper system
>> layers) and https://code.google.com/p/hyracks/ (dataflow runtime
>> system and storage management libraries).
>> 
>> 
>> External Dependencies
>> 
>> AsterixDB depends on a number of Apache projects:
>> 
>> - Ant
>> - Avro
>> - ApacheDB JDO
>> - Commons
>> - Derby
>> - Hadoop
>> - Hive
>> - HTTPComponents
>> - Jakarta ORO
>> - Maven
>> - Tomcat
>> - Thrift
>> - Velocity
>> - Wicket
>> - Xerces
>> 
>> and other open source projects (organized by license):
>> 
>> -- ASL 2.0:
>> - Jackson
>> - Google Guava
>> - Google Guice
>> - JSON-simple
>> - BoneCP
>> - Microsoft Azure SDK
>> - Netty
>> - Rome
>> - JetS3t
>> - Groovy
>> - Jettison
>> - Plexus
>> - Datanucleus (JDO)
>> - Jetty
>> - Twitter4J
>> - Snappy-java
>> 
>> -- BSD:
>> - Antlr
>> - ObjectWeb ASM
>> - Protobuf
>> - JSCH
>> - JavaCC
>> - Paranamer
>> - JLine
>> - Stax
>> - StringTemplate
>> - xmlEnc
>> 
>> -- MIT
>> - AppAssembler
>> - SimpleLog4J
>> 
>> -- CDDL 1.0
>> - Java Activation Framework
>> - Java Transactions
>> - Java Servlet API
>> - Grizzly
>> - gmbal
>> - Glassfish
>> 
>> -- CDDL 1.1
>> - Jersey
>> - JAXB Reference Implementation
>> 
>> -- JSON License
>> - JSON
>> 
>> -- EPL 1.0
>> - JUnit
>> 
>> -- JDOM License
>> - JDOM
>> 
>> -- Public Domain
>> - xz
>> - AOPAlliance
>> 
>> As all dependencies are managed using Apache Maven, none of the
>> external libraries need to be packaged in a source distribution.
>> 
>> 
>> Required Resources
>> 
>> Developer and user mailing lists
>> 
>> priv...@asterixdb.incubator.apache.org (with moderated subscriptions)
>> comm...@asterixdb.incubator.apache.org
>> d...@asterixdb.incubator.apache.org
>> us...@asterixdb.incubator.apache.org
>> 
>> 
>> A git repository
>> 
>> https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
>> 
>> 
>> A JIRA issue tracker
>> 
>> https://issues.apache.org/jira/browse/ASTERIXDB
>> 
>> 
>> Initial Committers
>> 
>> The following is a list of the planned initial Apache committers (the
>> active subset of the committers for the current repository at Google
>> Code).
>> 
>> Abdullah Alamoudi (bamou...@gmail.com)
>> Cameron Samak (euf...@gmail.com)
>> Chen Li (che...@gmail.com)
>> Ian Maxon (ima...@uci.edu)
>> Ildar Absalyamov (ildar.absalya...@gmail.com)
>> Jianfeng Jia (jianfeng....@gmail.com)
>> Keren Ouaknine (ker...@gmail.com)
>> Markus Dreseler (apa...@dreseler.de)
>> Mike Carey (dtab...@apache.org)
>> Murtadha Hubail (hubail...@gmail.com)
>> Pouria Pirzadeh (pouria.pirza...@gmail.com)
>> Preston Carman (prest...@apache.org)
>> Raman Grover (ramangrove...@gmail.com)
>> Sattam Alsubaiee (salsuba...@gmail.com)
>> Steven Jacobs (sjaco...@apache.org)
>> Taewoo Kim (wangs...@gmail.com)
>> Till Westmann (ti...@apache.org)
>> Vinayak Borkar (vinay...@apache.org)
>> Yingyi Bu (buyin...@gmail.com)
>> Young-Seok Kim (kiss...@gmail.com)
>> Zach Heilbron (zheilb...@gmail.com)
>> 
>> 
>> Affiliations
>> 
>> UC Irvine
>> - Mike Carey
>> - Chen Li
>> - Ian Maxon
>> - Yingyi Bu
>> - Raman Grover
>> - Pouria Pirzadeh
>> - Young-Seok Kim
>> - Cameron Samak
>> - Taewoo Kim
>> - Jianfeng Jia
>> - Murtadha Hubail
>> - Markus Dreseler
>> 
>> UC Riverside
>> - Ildar Absalyamov
>> - Preston Carman
>> - Steven Jacobs
>> 
>> Hebrew University
>> - Keren Ouaknine
>> 
>> Oracle
>> - Till Westmann
>> 
>> X15 Software
>> - Vinayak Borkar
>> - Zach Heilbron
>> 
>> KACST Saudi Arabia
>> - Sattam Alsubaiee
>> 
>> Saudi Aramco
>> - Abdullah Alamoudi
>> 
>> Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
>> (UC Irvine) and UCR (UC Riverside) affiliates being students. The
>> non-UC committers are a mix of alumni who continue to contribute to
>> the effort and individuals working with permission part-time (or in
>> spare time) on this project.
>> 
>> 
>> Sponsors
>> 
>> Champion
>> 
>> Chris Mattmann (NASA/JPL)
>> 
>> Nominated Mentors
>> 
>> TBD
>> 
>> Sponsoring Entity
>> 
>> The Apache Incubator
>> 
>> 
>> 
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattm...@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
>> 
>> 
>> 
