+1, happy to see you guys come on board!

On Fri Feb 20 2015 at 12:40:42 AM Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
> Hi Everyone,
>
> OK, discussion has died down on this thread. I was originally suggesting that the pTLP option may be best for this community, but after some discussions with the existing community of AsterixDB’ers proposing to bring the project here to the ASF, AsterixDB would like to move forward independent of whatever comes of the pTLP discussions.
>
> That said, I would like to propose Apache AsterixDB as an Incubator project. I am now calling a VOTE to accept AsterixDB into the Apache Incubator. This VOTE will run for at least 72 hours.
>
> [ ] +1 Accept Apache AsterixDB into the Incubator
> [ ] +0 Don’t care.
> [ ] -1 Don’t accept Apache AsterixDB into the Incubator because...
>
> Thanks for the feedback so far and looking forward to the VOTE!
>
> You can count my binding +1.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: <Mattmann>, Chris Mattmann <chris.a.mattm...@jpl.nasa.gov>
> Date: Wednesday, January 14, 2015 at 6:20 PM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: Michael Carey <dtab...@gmail.com>, Ian Maxon <ima...@uci.edu>, Till Westmann <t...@westmann.org>
> Subject: [PROPOSAL] Apache AsterixDB Incubator
>
> > Hi Folks,
> >
> > I am pleased to bring forth the Apache AsterixDB proposal to the Apache Incubator as Champion, working in collaboration with the team.
> > Please find the wiki proposal here:
> >
> > https://wiki.apache.org/incubator/AsterixDBProposal
> >
> > Full text of the proposal is below. Please discuss and enjoy. I’ll leave the discussion open for a week, and then look to call a VOTE, hopefully by the end of next week, if all is well.
> >
> > Cheers!
> > Chris Mattmann
> >
> > =============================================================
> > Apache AsterixDB Proposal
> >
> > Abstract
> >
> > Apache AsterixDB is a scalable big data management system (BDMS) that provides storage, management, and query capabilities for large collections of semi-structured data.
> >
> > Proposal
> >
> > AsterixDB is a big data management system (BDMS) whose feature set makes it well-suited to needs such as web data warehousing and social data storage and analysis. Feature-wise, AsterixDB has:
> >
> > * A NoSQL-style data model (ADM) based on extending JSON with object database concepts.
> > * An expressive and declarative query language (AQL) for querying semi-structured data.
> > * A runtime query execution engine, Hyracks, for partitioned-parallel execution of query plans.
> > * Partitioned LSM-based data storage and indexing for efficient ingestion of newly arriving data.
> > * Support for querying and indexing external data (e.g., in HDFS) as well as data stored within AsterixDB.
> > * A rich set of primitive data types, including support for spatial, temporal, and textual data.
> > * Indexing options that include B+ trees, R trees, and inverted keyword indexes.
> > * Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store.
> >
> > Background and Rationale
> >
> > In the world of relational databases, the need to tackle data volumes that exceed the capabilities of a single server led to the development of “shared-nothing” parallel database systems several decades ago.
> > These systems spread data over a cluster based on a partitioning strategy, such as hash partitioning, and process queries using partitioned-parallel divide-and-conquer techniques. Since these systems are fronted by a high-level, declarative language (SQL), their users are shielded from the complexities of parallel programming. Parallel database systems have been an extremely successful application of parallel computing, and quite a number of commercial products exist today.
> >
> > In the distributed systems world, the Web brought a need to index and query its huge content. SQL and relational databases were not the answer, though shared-nothing clusters again emerged as the hardware platform of choice. Google developed the Google File System (GFS) and the MapReduce programming model to allow programmers to store and process Big Data by writing a few user-defined functions. The MapReduce framework applies these functions in parallel to data instances in distributed files (map) and to sorted groups of instances sharing a common key (reduce), not unlike the partitioned parallelism in parallel database systems. Apache's Hadoop MapReduce platform is the most prominent implementation of this paradigm for the rest of the Big Data community. On top of Hadoop and HDFS sit declarative languages like Pig and Hive that each compile down to Hadoop MapReduce jobs.
> >
> > The big Web companies were also challenged by extreme user bases (100s of millions of users) and needed fast, simple lookups and updates to very large keyed data sets like user profiles. SQL databases were deemed either too expensive or not scalable, so the “NoSQL movement” was born. The ASF now has HBase and Cassandra, two popular key-value stores, in this space. MongoDB and Couchbase are other open source alternatives (document stores).
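The Background section describes shared-nothing systems that hash-partition data across a cluster, and MapReduce's map phase over distributed data followed by a reduce phase over sorted groups sharing a common key. As a rough illustration only, the single-process Python sketch below hash-partitions records across a few hypothetical nodes and then runs a MapReduce-style count over those partitions. The function names `hash_partition` and `map_reduce` and the sample data are hypothetical; they are not AsterixDB, Hyracks, or Hadoop APIs. A real system would process each partition on a separate machine and shuffle the mapped pairs over the network instead of sorting them in one list.

```python
from itertools import groupby
from operator import itemgetter
from zlib import crc32


def hash_partition(records, key, num_nodes):
    """Assign each record to one of num_nodes partitions by hashing its key field.

    Records with equal keys always land in the same partition, which is what lets
    a shared-nothing system process each partition independently.
    """
    partitions = [[] for _ in range(num_nodes)]
    for rec in records:
        # crc32 is deterministic across runs (unlike Python's salted built-in hash() for str)
        node = crc32(str(rec[key]).encode("utf-8")) % num_nodes
        partitions[node].append(rec)
    return partitions


def map_reduce(partitions, map_fn, reduce_fn):
    """Toy MapReduce: map every record, group pairs by key, reduce each group."""
    pairs = []
    for part in partitions:  # a real framework would map each partition on its own node
        for rec in part:
            pairs.extend(map_fn(rec))  # map_fn returns a list of (key, value) pairs
    pairs.sort(key=itemgetter(0))  # stands in for the shuffle/sort that brings equal keys together
    return {
        k: reduce_fn(k, [v for _, v in group])
        for k, group in groupby(pairs, key=itemgetter(0))
    }


if __name__ == "__main__":
    # Hypothetical sample data: count posts per user.
    posts = [
        {"user": "alice", "text": "hi"},
        {"user": "bob", "text": "yo"},
        {"user": "alice", "text": "hey"},
    ]
    parts = hash_partition(posts, "user", num_nodes=3)
    counts = map_reduce(parts, lambda r: [(r["user"], 1)], lambda k, vs: sum(vs))
    print(counts)  # {'alice': 2, 'bob': 1}
```

The grouping step only needs equal keys to end up adjacent, so a stable sort followed by `groupby` is enough for this toy version. A parallel database or MapReduce engine would get the same effect by hash-partitioning the mapped pairs by key to reducer nodes.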
> >
> > It is evident from the rapidly growing popularity of "NoSQL" stores, as well as the strong demand for Big Data analytics engines today, that there is a strong (and growing!) need to store, process, *and* query large volumes of semi-structured data in many application areas. Until very recently, developers had to "choose" between big data analytics engines like Apache Hive or Apache Spark, which can do complex query processing and analysis over HDFS-resident files, and flexible but low-function data stores like MongoDB or Apache HBase. (The Apache Phoenix project, http://phoenix.apache.org/, is a recent SQL-over-HBase effort that aims to bridge between these choices.)
> >
> > AsterixDB is a highly scalable data management system that can store, index, and manage semi-structured data, much like MongoDB, but it also supports a full-power query language with the expressiveness of SQL (and more). Unlike analytics engines such as Hive or Spark, it stores and manages data, so AsterixDB can exploit its knowledge of data partitioning and the availability of indexes to avoid always scanning data set(s) to process queries. Somewhat surprisingly, there is no open source parallel database system (relational or otherwise) available to developers today; AsterixDB aims to fill this need. Since Apache is where the majority of today's most important Big Data technologies live, the ASF seems like the obvious home for a system like AsterixDB.
> >
> > Current Status
> >
> > The current version of AsterixDB was co-developed by a team of faculty, staff, and students at UC Irvine and UC Riverside. The project was initiated as a large NSF-sponsored project in 2009, the goal of which was to combine the best ideas from the parallel database world, the then-new Hadoop world, and the semi-structured (e.g., XML/JSON) data world in order to create a next-generation BDMS.
> > A first informal open source release was made four years later, in June of 2013, under the Apache Software License 2.0.
> >
> > Meritocracy
> >
> > The current developers are familiar with meritocratic open source development at Apache. Apache was chosen specifically because we want to encourage this style of development for the project.
> >
> > Community
> >
> > While AsterixDB started as a university project, it has developed into a community. A number of the initial committers started contributing in academia and continue to actively participate and contribute after graduation. We seek to further develop both the developer and user communities. One ongoing way to broaden the community is through academic collaborations (currently with IIT Mumbai in India and TU Berlin in Germany). During incubation we will also explicitly seek increased industrial participation.
> >
> > Some indicators of the effort's development community and history can be found at:
> > https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo
> > https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
> >
> > Core Developers
> >
> > The core developers of the project are diverse, although initially UC Irvine heavy (roughly 50%) due to the project's origins at UCI. The other 50% are from other academic institutions (UC Riverside and the Hebrew University in Jerusalem) and companies (Couchbase, Facebook, IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
> >
> > Alignment
> >
> > Apache is, by far, the most natural home for taking the AsterixDB project forward. A large fraction of today's top Big Data technologies have their homes in Apache, including Hadoop, YARN, Pig, Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a significant gap in the Big Data open source world: the parallel data management system gap.
> > It is well-aligned with a number of Apache projects; for example, it has strong support for accessing and indexing external data in HDFS, and it uses YARN for basic cluster resource management. AsterixDB also seeks to achieve an Apache-style development model; it is seeking a broader community of contributors and users in order to achieve its full potential and value to the Big Data community.
> >
> > There are also a number of related Apache projects and dependencies that are described below in the Relationships with Other Apache Products section.
> >
> > Known Risks
> >
> > Orphaned products
> >
> > Given the current level of intellectual investment in AsterixDB, the risk of the project being abandoned is very small. The UCI/UCR faculty team leads are highly incentivized to continue development, since the database groups at UC Irvine and UC Riverside both rely on AsterixDB as a platform for long-term graduate research projects. UC San Diego is also beginning to contribute to the code base, and a collaboration involving public health applications is forming with UCLA. The work on AsterixDB is managed via mailing list discussions, supplemented by weekly project status meetings that are summarized on the mailing list. Typical (local plus Skype-in) attendance at the weekly status meetings is about 20 active contributors.
> >
> > Inexperience with Open Source
> >
> > AsterixDB and Hyracks were developed entirely as open source under the ASL 2.0. The source code repositories, issue tracker, and mailing lists are hosted on Google Code, and discussions and decisions happen on the mailing lists (which is necessary due to the geographic distribution of the current developers).
> >
> > In addition, a few of the initial committers have contributed to Apache projects. Vinayak Borkar is a committer on the Apache Helix and Apache VXQuery projects.
> > Till Westmann is the VP of Apache VXQuery at the ASF and an IPMC member. Preston Carman and Steven Jacobs are committers on the Apache VXQuery project.
> >
> > Relationships with Other Apache Products
> >
> > Apache VXQuery is based on the Hyracks data-parallel runtime, which is also included in the AsterixDB code base.
> >
> > AsterixDB is closely related to Apache Hadoop. AsterixDB includes support for accessing external data in HDFS (including data in Hive formats), and its resource management and system administration features are in the process of being migrated to YARN.
> >
> > AsterixDB's AQL query facilities offer query power comparable to Apache's Pig and Hive systems for big data analytics. AsterixDB differs in that it stores and indexes data and can therefore quickly answer small and medium-sized queries without large HDFS data scans, thereby targeting a different class of use cases.
> >
> > AsterixDB's data storage and indexing facilities are similar to those of HBase, but AsterixDB differs in being a much more complete and queryable BDMS (not just a key-value style store).
> >
> > AsterixDB's target use cases are not in-memory processing or iterative algorithm support, making AsterixDB complementary to the Apache Spark platform. (Spark interoperability is on our longer-term wishlist.)
> >
> > Homogeneous Developers
> >
> > As mentioned before, the current community is already organizationally and geographically distributed, and we would like to increase its heterogeneity.
> >
> > Reliance on Salaried Developers
> >
> > Of the initial committers, only 3 are full-time UCI staff. The other committers are a mix of students, alumni who continue to contribute to the effort, and individuals working with permission part-time (or in spare time) on this project.
> >
> > An Excessive Fascination with the Apache Brand
> >
> > We believe in the processes, systems, and framework Apache has put in place.
> > Apache is also known to foster a great community around its projects and to provide exposure. While brand is important, our fascination with it is not excessive. We believe that the ASF is the right home for AsterixDB and that having AsterixDB inside the ASF will lead to a better long-term outcome for the Big Data community.
> >
> > Documentation
> >
> > Documentation and publications related to AsterixDB can be found at http://asterixdb.ics.uci.edu/.
> >
> > Initial Source
> >
> > Current source resides in Google Code: https://code.google.com/p/asterixdb/ (query language and upper system layers) and https://code.google.com/p/hyracks/ (dataflow runtime system and storage management libraries).
> >
> > External Dependencies
> >
> > AsterixDB depends on a number of Apache projects:
> >
> > - Ant
> > - Avro
> > - ApacheDB JDO
> > - Commons
> > - Derby
> > - Hadoop
> > - Hive
> > - HttpComponents
> > - Jakarta ORO
> > - Maven
> > - Tomcat
> > - Thrift
> > - Velocity
> > - Wicket
> > - Xerces
> >
> > and other open source projects (organized by license):
> >
> > -- ASL 2.0:
> >   - Jackson
> >   - Google Guava
> >   - Google Guice
> >   - JSON-simple
> >   - BoneCP
> >   - Microsoft Azure SDK
> >   - Netty
> >   - Rome
> >   - JetS3t
> >   - Groovy
> >   - Jettison
> >   - Plexus
> >   - Datanucleus (JDO)
> >   - Jetty
> >   - Twitter4J
> >   - Snappy-java
> >
> > -- BSD:
> >   - Antlr
> >   - ObjectWeb ASM
> >   - Protobuf
> >   - JSCH
> >   - JavaCC
> >   - Paranamer
> >   - JLine
> >   - Stax
> >   - StringTemplate
> >   - xmlEnc
> >
> > -- MIT:
> >   - AppAssembler
> >   - SimpleLog4J
> >
> > -- CDDL 1.0:
> >   - Java Activation Framework
> >   - Java Transactions
> >   - Java Servlet API
> >   - Grizzly
> >   - gmbal
> >   - Glassfish
> >
> > -- CDDL 1.1:
> >   - Jersey
> >   - JAXB Reference Implementation
> >
> > -- JSON License:
> >   - JSON
> >
> > -- EPL 1.0:
> >   - JUnit
> >
> > -- JDOM License:
> >   - JDOM
> >
> > -- Public Domain:
> >   - xz
> >   - AOPAlliance
> >
> > As all dependencies are managed using Apache Maven, none of the
> > external libraries need to be packaged in a source distribution.
> >
> > Required Resources
> >
> > Developer and user mailing lists
> >
> > priv...@asterixdb.incubator.apache.org (with moderated subscriptions)
> > comm...@asterixdb.incubator.apache.org
> > d...@asterixdb.incubator.apache.org
> > us...@asterixdb.incubator.apache.org
> >
> > A git repository
> >
> > https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
> >
> > A JIRA issue tracker
> >
> > https://issues.apache.org/jira/browse/ASTERIXDB
> >
> > Initial Committers
> >
> > The following is a list of the planned initial Apache committers (the active subset of the committers for the current repository at Google Code).
> >
> > Abdullah Alamoudi (bamou...@gmail.com)
> > Cameron Samak (euf...@gmail.com)
> > Chen Li (che...@gmail.com)
> > Ian Maxon (ima...@uci.edu)
> > Ildar Absalyamov (ildar.absalya...@gmail.com)
> > Jianfeng Jia (jianfeng....@gmail.com)
> > Keren Ouaknine (ker...@gmail.com)
> > Markus Dreseler (apa...@dreseler.de)
> > Mike Carey (dtab...@apache.org)
> > Murtadha Hubail (hubail...@gmail.com)
> > Pouria Pirzadeh (pouria.pirza...@gmail.com)
> > Preston Carman (prest...@apache.org)
> > Raman Grover (ramangrove...@gmail.com)
> > Sattam Alsubaiee (salsuba...@gmail.com)
> > Steven Jacobs (sjaco...@apache.org)
> > Taewoo Kim (wangs...@gmail.com)
> > Till Westmann (ti...@apache.org)
> > Vinayak Borkar (vinay...@apache.org)
> > Yingyi Bu (buyin...@gmail.com)
> > Young-Seok Kim (kiss...@gmail.com)
> > Zach Heilbron (zheilb...@gmail.com)
> >
> > Affiliations
> >
> > UC Irvine
> > - Mike Carey
> > - Chen Li
> > - Ian Maxon
> > - Yingyi Bu
> > - Raman Grover
> > - Pouria Pirzadeh
> > - Young-Seok Kim
> > - Cameron Samak
> > - Taewoo Kim
> > - Jianfeng Jia
> > - Murtadha Hubail
> > - Markus Dreseler
> >
> > UC Riverside
> > - Ildar Absalyamov
> > - Preston Carman
> > - Steven Jacobs
> >
> > Hebrew University
> > - Keren Ouaknine
> >
> > Oracle
> > - Till Westmann
> >
> > X15 Software
> > - Vinayak Borkar
> > - Zach Heilbron
> >
> > KACST Saudi Arabia
> > - Sattam Alsubaiee
> >
> > Saudi Aramco
> > - Abdullah Alamoudi
> >
> > Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI (UC Irvine) and UCR (UC Riverside) affiliates being students. The non-UC committers are a mix of alumni who continue to contribute to the effort and individuals working with permission part-time (or in spare time) on this project.
> >
> > Sponsors
> >
> > Champion
> >
> > Chris Mattmann (NASA/JPL)
> >
> > Nominated Mentors
> >
> > TBD
> >
> > Sponsoring Entity
> >
> > The Apache Incubator