+1 (binding) - Henry
On Thu, Feb 19, 2015 at 9:38 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote: > Hi Everyone, > > OK, discussion has died down on this thread. I was originally > suggesting that the pTLP option may be best for this community, > but after some discussions with the existing community of > AsterixDB’ers proposing to bring the project here to the ASF, > AsterixDB would like to move forward independent of whatever > comes of the pTLP discussions. > > That said, I would like to propose Apache AsterixDB as an > Incubator project. I am now calling a VOTE to accept AsterixDB > into the Apache Incubator. This VOTE will run for at least 72 hours. > > [ ] +1 Accept Apache AsterixDB into the Incubator > [ ] +0 Don’t care. > [ ] -1 Don’t accept Apache AsterixDB into the Incubator because.. > > Thanks for the feedback so far and looking forward to the VOTE! > > You can count my binding +1. > > Cheers, > Chris > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Chris Mattmann, Ph.D. 
> Chief Architect > Instrument Software and Science Data Systems Section (398) > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > Office: 168-519, Mailstop: 168-527 > Email: chris.a.mattm...@nasa.gov > WWW: http://sunset.usc.edu/~mattmann/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > -----Original Message----- > From: <Mattmann>, Chris Mattmann <chris.a.mattm...@jpl.nasa.gov> > Date: Wednesday, January 14, 2015 at 6:20 PM > To: "general@incubator.apache.org" <general@incubator.apache.org> > Cc: Michael Carey <dtab...@gmail.com>, Ian Maxon <ima...@uci.edu>, Till > Westmann <t...@westmann.org> > Subject: [PROPOSAL] Apache AsterixDB Incubator > >>Hi Folks, >> >>I am pleased to bring forth the Apache AsterixDB proposal to the >>Apache Incubator as Champion, working in collaboration with the >>team. Please find the wiki proposal here: >> >>https://wiki.apache.org/incubator/AsterixDBProposal >> >> >>Full text of the proposal is below. Please discuss and enjoy. I’ll >>leave the discussion open for a week, and then look to call a VOTE >>hopefully end of next week if all is well. >> >>Cheers! >>Chris Mattmann >> >>============================================================= >>Apache AsterixDB Proposal >> >>Abstract >> >>Apache AsterixDB is a scalable big data management system (BDMS) that >>provides storage, management, and query capabilities for large >>collections of semi-structured data. >> >>Proposal >> >>AsterixDB is a big data management system (BDMS) that makes it >>well-suited to needs such as web data warehousing and social data >>storage and analysis. Feature-wise, AsterixDB has: >> >>* A NoSQL style data model (ADM) based on extending JSON with object >> database concepts. 
>>* An expressive and declarative query language (AQL) for querying >> semi-structured data. >>* A runtime query execution engine, Hyracks, for partitioned-parallel >> execution of query plans. >>* Partitioned LSM-based data storage and indexing for efficient >> ingestion of newly arriving data. >>* Support for querying and indexing external data (e.g., in HDFS) as >> well as data stored within AsterixDB. >>* A rich set of primitive data types, including support for spatial, >> temporal, and textual data. >>* Indexing options that include B+ trees, R trees, and inverted >> keyword index support. >>* Basic transactional (concurrency and recovery) capabilities akin to >> those of a NoSQL store. >> >> >>Background and Rationale >> >>In the world of relational databases, the need to tackle data volumes >>that exceed the capabilities of a single server led to the >>development of “shared-nothing” parallel database systems several >>decades ago. These systems spread data over a cluster based on a >>partitioning strategy, such as hash partitioning, and queries are >>processed by employing partitioned-parallel divide-and-conquer >>techniques. Since these systems are fronted by a high-level, >>declarative language (SQL), their users are shielded from the >>complexities of parallel programming. Parallel database systems have >>been an extremely successful application of parallel computing, and >>quite a number of commercial products exist today. >> >>In the distributed systems world, the Web brought a need to index and >>query its huge content. SQL and relational databases were not the >>answer, though shared-nothing clusters again emerged as the hardware >>platform of choice. Google developed the Google File System (GFS) and >>MapReduce programming model to allow programmers to store and process >>Big Data by writing a few user-defined functions. 
The MapReduce >>framework applies these functions in parallel to data instances in >>distributed files (map) and to sorted groups of instances sharing a >>common key (reduce) -- not unlike the partitioned parallelism in >>parallel database systems. Apache's Hadoop MapReduce platform is the >>most prominent implementation of this paradigm for the rest of the >>Big Data community. On top of Hadoop and HDFS sit declarative >>languages like Pig and Hive that each compile down to Hadoop >>MapReduce jobs. >> >>The big Web companies were also challenged by extreme user bases >>(100s of millions of users) and needed fast simple lookups and >>updates to very large keyed data sets like user profiles. SQL >>databases were deemed either too expensive or not scalable, so the >>“NoSQL movement” was born. The ASF now has HBase and Cassandra, two >>popular key-value stores, in this space. MongoDB and Couchbase are >>other open source alternatives (document stores). >> >>It is evident from the rapidly growing popularity of "NoSQL" stores, >>as well as the strong demand for Big Data analytics engines today, >>that there is a strong (and growing!) need to store, process, *and* >>query large volumes of semi-structured data in many application >>areas. Until very recently, developers have had to ``choose'' between >>using big data analytics engines like Apache Hive or Apache Spark, >>which can do complex query processing and analysis over HDFS-resident >>files, and flexible but low-function data stores like MongoDB or >>Apache HBase. (The Apache Phoenix project, >>http://phoenix.apache.org/, is a recent SQL-over-HBase effort that >>aims to bridge between these choices.) >> >>AsterixDB is a highly scalable data management system that can store, >>index, and manage semi-structured data, e.g., much like MongoDB, but >>it also supports a full-power query language with the expressiveness >>of SQL (and more). 
Unlike analytics engines like Hive or Spark, it >>stores and manages data, so AsterixDB can exploit its knowledge of >>data partitioning and the availability of indexes to avoid always >>scanning data set(s) to process queries. Somewhat surprisingly, there >>is no open source parallel database system (relational or otherwise) >>available to developers today -- AsterixDB aims to fill this need. >>Since Apache is where the majority of today's most important Big >>Data technologies live, the ASF seems like the obvious home for a >>system like AsterixDB. >> >>Current Status >> >>The current version of AsterixDB was co-developed by a team of >>faculty, staff, and students at UC Irvine and UC Riverside. The >>project was initiated as a large NSF-sponsored project in 2009, the >>goal of which was to combine the best ideas from the parallel >>database world, the then new Hadoop world, and the semi-structured >>(e.g., XML/JSON) data world in order to create a next-generation >>BDMS. A first informal open source release was made four years later, >>in June of 2013, under the Apache Software License 2.0. >> >> >>Meritocracy >> >>The current developers are familiar with meritocratic open source >>development at Apache. Apache was chosen specifically because we want >>to encourage this style of development for the project. >> >> >>Community >> >>While AsterixDB started as a university project it has developed into >>a community. A number of the initial committers started contributing >>in academia and continue to actively participate and contribute after >>graduation. And we seek to further develop developer and user >>communities. One way to broaden the community that is ongoing is >>through academic collaborations (currently with IIT Mumbai in India >>and TU Berlin in Germany). During incubation we will also explicitly >>seek increased industrial participation. 
>> >>Some indicators of the effort's development community and history can >>be >>found at: >>https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo >>, >>https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo >> >> >>Core Developers >> >>The core developers of the project are diverse, although initially UC >>Irvine heavy (roughly 50%) due to the project's origins at UCI. The >>other 50% are from other academic institutions (UC Riverside and the >>Hebrew University in Jerusalem) and companies (Couchbase, Facebook, >>IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software). >> >> >>Alignment >> >>Apache is, by far, the most natural home for taking the AsterixDB >>project forward. A large fraction of today's top Big Data >>technologies have their homes in Apache, including Hadoop, YARN, Pig, >>Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a >>significant gap -- the parallel data management system gap -- that >>exists in the Big Data open source world. It is well-aligned with a >>number of the Apache projects, e.g., it has strong support for >>accessing and indexing external data in HDFS, and it uses YARN as an >>answer to basic cluster resource management. AsterixDB also seeks to >>achieve an Apache-style development model; it is seeking a broader >>community of contributors and users in order to achieve its full >>potential and value to the Big Data community. >> >>There are also a number of related Apache projects and dependencies >>that will be mentioned below in the Relationships with Other Apache >>products section. >> >> >>Known Risks >> >>Orphaned products >> >>Given the current level of intellectual investment in AsterixDB, the >>risk of the project being abandoned is very small. 
The UCI/UCR >>faculty team leads are highly incentivized to continue development >>since the database groups at UC Irvine and UC Riverside are both >>reliant on AsterixDB as a platform for long-term graduate research >>projects. UC San Diego is also beginning to contribute to the code >>base, and a collaboration involving public health applications is >>forming with UCLA. The work on AsterixDB is managed via a mix of >>mailing list discussions supplemented by weekly project status >>meetings which are summarized on the mailing list. Typical (local >>plus Skype-in) attendance to the weekly status meetings runs at about >>20 active contributors. >> >> >>Inexperience with Open Source >> >>AsterixDB and Hyracks were completely developed in Open Source under >>the ASL 2.0. The source code repositories, issue tracker, and mailing >>lists are available on Google Code and discussions and decisions >>happen on the mailing lists (which is necessary due to the geographic >>distribution of the current developers). >> >>Also a few of the initial committers have contributed to Apache >>projects. Vinayak Borkar is a committer on the Apache Helix and >>Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF >>and an IPMC member. Preston Carman and Steven Jacobs are committers >>on the Apache VXQuery project. >> >> >>Relationships with Other Apache Products >> >>Apache VXQuery is based on the Hyracks data-parallel runtime, which >>is also included in the AsterixDB code base. >> >>AsterixDB is closely related to Apache Hadoop. Included in AsterixDB >>is support for accessing external data in HDFS (and Hive formats), >>and resource management and system administration features are in the >>process of being migrated to YARN. >> >>AsterixDB's AQL query facilities offer comparable query power to >>Apache's Pig and Hive systems for big data analytics. 
AsterixDB >>differs in storing and indexing data and thus being able to quickly >>answer small and medium queries without large HDFS data scans - >>thereby targeting a different class of use cases. >> >>AsterixDB's data storage and indexing facilities are similar to those >>of HBase, but AsterixDB differs in being a much more complete and >>queryable BDMS (not just a key-value style store). >> >>AsterixDB's target use cases are not in-memory processing or >>iterative algorithm support, making AsterixDB complementary to the >>Apache Spark platform. (Spark interoperability is on our longer-term >>to-do wishlist.) >> >> >>Homogeneous Developers >> >>As mentioned before the current community is already organizationally >>and geographically distributed - and we would like to increase the >>heterogeneity. >> >> >>Reliance on Salaried Developers >> >>Of the initial committers only 3 are full-time UCI staff. The other >>committers are a mix of students, alumni who continue to contribute >>to the effort, and individuals working with permission part-time (or >>in spare time) on this project. >> >> >>An Excessive Fascination with the Apache Brand >> >>We believe in the processes, systems, and framework Apache has put in >>place. Apache is also known to foster a great community around their >>projects and provide exposure. While brand is important, our >>fascination with it is not excessive. We believe that the ASF is the >>right home for AsterixDB and that having AsterixDB inside of the ASF >>will lead to a better long-term outcome for the Big Data community. >> >> >>Documentation >> >>Documentation and publications related to AsterixDB can be found at >>http://asterixdb.ics.uci.edu/. >> >> >>Initial Source >> >>Current source resides in Google code: >>https://code.google.com/p/asterixdb/ (query language and upper system >>layers) and https://code.google.com/p/hyracks/ (dataflow runtime >>system and storage management libraries). 
>> >> >>External Dependencies >> >>AsterixDB depends on a number of Apache projects: >> >>- Ant >>- Avro >>- ApacheDB JDO >>- Commons >>- Derby >>- Hadoop >>- Hive >>- HTTPComponents >>- Jakarta ORO >>- Maven >>- Tomcat >>- Thrift >>- Velocity >>- Wicket >>- Xerces >> >>and other open source projects (organized by license): >> >>-- ASL 2.0: >> - Jackson >> - Google Guava >> - Google Guice >> - JSON-simple >> - BoneCP >> - Microsoft Azure SDK >> - Netty >> - Rome >> - JetS3t >> - Groovy >> - Jettison >> - Plexus >> - Datanucleus (JDO) >> - Jetty >> - Twitter4J >> - Snappy-java >> >>-- BSD: >> - Antlr >> - ObjectWeb ASM >> - Protobuf >> - JSCH >> - JavaCC >> - Paranamer >> - JLine >> - Stax >> - StringTemplate >> - xmlEnc >> >>-- MIT >> - AppAssembler >> - SimpleLog4J >> >>-- CDDL 1.0 >> - Java Activation Framework >> - Java Transactions >> - Java Servlet API >> - Grizzly >> - gmbal >> - Glassfish >> >>-- CDDL 1.1 >> - Jersey >> - JAXB Reference Implementation >> >>-- JSON License >> - JSON >> >>-- EPL 1.0 >> - JUnit >> >>-- JDOM License >> - JDOM >> >>-- Public Domain >> - xz >> - AOPAlliance >> >>As all dependencies are managed using Apache Maven, none of the >>external libraries need to be packaged in a source distribution. >> >> >>Required Resources >> >>Developer and user mailing lists >> >>priv...@asterixdb.incubator.apache.org (with moderated subscriptions) >>comm...@asterixdb.incubator.apache.org >>d...@asterixdb.incubator.apache.org >>us...@asterixdb.incubator.apache.org >> >> >>A git repository >> >>https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git >> >> >>A JIRA issue tracker >> >>https://issues.apache.org/jira/browse/ASTERIXDB >> >> >>Initial Committers >> >>The following is a list of the planned initial Apache committers (the >>active subset of the committers for the current repository at Google >>code). 
>> >>Abdullah Alamoudi (bamou...@gmail.com) >>Cameron Samak (euf...@gmail.com) >>Chen Li (che...@gmail.com) >>Ian Maxon (ima...@uci.edu) >>Ildar Absalyamov (ildar.absalya...@gmail.com) >>Jianfeng Jia (jianfeng....@gmail.com) >>Keren Ouaknine (ker...@gmail.com) >>Markus Dreseler (apa...@dreseler.de) >>Mike Carey (dtab...@apache.org) >>Murtadha Hubail (hubail...@gmail.com) >>Pouria Pirzadeh (pouria.pirza...@gmail.com) >>Preston Carman (prest...@apache.org) >>Raman Grover (ramangrove...@gmail.com) >>Sattam Alsubaiee (salsuba...@gmail.com) >>Steven Jacobs (sjaco...@apache.org) >>Taewoo Kim (wangs...@gmail.com) >>Till Westmann (ti...@apache.org) >>Vinayak Borkar (vinay...@apache.org) >>Yingyi Bu (buyin...@gmail.com) >>Young-Seok Kim (kiss...@gmail.com) >>Zach Heilbron (zheilb...@gmail.com) >> >> >>Affiliations >> >>UC Irvine >>- Mike Carey >>- Chen Li >>- Ian Maxon >>- Yingyi Bu >>- Raman Grover >>- Pouria Pirzadeh >>- Young-Seok Kim >>- Cameron Samak >>- Taewoo Kim >>- Jianfeng Jia >>- Murtadha Hubail >>- Markus Dreseler >> >>UC Riverside >>- Ildar Absalyamov >>- Preston Carman >>- Steven Jacobs >> >>Hebrew University >>- Keren Ouaknine >> >>Oracle >>- Till Westmann >> >>X15 Software >>- Vinayak Borkar >>- Zach Heilbron >> >>KACST Saudi Arabia >>- Sattam Alsubaiee >> >>Saudi Aramco >>- Abdullah Alamoudi >> >>Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI >>(UC Irvine) and UCR (UC Riverside) affiliates being students. The >>non-UC committers are a mix of alumni who continue to contribute to >>the effort and individuals working with permission part-time (or in >>spare time) on this project. >> >> >>Sponsors >> >>Champion >> >>Chris Mattmann (NASA/JPL) >> >>Nominated Mentors >> >>TBD >> >>Sponsoring Entity >> >>The Apache Incubator >> >> >> >> >> >>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>Chris Mattmann, Ph.D. 
>>Chief Architect >>Instrument Software and Science Data Systems Section (398) >>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA >>Office: 168-519, Mailstop: 168-527 >>Email: chris.a.mattm...@nasa.gov >>WWW: http://sunset.usc.edu/~mattmann/ >>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>Adjunct Associate Professor, Computer Science Department >>University of Southern California, Los Angeles, CA 90089 USA >>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> >> >> >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org