Thank you. So far we’ve added 3 slots for mentors on the proposal - I hope that’ll be sufficient even for the relatively large number of new committers.
Till

> On Jan 19, 2015, at 8:17 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>
> Thanks Till,
>
> Will try to solicit more mentors to help.
> Especially with initial committers mostly have not been exposed to
> contributing the Apache way.
>
> - Henry
>
> On Mon, Jan 19, 2015 at 5:28 PM, Till Westmann <t...@westmann.org> wrote:
>> Hi Henry,
>>
>> thanks! It’s great that you’ve seen (and liked) AsterixDB before.
>>
>> Even if your time is very limited we would be very happy to have you on
>> board as a mentor.
>> I’ll add you to the proposal.
>>
>> Cheers,
>> Till
>>
>>> On Jan 19, 2015, at 10:26 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>>>
>>> +1 This is GREAT News!
>>>
>>> Was watching and trying AsterixDB last year and looked in awesome shape.
>>>
>>> I have my plate full but would love to help mentor this project to get
>>> it going to ASF if needed!
>>>
>>> - Henry
>>>
>>> On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>> Hi Folks,
>>>>
>>>> I am pleased to bring forth the Apache AsterixDB proposal to the
>>>> Apache Incubator as Champion, working in collaboration with the
>>>> team. Please find the wiki proposal here:
>>>>
>>>> https://wiki.apache.org/incubator/AsterixDBProposal
>>>>
>>>>
>>>> Full text of the proposal is below. Please discuss and enjoy. I’ll
>>>> leave the discussion open for a week, and then look to call a VOTE
>>>> hopefully end of next week if all is well.
>>>>
>>>> Cheers!
>>>> Chris Mattmann
>>>>
>>>> =============================================================
>>>> Apache AsterixDB Proposal
>>>>
>>>> Abstract
>>>>
>>>> Apache AsterixDB is a scalable big data management system (BDMS) that
>>>> provides storage, management, and query capabilities for large
>>>> collections of semi-structured data.
>>>>
>>>> Proposal
>>>>
>>>> AsterixDB is a big data management system (BDMS) that is
>>>> well-suited to needs such as web data warehousing and social data
>>>> storage and analysis. Feature-wise, AsterixDB has:
>>>>
>>>> * A NoSQL style data model (ADM) based on extending JSON with object
>>>> database concepts.
>>>> * An expressive and declarative query language (AQL) for querying
>>>> semi-structured data.
>>>> * A runtime query execution engine, Hyracks, for partitioned-parallel
>>>> execution of query plans.
>>>> * Partitioned LSM-based data storage and indexing for efficient
>>>> ingestion of newly arriving data.
>>>> * Support for querying and indexing external data (e.g., in HDFS) as
>>>> well as data stored within AsterixDB.
>>>> * A rich set of primitive data types, including support for spatial,
>>>> temporal, and textual data.
>>>> * Indexing options that include B+ trees, R trees, and inverted
>>>> keyword index support.
>>>> * Basic transactional (concurrency and recovery) capabilities akin to
>>>> those of a NoSQL store.
>>>>
>>>>
>>>> Background and Rationale
>>>>
>>>> In the world of relational databases, the need to tackle data volumes
>>>> that exceed the capabilities of a single server led to the
>>>> development of “shared-nothing” parallel database systems several
>>>> decades ago. These systems spread data over a cluster based on a
>>>> partitioning strategy, such as hash partitioning, and queries are
>>>> processed by employing partitioned-parallel divide-and-conquer
>>>> techniques. Since these systems are fronted by a high-level,
>>>> declarative language (SQL), their users are shielded from the
>>>> complexities of parallel programming. Parallel database systems have
>>>> been an extremely successful application of parallel computing, and
>>>> quite a number of commercial products exist today.
>>>>
>>>> In the distributed systems world, the Web brought a need to index and
>>>> query its huge content. SQL and relational databases were not the
>>>> answer, though shared-nothing clusters again emerged as the hardware
>>>> platform of choice. Google developed the Google File System (GFS) and
>>>> MapReduce programming model to allow programmers to store and process
>>>> Big Data by writing a few user-defined functions. The MapReduce
>>>> framework applies these functions in parallel to data instances in
>>>> distributed files (map) and to sorted groups of instances sharing a
>>>> common key (reduce) -- not unlike the partitioned parallelism in
>>>> parallel database systems. Apache's Hadoop MapReduce platform is the
>>>> most prominent implementation of this paradigm for the rest of the
>>>> Big Data community. On top of Hadoop and HDFS sit declarative
>>>> languages like Pig and Hive that each compile down to Hadoop
>>>> MapReduce jobs.
>>>>
>>>> The big Web companies were also challenged by extreme user bases
>>>> (100s of millions of users) and needed fast simple lookups and
>>>> updates to very large keyed data sets like user profiles. SQL
>>>> databases were deemed either too expensive or not scalable, so the
>>>> “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
>>>> popular key-value stores, in this space. MongoDB and Couchbase are
>>>> other open source alternatives (document stores).
>>>>
>>>> It is evident from the rapidly growing popularity of "NoSQL" stores,
>>>> as well as the strong demand for Big Data analytics engines today,
>>>> that there is a strong (and growing!) need to store, process, *and*
>>>> query large volumes of semi-structured data in many application
>>>> areas. Until very recently, developers have had to ``choose'' between
>>>> using big data analytics engines like Apache Hive or Apache Spark,
>>>> which can do complex query processing and analysis over HDFS-resident
>>>> files, and flexible but low-function data stores like MongoDB or
>>>> Apache HBase. (The Apache Phoenix project,
>>>> http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
>>>> aims to bridge between these choices.)
>>>>
>>>> AsterixDB is a highly scalable data management system that can store,
>>>> index, and manage semi-structured data, e.g., much like MongoDB, but
>>>> it also supports a full-power query language with the expressiveness
>>>> of SQL (and more). Unlike analytics engines like Hive or Spark, it
>>>> stores and manages data, so AsterixDB can exploit its knowledge of
>>>> data partitioning and the availability of indexes to avoid always
>>>> scanning data set(s) to process queries. Somewhat surprisingly, there
>>>> is no open source parallel database system (relational or otherwise)
>>>> available to developers today -- AsterixDB aims to fill this need.
>>>> Since Apache is where the majority of today's most important Big
>>>> Data technologies live, the ASF seems like the obvious home for a
>>>> system like AsterixDB.
>>>>
>>>> Current Status
>>>>
>>>> The current version of AsterixDB was co-developed by a team of
>>>> faculty, staff, and students at UC Irvine and UC Riverside. The
>>>> project was initiated as a large NSF-sponsored project in 2009, the
>>>> goal of which was to combine the best ideas from the parallel
>>>> database world, the then new Hadoop world, and the semi-structured
>>>> (e.g., XML/JSON) data world in order to create a next-generation
>>>> BDMS. A first informal open source release was made four years later,
>>>> in June of 2013, under the Apache Software License 2.0.
>>>>
>>>>
>>>> Meritocracy
>>>>
>>>> The current developers are familiar with meritocratic open source
>>>> development at Apache. Apache was chosen specifically because we want
>>>> to encourage this style of development for the project.
>>>>
>>>>
>>>> Community
>>>>
>>>> While AsterixDB started as a university project it has developed into
>>>> a community. A number of the initial committers started contributing
>>>> in academia and continue to actively participate and contribute after
>>>> graduation. And we seek to further develop developer and user
>>>> communities. One way to broaden the community that is ongoing is
>>>> through academic collaborations (currently with IIT Mumbai in India
>>>> and TU Berlin in Germany). During incubation we will also explicitly
>>>> seek increased industrial participation.
>>>>
>>>> Some indicators of the effort's development community and history can
>>>> be found at:
>>>> https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo,
>>>> https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
>>>>
>>>>
>>>> Core Developers
>>>>
>>>> The core developers of the project are diverse, although initially UC
>>>> Irvine heavy (roughly 50%) due to the project's origins at UCI. The
>>>> other 50% are from other academic institutions (UC Riverside and the
>>>> Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
>>>> IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
>>>>
>>>>
>>>> Alignment
>>>>
>>>> Apache is, by far, the most natural home for taking the AsterixDB
>>>> project forward. A large fraction of today's top Big Data
>>>> technologies have their homes in Apache, including Hadoop, YARN, Pig,
>>>> Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a
>>>> significant gap -- the parallel data management system gap -- that
>>>> exists in the Big Data open source world. It is well-aligned with a
>>>> number of the Apache projects, e.g., it has strong support for
>>>> accessing and indexing external data in HDFS, and it uses YARN as an
>>>> answer to basic cluster resource management. AsterixDB also seeks to
>>>> achieve an Apache-style development model; it is seeking a broader
>>>> community of contributors and users in order to achieve its full
>>>> potential and value to the Big Data community.
>>>>
>>>> There are also a number of related Apache projects and dependencies
>>>> that will be mentioned below in the Relationships with Other Apache
>>>> products section.
>>>>
>>>>
>>>> Known Risks
>>>>
>>>> Orphaned products
>>>>
>>>> Given the current level of intellectual investment in AsterixDB, the
>>>> risk of the project being abandoned is very small. The UCI/UCR
>>>> faculty team leads are highly incentivized to continue development
>>>> since the database groups at UC Irvine and UC Riverside are both
>>>> reliant on AsterixDB as a platform for long-term graduate research
>>>> projects. UC San Diego is also beginning to contribute to the code
>>>> base, and a collaboration involving public health applications is
>>>> forming with UCLA. The work on AsterixDB is managed via a mix of
>>>> mailing list discussions supplemented by weekly project status
>>>> meetings which are summarized on the mailing list. Typical (local
>>>> plus Skype-in) attendance to the weekly status meetings runs at about
>>>> 20 active contributors.
>>>>
>>>>
>>>> Inexperience with Open Source
>>>>
>>>> AsterixDB and Hyracks were completely developed in Open Source under
>>>> the ASL 2.0. The source code repositories, issue tracker, and mailing
>>>> lists are available on Google Code and discussions and decisions
>>>> happen on the mailing lists (which is necessary due to the geographic
>>>> distribution of the current developers).
>>>>
>>>> Also a few of the initial committers have contributed to Apache
>>>> projects. Vinayak Borkar is a committer on the Apache Helix and
>>>> Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF
>>>> and an IPMC member. Preston Carman and Steven Jacobs are committers
>>>> on the Apache VXQuery project.
>>>>
>>>>
>>>> Relationships with Other Apache Products
>>>>
>>>> Apache VXQuery is based on the Hyracks data-parallel runtime, which
>>>> is also included in the AsterixDB code base.
>>>>
>>>> AsterixDB is closely related to Apache Hadoop. Included in AsterixDB
>>>> is support for accessing external data in HDFS (and Hive formats),
>>>> and resource management and system administration features are in the
>>>> process of being migrated to YARN.
>>>>
>>>> AsterixDB's AQL query facilities offer comparable query power to
>>>> Apache's Pig and Hive systems for big data analytics. AsterixDB
>>>> differs in storing and indexing data and thus being able to quickly
>>>> answer small and medium queries without large HDFS data scans -
>>>> thereby targeting a different class of use cases.
>>>>
>>>> AsterixDB's data storage and indexing facilities are similar to those
>>>> of HBase, but AsterixDB differs in being a much more complete and
>>>> queryable BDMS (not just a key-value style store).
>>>>
>>>> AsterixDB's target use cases are not in-memory processing or
>>>> iterative algorithm support, making AsterixDB complementary to the
>>>> Apache Spark platform. (Spark interoperability is on our longer-term
>>>> to-do wishlist.)
>>>>
>>>>
>>>> Homogeneous Developers
>>>>
>>>> As mentioned before the current community is already organizationally
>>>> and geographically distributed - and we would like to increase the
>>>> heterogeneity.
>>>>
>>>>
>>>> Reliance on Salaried Developers
>>>>
>>>> Of the initial committers only 3 are full-time UCI staff. The other
>>>> committers are a mix of students, alumni who continue to contribute
>>>> to the effort, and individuals working with permission part-time (or
>>>> in spare time) on this project.
>>>>
>>>>
>>>> An Excessive Fascination with the Apache Brand
>>>>
>>>> We believe in the processes, systems, and framework Apache has put in
>>>> place. Apache is also known to foster a great community around their
>>>> projects and provide exposure. While brand is important, our
>>>> fascination with it is not excessive. We believe that the ASF is the
>>>> right home for AsterixDB and that having AsterixDB inside of the ASF
>>>> will lead to a better long-term outcome for the Big Data community.
>>>>
>>>>
>>>> Documentation
>>>>
>>>> Documentation and publications related to AsterixDB can be found at
>>>> http://asterixdb.ics.uci.edu/.
>>>>
>>>>
>>>> Initial Source
>>>>
>>>> Current source resides in Google code:
>>>> https://code.google.com/p/asterixdb/ (query language and upper system
>>>> layers) and https://code.google.com/p/hyracks/ (dataflow runtime
>>>> system and storage management libraries).
>>>>
>>>>
>>>> External Dependencies
>>>>
>>>> AsterixDB depends on a number of Apache projects:
>>>>
>>>> - Ant
>>>> - Avro
>>>> - ApacheDB JDO
>>>> - Commons
>>>> - Derby
>>>> - Hadoop
>>>> - Hive
>>>> - HTTPComponents
>>>> - Jakarta ORO
>>>> - Maven
>>>> - Tomcat
>>>> - Thrift
>>>> - Velocity
>>>> - Wicket
>>>> - Xerces
>>>>
>>>> and other open source projects (organized by license):
>>>>
>>>> -- ASL 2.0:
>>>> - Jackson
>>>> - Google Guava
>>>> - Google Guice
>>>> - JSON-simple
>>>> - BoneCP
>>>> - Microsoft Azure SDK
>>>> - Netty
>>>> - Rome
>>>> - JetS3t
>>>> - Groovy
>>>> - Jettison
>>>> - Plexus
>>>> - Datanucleus (JDO)
>>>> - Jetty
>>>> - Twitter4J
>>>> - Snappy-java
>>>>
>>>> -- BSD:
>>>> - Antlr
>>>> - ObjectWeb ASM
>>>> - Protobuf
>>>> - JSCH
>>>> - JavaCC
>>>> - Paranamer
>>>> - JLine
>>>> - Stax
>>>> - StringTemplate
>>>> - xmlEnc
>>>>
>>>> -- MIT
>>>> - AppAssembler
>>>> - SimpleLog4J
>>>>
>>>> -- CDDL 1.0
>>>> - Java Activation Framework
>>>> - Java Transactions
>>>> - Java Servlet API
>>>> - Grizzly
>>>> - gmbal
>>>> - Glassfish
>>>>
>>>> -- CDDL 1.1
>>>> - Jersey
>>>> - JAXB Reference Implementation
>>>>
>>>> -- JSON License
>>>> - JSON
>>>>
>>>> -- EPL 1.0
>>>> - JUnit
>>>>
>>>> -- JDOM License
>>>> - JDOM
>>>>
>>>> -- Public Domain
>>>> - xz
>>>> - AOPAlliance
>>>>
>>>> As all dependencies are managed using Apache Maven, none of the
>>>> external libraries need to be packaged in a source distribution.
>>>>
>>>>
>>>> Required Resources
>>>>
>>>> Developer and user mailing lists
>>>>
>>>> priv...@asterixdb.incubator.apache.org (with moderated subscriptions)
>>>> comm...@asterixdb.incubator.apache.org
>>>> d...@asterixdb.incubator.apache.org
>>>> us...@asterixdb.incubator.apache.org
>>>>
>>>>
>>>> A git repository
>>>>
>>>> https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
>>>>
>>>>
>>>> A JIRA issue tracker
>>>>
>>>> https://issues.apache.org/jira/browse/ASTERIXDB
>>>>
>>>>
>>>> Initial Committers
>>>>
>>>> The following is a list of the planned initial Apache committers (the
>>>> active subset of the committers for the current repository at Google
>>>> code).
>>>>
>>>> Abdullah Alamoudi (bamou...@gmail.com)
>>>> Cameron Samak (euf...@gmail.com)
>>>> Chen Li (che...@gmail.com)
>>>> Ian Maxon (ima...@uci.edu)
>>>> Ildar Absalyamov (ildar.absalya...@gmail.com)
>>>> Jianfeng Jia (jianfeng....@gmail.com)
>>>> Keren Ouaknine (ker...@gmail.com)
>>>> Markus Dreseler (apa...@dreseler.de)
>>>> Mike Carey (dtab...@apache.org)
>>>> Murtadha Hubail (hubail...@gmail.com)
>>>> Pouria Pirzadeh (pouria.pirza...@gmail.com)
>>>> Preston Carman (prest...@apache.org)
>>>> Raman Grover (ramangrove...@gmail.com)
>>>> Sattam Alsubaiee (salsuba...@gmail.com)
>>>> Steven Jacobs (sjaco...@apache.org)
>>>> Taewoo Kim (wangs...@gmail.com)
>>>> Till Westmann (ti...@apache.org)
>>>> Vinayak Borkar (vinay...@apache.org)
>>>> Yingyi Bu (buyin...@gmail.com)
>>>> Young-Seok Kim (kiss...@gmail.com)
>>>> Zach Heilbron (zheilb...@gmail.com)
>>>>
>>>>
>>>> Affiliations
>>>>
>>>> UC Irvine
>>>> - Mike Carey
>>>> - Chen Li
>>>> - Ian Maxon
>>>> - Yingyi Bu
>>>> - Raman Grover
>>>> - Pouria Pirzadeh
>>>> - Young-Seok Kim
>>>> - Cameron Samak
>>>> - Taewoo Kim
>>>> - Jianfeng Jia
>>>> - Murtadha Hubail
>>>> - Markus Dreseler
>>>>
>>>> UC Riverside
>>>> - Ildar Absalyamov
>>>> - Preston Carman
>>>> - Steven Jacobs
>>>>
>>>> Hebrew University
>>>> - Keren Ouaknine
>>>>
>>>> Oracle
>>>> - Till Westmann
>>>>
>>>> X15 Software
>>>> - Vinayak Borkar
>>>> - Zach Heilbron
>>>>
>>>> KACST Saudi Arabia
>>>> - Sattam Alsubaiee
>>>>
>>>> Saudi Aramco
>>>> - Abdullah Alamoudi
>>>>
>>>> Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
>>>> (UC Irvine) and UCR (UC Riverside) affiliates being students. The
>>>> non-UC committers are a mix of alumni who continue to contribute to
>>>> the effort and individuals working with permission part-time (or in
>>>> spare time) on this project.
>>>>
>>>>
>>>> Sponsors
>>>>
>>>> Champion
>>>>
>>>> Chris Mattmann (NASA/JPL)
>>>>
>>>> Nominated Mentors
>>>>
>>>> TBD
>>>>
>>>> Sponsoring Entity
>>>>
>>>> The Apache Incubator
>>>>
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Chief Architect
>>>> Instrument Software and Science Data Systems Section (398)
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 168-519, Mailstop: 168-527
>>>> Email: chris.a.mattm...@nasa.gov
>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Associate Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++