Should be fine.
Regards,
Alan

> On Jan 19, 2015, at 8:27 PM, Till Westmann <t...@westmann.org> wrote:
>
> Thank you.
> So far we’ve added 3 slots for mentors on the proposal - I hope that’ll be
> sufficient even for the relatively large number of new committers.
>
> Till
>
>> On Jan 19, 2015, at 8:17 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>>
>> Thanks Till,
>>
>> Will try to solicit more mentors to help,
>> especially since most of the initial committers have not been exposed to
>> contributing the Apache way.
>>
>> - Henry
>>
>> On Mon, Jan 19, 2015 at 5:28 PM, Till Westmann <t...@westmann.org> wrote:
>>> Hi Henry,
>>>
>>> thanks! It’s great that you’ve seen (and liked) AsterixDB before.
>>>
>>> Even if your time is very limited we would be very happy to have you on
>>> board as a mentor.
>>> I’ll add you to the proposal.
>>>
>>> Cheers,
>>> Till
>>>
>>>> On Jan 19, 2015, at 10:26 AM, Henry Saputra <henry.sapu...@gmail.com>
>>>> wrote:
>>>>
>>>> +1 This is GREAT News!
>>>>
>>>> I was watching and trying AsterixDB last year and it looked to be in
>>>> awesome shape.
>>>>
>>>> I have my plate full but would love to help mentor this project to get
>>>> it into the ASF if needed!
>>>>
>>>> - Henry
>>>>
>>>> On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> I am pleased to bring forth the Apache AsterixDB proposal to the
>>>>> Apache Incubator as Champion, working in collaboration with the
>>>>> team. Please find the wiki proposal here:
>>>>>
>>>>> https://wiki.apache.org/incubator/AsterixDBProposal
>>>>>
>>>>>
>>>>> Full text of the proposal is below. Please discuss and enjoy. I’ll
>>>>> leave the discussion open for a week, and then look to call a VOTE
>>>>> hopefully end of next week if all is well.
>>>>>
>>>>> Cheers!
>>>>> Chris Mattmann
>>>>>
>>>>> =============================================================
>>>>> Apache AsterixDB Proposal
>>>>>
>>>>> Abstract
>>>>>
>>>>> Apache AsterixDB is a scalable big data management system (BDMS) that
>>>>> provides storage, management, and query capabilities for large
>>>>> collections of semi-structured data.
>>>>>
>>>>> Proposal
>>>>>
>>>>> AsterixDB is a big data management system (BDMS) whose features make
>>>>> it well-suited to needs such as web data warehousing and social data
>>>>> storage and analysis. Feature-wise, AsterixDB has:
>>>>>
>>>>> * A NoSQL-style data model (ADM) based on extending JSON with object
>>>>> database concepts.
>>>>> * An expressive and declarative query language (AQL) for querying
>>>>> semi-structured data.
>>>>> * A runtime query execution engine, Hyracks, for partitioned-parallel
>>>>> execution of query plans.
>>>>> * Partitioned LSM-based data storage and indexing for efficient
>>>>> ingestion of newly arriving data.
>>>>> * Support for querying and indexing external data (e.g., in HDFS) as
>>>>> well as data stored within AsterixDB.
>>>>> * A rich set of primitive data types, including support for spatial,
>>>>> temporal, and textual data.
>>>>> * Indexing options that include B+ trees, R trees, and inverted
>>>>> keyword indexes.
>>>>> * Basic transactional (concurrency and recovery) capabilities akin to
>>>>> those of a NoSQL store.
>>>>>
>>>>>
>>>>> Background and Rationale
>>>>>
>>>>> In the world of relational databases, the need to tackle data volumes
>>>>> that exceed the capabilities of a single server led to the
>>>>> development of “shared-nothing” parallel database systems several
>>>>> decades ago. These systems spread data over a cluster based on a
>>>>> partitioning strategy, such as hash partitioning, and queries are
>>>>> processed by employing partitioned-parallel divide-and-conquer
>>>>> techniques.
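The hash-partitioned, divide-and-conquer processing described above can be sketched in a few lines. This is a minimal single-process illustration under stated assumptions, not AsterixDB or Hyracks code: the names `partition_of`, `hash_partition`, `local_count`, and `parallel_group_count` are hypothetical, CRC-32 stands in for whatever hash function a real system uses, and the partitions are processed one after another here, whereas a shared-nothing cluster would hand each partition to its own node.

```python
from collections import Counter
from zlib import crc32


def partition_of(key: str, num_partitions: int) -> int:
    # Assumption: a deterministic hash (CRC-32) so that every node maps a key to
    # the same partition; Python's built-in hash() of str is randomized per process.
    return crc32(key.encode("utf-8")) % num_partitions


def hash_partition(records, key_fn, num_partitions):
    """Divide: spread records over partitions according to a hash of their key."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[partition_of(key_fn(record), num_partitions)].append(record)
    return partitions


def local_count(partition, key_fn):
    """Conquer: the work a single node would do on its own partition."""
    return Counter(key_fn(record) for record in partition)


def parallel_group_count(records, key_fn, num_partitions=4):
    """Count records per key with partitioned-parallel divide-and-conquer."""
    result = {}
    for partition in hash_partition(records, key_fn, num_partitions):
        # All records with a given key land in the same partition, so the
        # per-partition counts cover disjoint keys and can simply be merged.
        result.update(local_count(partition, key_fn))
    return result


if __name__ == "__main__":
    posts = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}, {"user": "carol"}]
    print(sorted(parallel_group_count(posts, lambda p: p["user"]).items()))
    # prints [('alice', 2), ('bob', 1), ('carol', 1)]
```

Hash partitioning is used here only because it is the strategy named in the text; a range-partitioned variant would follow the same divide, conquer, and combine shape.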
>>>>> Since these systems are fronted by a high-level,
>>>>> declarative language (SQL), their users are shielded from the
>>>>> complexities of parallel programming. Parallel database systems have
>>>>> been an extremely successful application of parallel computing, and
>>>>> quite a number of commercial products exist today.
>>>>>
>>>>> In the distributed systems world, the Web brought a need to index and
>>>>> query its huge content. SQL and relational databases were not the
>>>>> answer, though shared-nothing clusters again emerged as the hardware
>>>>> platform of choice. Google developed the Google File System (GFS) and
>>>>> the MapReduce programming model to allow programmers to store and
>>>>> process Big Data by writing a few user-defined functions. The
>>>>> MapReduce framework applies these functions in parallel to data
>>>>> instances in distributed files (map) and to sorted groups of instances
>>>>> sharing a common key (reduce) -- not unlike the partitioned
>>>>> parallelism in parallel database systems. Apache's Hadoop MapReduce
>>>>> platform is the most prominent implementation of this paradigm for
>>>>> the rest of the Big Data community. On top of Hadoop and HDFS sit
>>>>> declarative languages like Pig and Hive that each compile down to
>>>>> Hadoop MapReduce jobs.
>>>>>
>>>>> The big Web companies were also challenged by extreme user bases
>>>>> (100s of millions of users) and needed fast, simple lookups and
>>>>> updates to very large keyed data sets like user profiles. SQL
>>>>> databases were deemed either too expensive or not scalable, so the
>>>>> “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
>>>>> popular key-value stores, in this space. MongoDB and Couchbase are
>>>>> other open source alternatives (document stores).
>>>>>
>>>>> It is evident from the rapidly growing popularity of "NoSQL" stores,
>>>>> as well as the strong demand for Big Data analytics engines today,
>>>>> that there is a strong (and growing!)
>>>>> need to store, process, *and*
>>>>> query large volumes of semi-structured data in many application
>>>>> areas. Until very recently, developers had to “choose” between
>>>>> using big data analytics engines like Apache Hive or Apache Spark,
>>>>> which can do complex query processing and analysis over HDFS-resident
>>>>> files, and flexible but low-function data stores like MongoDB or
>>>>> Apache HBase. (The Apache Phoenix project,
>>>>> http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
>>>>> aims to bridge these choices.)
>>>>>
>>>>> AsterixDB is a highly scalable data management system that can store,
>>>>> index, and manage semi-structured data, much like MongoDB does, but
>>>>> it also supports a full-power query language with the expressiveness
>>>>> of SQL (and more). Unlike analytics engines such as Hive or Spark, it
>>>>> stores and manages data, so AsterixDB can exploit its knowledge of
>>>>> data partitioning and the availability of indexes to avoid always
>>>>> scanning data set(s) to process queries. Somewhat surprisingly, there
>>>>> is no open source parallel database system (relational or otherwise)
>>>>> available to developers today -- AsterixDB aims to fill this need.
>>>>> Since Apache is where the majority of today's most important Big
>>>>> Data technologies live, the ASF seems like the obvious home for a
>>>>> system like AsterixDB.
>>>>>
>>>>> Current Status
>>>>>
>>>>> The current version of AsterixDB was co-developed by a team of
>>>>> faculty, staff, and students at UC Irvine and UC Riverside. The
>>>>> project was initiated as a large NSF-sponsored project in 2009, the
>>>>> goal of which was to combine the best ideas from the parallel
>>>>> database world, the then-new Hadoop world, and the semi-structured
>>>>> (e.g., XML/JSON) data world in order to create a next-generation
>>>>> BDMS.
>>>>> A first informal open source release was made four years later,
>>>>> in June of 2013, under the Apache Software License 2.0.
>>>>>
>>>>>
>>>>> Meritocracy
>>>>>
>>>>> The current developers are familiar with meritocratic open source
>>>>> development at Apache. Apache was chosen specifically because we want
>>>>> to encourage this style of development for the project.
>>>>>
>>>>>
>>>>> Community
>>>>>
>>>>> While AsterixDB started as a university project, it has developed
>>>>> into a community. A number of the initial committers started
>>>>> contributing in academia and continue to actively participate and
>>>>> contribute after graduation. We seek to further develop both the
>>>>> developer and user communities. One ongoing way to broaden the
>>>>> community is through academic collaborations (currently with IIT
>>>>> Mumbai in India and TU Berlin in Germany). During incubation we will
>>>>> also explicitly seek increased industrial participation.
>>>>>
>>>>> Some indicators of the effort's development community and history can
>>>>> be found at:
>>>>> https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo,
>>>>> https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
>>>>>
>>>>>
>>>>> Core Developers
>>>>>
>>>>> The core developers of the project are diverse, although initially UC
>>>>> Irvine heavy (roughly 50%) due to the project's origins at UCI. The
>>>>> other 50% are from other academic institutions (UC Riverside and the
>>>>> Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
>>>>> IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
>>>>>
>>>>>
>>>>> Alignment
>>>>>
>>>>> Apache is, by far, the most natural home for taking the AsterixDB
>>>>> project forward. A large fraction of today's top Big Data
>>>>> technologies have their homes in Apache, including Hadoop, YARN, Pig,
>>>>> Hive, Spark, Flink, HBase, Cassandra and others.
>>>>> AsterixDB fills a
>>>>> significant gap -- the parallel data management system gap -- that
>>>>> exists in the Big Data open source world. It is well-aligned with a
>>>>> number of the Apache projects; e.g., it has strong support for
>>>>> accessing and indexing external data in HDFS, and it uses YARN for
>>>>> basic cluster resource management. AsterixDB also seeks to achieve an
>>>>> Apache-style development model; it is seeking a broader community of
>>>>> contributors and users in order to achieve its full potential and
>>>>> value to the Big Data community.
>>>>>
>>>>> There are also a number of related Apache projects and dependencies
>>>>> that are discussed below in the Relationships with Other Apache
>>>>> Products section.
>>>>>
>>>>>
>>>>> Known Risks
>>>>>
>>>>> Orphaned Products
>>>>>
>>>>> Given the current level of intellectual investment in AsterixDB, the
>>>>> risk of the project being abandoned is very small. The UCI/UCR
>>>>> faculty team leads are highly incentivized to continue development,
>>>>> since the database groups at UC Irvine and UC Riverside both rely on
>>>>> AsterixDB as a platform for long-term graduate research projects. UC
>>>>> San Diego is also beginning to contribute to the code base, and a
>>>>> collaboration involving public health applications is forming with
>>>>> UCLA. The work on AsterixDB is managed through mailing list
>>>>> discussions, supplemented by weekly project status meetings that are
>>>>> summarized on the mailing list. Typical (local plus Skype-in)
>>>>> attendance at the weekly status meetings is about 20 active
>>>>> contributors.
>>>>>
>>>>>
>>>>> Inexperience with Open Source
>>>>>
>>>>> AsterixDB and Hyracks were completely developed in Open Source under
>>>>> the ASL 2.0.
>>>>> The source code repositories, issue tracker, and mailing
>>>>> lists are available on Google Code, and discussions and decisions
>>>>> happen on the mailing lists (which is necessary due to the geographic
>>>>> distribution of the current developers).
>>>>>
>>>>> A few of the initial committers have also contributed to Apache
>>>>> projects. Vinayak Borkar is a committer on the Apache Helix and
>>>>> Apache VXQuery projects. Till Westmann is the VP of Apache VXQuery at
>>>>> the ASF and an IPMC member. Preston Carman and Steven Jacobs are
>>>>> committers on the Apache VXQuery project.
>>>>>
>>>>>
>>>>> Relationships with Other Apache Products
>>>>>
>>>>> Apache VXQuery is based on the Hyracks data-parallel runtime, which
>>>>> is also included in the AsterixDB code base.
>>>>>
>>>>> AsterixDB is closely related to Apache Hadoop. AsterixDB includes
>>>>> support for accessing external data in HDFS (and Hive formats), and
>>>>> its resource management and system administration features are in
>>>>> the process of being migrated to YARN.
>>>>>
>>>>> AsterixDB's AQL query facilities offer query power comparable to that
>>>>> of Apache's Pig and Hive systems for big data analytics. AsterixDB
>>>>> differs in storing and indexing data, and is thus able to quickly
>>>>> answer small and medium-sized queries without large HDFS data scans,
>>>>> thereby targeting a different class of use cases.
>>>>>
>>>>> AsterixDB's data storage and indexing facilities are similar to those
>>>>> of HBase, but AsterixDB differs in being a much more complete and
>>>>> queryable BDMS (not just a key-value style store).
>>>>>
>>>>> AsterixDB's target use cases are not in-memory processing or
>>>>> iterative algorithm support, making AsterixDB complementary to the
>>>>> Apache Spark platform. (Spark interoperability is on our longer-term
>>>>> to-do wishlist.)
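As a rough illustration of why storing and indexing data lets a system answer small queries without scanning everything, the toy sketch below compares a full scan with a range lookup on a sorted secondary index. It is a simplification under stated assumptions: `Record`, `full_scan`, and `SortedIndex` are hypothetical names, and a sorted in-memory list stands in for the B+-tree and LSM-based indexes that AsterixDB actually uses. The point is only that the index lookup locates the matching entries by binary search instead of examining every record.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rid: int        # record id; also the record's position in the data list
    user: str
    followers: int


def full_scan(records, lo, hi):
    """Scan-based plan: examine every record to find lo <= followers <= hi."""
    return [r for r in records if lo <= r.followers <= hi]


class SortedIndex:
    """Hypothetical toy secondary index on `followers`: (key, rid) pairs in key order."""

    def __init__(self, records):
        self._records = records
        self._entries = sorted((r.followers, r.rid) for r in records)
        self._keys = [key for key, _ in self._entries]

    def range_lookup(self, lo, hi):
        """Index-based plan: binary-search the key range, then fetch only those records."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [self._records[rid] for _, rid in self._entries[start:end]]


if __name__ == "__main__":
    data = [Record(i, f"user{i}", (i * 37) % 1000) for i in range(10_000)]
    index = SortedIndex(data)
    hits = index.range_lookup(100, 105)
    # Same answer as the scan, without examining all 10,000 records.
    print(sorted(hits, key=lambda r: r.rid) == full_scan(data, 100, 105))  # prints True
```

The index returns matches in key order, so the comparison re-sorts them by record id before checking them against the scan result.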
>>>>>
>>>>>
>>>>> Homogeneous Developers
>>>>>
>>>>> As mentioned before, the current community is already organizationally
>>>>> and geographically distributed, and we would like to increase its
>>>>> heterogeneity.
>>>>>
>>>>>
>>>>> Reliance on Salaried Developers
>>>>>
>>>>> Of the initial committers, only 3 are full-time UCI staff. The other
>>>>> committers are a mix of students, alumni who continue to contribute
>>>>> to the effort, and individuals working with permission part-time (or
>>>>> in spare time) on this project.
>>>>>
>>>>>
>>>>> An Excessive Fascination with the Apache Brand
>>>>>
>>>>> We believe in the processes, systems, and framework Apache has put in
>>>>> place. Apache is also known to foster a great community around its
>>>>> projects and provide exposure. While brand is important, our
>>>>> fascination with it is not excessive. We believe that the ASF is the
>>>>> right home for AsterixDB and that having AsterixDB inside the ASF
>>>>> will lead to a better long-term outcome for the Big Data community.
>>>>>
>>>>>
>>>>> Documentation
>>>>>
>>>>> Documentation and publications related to AsterixDB can be found at
>>>>> http://asterixdb.ics.uci.edu/.
>>>>>
>>>>>
>>>>> Initial Source
>>>>>
>>>>> Current source resides in Google Code:
>>>>> https://code.google.com/p/asterixdb/ (query language and upper system
>>>>> layers) and https://code.google.com/p/hyracks/ (dataflow runtime
>>>>> system and storage management libraries).
>>>>>
>>>>>
>>>>> External Dependencies
>>>>>
>>>>> AsterixDB depends on a number of Apache projects:
>>>>>
>>>>> - Ant
>>>>> - Avro
>>>>> - ApacheDB JDO
>>>>> - Commons
>>>>> - Derby
>>>>> - Hadoop
>>>>> - Hive
>>>>> - HTTPComponents
>>>>> - Jakarta ORO
>>>>> - Maven
>>>>> - Tomcat
>>>>> - Thrift
>>>>> - Velocity
>>>>> - Wicket
>>>>> - Xerces
>>>>>
>>>>> and other open source projects (organized by license):
>>>>>
>>>>> -- ASL 2.0:
>>>>> - Jackson
>>>>> - Google Guava
>>>>> - Google Guice
>>>>> - JSON-simple
>>>>> - BoneCP
>>>>> - Microsoft Azure SDK
>>>>> - Netty
>>>>> - Rome
>>>>> - JetS3t
>>>>> - Groovy
>>>>> - Jettison
>>>>> - Plexus
>>>>> - Datanucleus (JDO)
>>>>> - Jetty
>>>>> - Twitter4J
>>>>> - Snappy-java
>>>>>
>>>>> -- BSD:
>>>>> - Antlr
>>>>> - ObjectWeb ASM
>>>>> - Protobuf
>>>>> - JSCH
>>>>> - JavaCC
>>>>> - Paranamer
>>>>> - JLine
>>>>> - Stax
>>>>> - StringTemplate
>>>>> - xmlEnc
>>>>>
>>>>> -- MIT:
>>>>> - AppAssembler
>>>>> - SimpleLog4J
>>>>>
>>>>> -- CDDL 1.0:
>>>>> - Java Activation Framework
>>>>> - Java Transactions
>>>>> - Java Servlet API
>>>>> - Grizzly
>>>>> - gmbal
>>>>> - Glassfish
>>>>>
>>>>> -- CDDL 1.1:
>>>>> - Jersey
>>>>> - JAXB Reference Implementation
>>>>>
>>>>> -- JSON License:
>>>>> - JSON
>>>>>
>>>>> -- EPL 1.0:
>>>>> - JUnit
>>>>>
>>>>> -- JDOM License:
>>>>> - JDOM
>>>>>
>>>>> -- Public Domain:
>>>>> - xz
>>>>> - AOPAlliance
>>>>>
>>>>> As all dependencies are managed using Apache Maven, none of the
>>>>> external libraries need to be packaged in a source distribution.
>>>>>
>>>>>
>>>>> Required Resources
>>>>>
>>>>> Developer and user mailing lists
>>>>>
>>>>> priv...@asterixdb.incubator.apache.org (with moderated subscriptions)
>>>>> comm...@asterixdb.incubator.apache.org
>>>>> d...@asterixdb.incubator.apache.org
>>>>> us...@asterixdb.incubator.apache.org
>>>>>
>>>>>
>>>>> A git repository
>>>>>
>>>>> https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
>>>>>
>>>>>
>>>>> A JIRA issue tracker
>>>>>
>>>>> https://issues.apache.org/jira/browse/ASTERIXDB
>>>>>
>>>>>
>>>>> Initial Committers
>>>>>
>>>>> The following is a list of the planned initial Apache committers (the
>>>>> active subset of the committers for the current repository at Google
>>>>> Code).
>>>>>
>>>>> Abdullah Alamoudi (bamou...@gmail.com)
>>>>> Cameron Samak (euf...@gmail.com)
>>>>> Chen Li (che...@gmail.com)
>>>>> Ian Maxon (ima...@uci.edu)
>>>>> Ildar Absalyamov (ildar.absalya...@gmail.com)
>>>>> Jianfeng Jia (jianfeng....@gmail.com)
>>>>> Keren Ouaknine (ker...@gmail.com)
>>>>> Markus Dreseler (apa...@dreseler.de)
>>>>> Mike Carey (dtab...@apache.org)
>>>>> Murtadha Hubail (hubail...@gmail.com)
>>>>> Pouria Pirzadeh (pouria.pirza...@gmail.com)
>>>>> Preston Carman (prest...@apache.org)
>>>>> Raman Grover (ramangrove...@gmail.com)
>>>>> Sattam Alsubaiee (salsuba...@gmail.com)
>>>>> Steven Jacobs (sjaco...@apache.org)
>>>>> Taewoo Kim (wangs...@gmail.com)
>>>>> Till Westmann (ti...@apache.org)
>>>>> Vinayak Borkar (vinay...@apache.org)
>>>>> Yingyi Bu (buyin...@gmail.com)
>>>>> Young-Seok Kim (kiss...@gmail.com)
>>>>> Zach Heilbron (zheilb...@gmail.com)
>>>>>
>>>>>
>>>>> Affiliations
>>>>>
>>>>> UC Irvine
>>>>> - Mike Carey
>>>>> - Chen Li
>>>>> - Ian Maxon
>>>>> - Yingyi Bu
>>>>> - Raman Grover
>>>>> - Pouria Pirzadeh
>>>>> - Young-Seok Kim
>>>>> - Cameron Samak
>>>>> - Taewoo Kim
>>>>> - Jianfeng Jia
>>>>> - Murtadha Hubail
>>>>> - Markus Dreseler
>>>>>
>>>>> UC Riverside
>>>>> - Ildar Absalyamov
>>>>> - Preston Carman
>>>>> - Steven Jacobs
>>>>>
>>>>> Hebrew University
>>>>> - Keren Ouaknine
>>>>>
>>>>> Oracle
>>>>> - Till Westmann
>>>>>
>>>>> X15 Software
>>>>> - Vinayak Borkar
>>>>> - Zach Heilbron
>>>>>
>>>>> KACST Saudi Arabia
>>>>> - Sattam Alsubaiee
>>>>>
>>>>> Saudi Aramco
>>>>> - Abdullah Alamoudi
>>>>>
>>>>> Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
>>>>> (UC Irvine) and UCR (UC Riverside) affiliates being students. The
>>>>> non-UC committers are a mix of alumni who continue to contribute to
>>>>> the effort and individuals working with permission part-time (or in
>>>>> spare time) on this project.
>>>>>
>>>>>
>>>>> Sponsors
>>>>>
>>>>> Champion
>>>>>
>>>>> Chris Mattmann (NASA/JPL)
>>>>>
>>>>> Nominated Mentors
>>>>>
>>>>> TBD
>>>>>
>>>>> Sponsoring Entity
>>>>>
>>>>> The Apache Incubator
>>>>>
>>>>>
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> Chris Mattmann, Ph.D.
>>>>> Chief Architect
>>>>> Instrument Software and Science Data Systems Section (398)
>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>> Office: 168-519, Mailstop: 168-527
>>>>> Email: chris.a.mattm...@nasa.gov
>>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> Adjunct Associate Professor, Computer Science Department
>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org