Added my name to the mentor list.
On Tue, Jan 20, 2015 at 8:37 AM, Mike Carey <dtab...@gmail.com> wrote:

Wonderful; thanks, Ted!!

Cheers,
Mike

On 1/19/15 11:29 PM, Ted Dunning wrote:

Chris just asked me under separate cover.
I am happy to help out as mentor.

On Mon, Jan 19, 2015 at 8:17 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:

Thanks Till,

Will try to solicit more mentors to help, especially since most of the
initial committers have not been exposed to contributing the Apache way.

- Henry

On Mon, Jan 19, 2015 at 5:28 PM, Till Westmann <t...@westmann.org> wrote:

Hi Henry,

thanks! It’s great that you’ve seen (and liked) AsterixDB before.

Even if your time is very limited we would be very happy to have you on
board as a mentor.
I’ll add you to the proposal.

Cheers,
Till

On Jan 19, 2015, at 10:26 AM, Henry Saputra <henry.sapu...@gmail.com> wrote:

+1 This is GREAT News!

Was watching and trying AsterixDB last year and it looked in awesome shape.

I have my plate full but would love to help mentor this project to get
it going to ASF if needed!

- Henry

On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
<chris.a.mattm...@jpl.nasa.gov> wrote:

Hi Folks,

I am pleased to bring forth the Apache AsterixDB proposal to the
Apache Incubator as Champion, working in collaboration with the
team. Please find the wiki proposal here:

https://wiki.apache.org/incubator/AsterixDBProposal

Full text of the proposal is below. Please discuss and enjoy. I’ll
leave the discussion open for a week, and then look to call a VOTE
hopefully end of next week if all is well.

Cheers!
Chris Mattmann

=============================================================
Apache AsterixDB Proposal

Abstract

Apache AsterixDB is a scalable big data management system (BDMS) that
provides storage, management, and query capabilities for large
collections of semi-structured data.

Proposal

AsterixDB is a big data management system (BDMS) that is well-suited
to needs such as web data warehousing and social data storage and
analysis. Feature-wise, AsterixDB has:

* A NoSQL-style data model (ADM) based on extending JSON with object
  database concepts.
* An expressive and declarative query language (AQL) for querying
  semi-structured data.
* A runtime query execution engine, Hyracks, for partitioned-parallel
  execution of query plans.
* Partitioned LSM-based data storage and indexing for efficient
  ingestion of newly arriving data.
* Support for querying and indexing external data (e.g., in HDFS) as
  well as data stored within AsterixDB.
* A rich set of primitive data types, including support for spatial,
  temporal, and textual data.
* Indexing options that include B+ trees, R-trees, and inverted
  keyword indexes.
* Basic transactional (concurrency and recovery) capabilities akin to
  those of a NoSQL store.


Background and Rationale

In the world of relational databases, the need to tackle data volumes
that exceed the capabilities of a single server led to the
development of “shared-nothing” parallel database systems several
decades ago. These systems spread data over a cluster based on a
partitioning strategy, such as hash partitioning, and process queries
using partitioned-parallel divide-and-conquer techniques.
Since these systems are fronted by a high-level,
declarative language (SQL), their users are shielded from the
complexities of parallel programming. Parallel database systems have
been an extremely successful application of parallel computing, and
quite a number of commercial products exist today.

In the distributed systems world, the Web brought a need to index and
query its huge content. SQL and relational databases were not the
answer, though shared-nothing clusters again emerged as the hardware
platform of choice. Google developed the Google File System (GFS) and
the MapReduce programming model to allow programmers to store and
process Big Data by writing a few user-defined functions. The MapReduce
framework applies these functions in parallel to data instances in
distributed files (map) and to sorted groups of instances sharing a
common key (reduce) -- not unlike the partitioned parallelism in
parallel database systems. Apache's Hadoop MapReduce platform is the
most prominent implementation of this paradigm for the rest of the
Big Data community. On top of Hadoop and HDFS sit declarative
languages like Pig and Hive that each compile down to Hadoop
MapReduce jobs.

The big Web companies were also challenged by extreme user bases
(100s of millions of users) and needed fast, simple lookups and
updates to very large keyed data sets like user profiles. SQL
databases were deemed either too expensive or not scalable, so the
“NoSQL movement” was born. The ASF now has HBase and Cassandra, two
popular key-value stores, in this space. MongoDB and Couchbase are
other open source alternatives (document stores).
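Shared-nothing databases and MapReduce both rely on partitioned parallelism, and the common pattern can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not AsterixDB, Hyracks, or Hadoop code, and it assumes a word-count workload. A user-defined map emits (key, value) pairs, and each pair is routed to a partition by hashing its key. Every partition is then sorted and reduced independently. The function names, the use of `zlib.crc32` as the hash, and the thread pool standing in for cluster nodes are all illustrative assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def partition_of(key: str, num_partitions: int) -> int:
    # Hash partitioning: a stable hash (crc32) of the key, modulo the partition count.
    # Python's built-in hash() for str is salted per process, so it is avoided here.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_phase(records):
    # Hypothetical user-defined "map": emit (word, 1) for every word of every record.
    for record in records:
        for word in record.split():
            yield word.lower(), 1


def shuffle(pairs, num_partitions):
    # Route every (key, value) pair to the single partition that owns its key.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def reduce_partition(pairs):
    # Sort one partition by key, then apply the user-defined "reduce" (a sum)
    # to each group of pairs sharing a key.
    pairs = sorted(pairs, key=itemgetter(0))
    return {key: sum(v for _, v in group) for key, group in groupby(pairs, key=itemgetter(0))}


def word_count(records, num_partitions=4):
    partitions = shuffle(map_phase(records), num_partitions)
    results = {}
    # Partitions are independent, so they can be reduced in parallel; a thread
    # pool stands in for the nodes of a shared-nothing cluster.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(reduce_partition, partitions):
            # Each key lives in exactly one partition, so partial results never overlap.
            results.update(partial)
    return results
```

For example, `word_count(["big data", "Big queries", "data data"])` yields the counts big=2, data=3, queries=1 (dictionary order depends on the hash). Hash partitioning sends all pairs with the same key to the same partition. Each reduce therefore sees complete groups, and the partial results can be merged without any cross-partition combining step.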
It is evident from the rapidly growing popularity of "NoSQL" stores,
as well as the strong demand for Big Data analytics engines today,
that there is a strong (and growing!) need to store, process, *and*
query large volumes of semi-structured data in many application
areas. Until very recently, developers had to "choose" between
using big data analytics engines like Apache Hive or Apache Spark,
which can do complex query processing and analysis over HDFS-resident
files, and flexible but low-function data stores like MongoDB or
Apache HBase. (The Apache Phoenix project,
http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
aims to bridge between these choices.)

AsterixDB is a highly scalable data management system that can store,
index, and manage semi-structured data, much like MongoDB, but
it also supports a full-power query language with the expressiveness
of SQL (and more). Unlike analytics engines such as Hive or Spark, it
stores and manages data, so AsterixDB can exploit its knowledge of
data partitioning and the availability of indexes to avoid always
scanning data set(s) to process queries. Somewhat surprisingly, there
is no open source parallel database system (relational or otherwise)
available to developers today -- AsterixDB aims to fill this need.
Since Apache is where the majority of today's most important Big
Data technologies live, the ASF seems like the obvious home for a
system like AsterixDB.

Current Status

The current version of AsterixDB was co-developed by a team of
faculty, staff, and students at UC Irvine and UC Riverside.
The project was initiated as a large NSF-sponsored project in 2009, the
goal of which was to combine the best ideas from the parallel
database world, the then-new Hadoop world, and the semi-structured
(e.g., XML/JSON) data world in order to create a next-generation
BDMS. A first informal open source release was made four years later,
in June of 2013, under the Apache Software License 2.0.


Meritocracy

The current developers are familiar with meritocratic open source
development at Apache. Apache was chosen specifically because we want
to encourage this style of development for the project.


Community

While AsterixDB started as a university project, it has developed into
a community. A number of the initial committers started contributing
in academia and continue to actively participate and contribute after
graduation. We seek to further develop both the developer and the user
communities. One ongoing way to broaden the community is through
academic collaborations (currently with IIT Mumbai in India and
TU Berlin in Germany). During incubation we will also explicitly
seek increased industrial participation.

Some indicators of the effort's development community and history can
be found at:

https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo
https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo


Core Developers

The core developers of the project are diverse, although initially UC
Irvine heavy (roughly 50%) due to the project's origins at UCI. The
other 50% are from other academic institutions (UC Riverside and the
Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
Alignment

Apache is, by far, the most natural home for taking the AsterixDB
project forward. A large fraction of today's top Big Data
technologies have their homes in Apache, including Hadoop, YARN, Pig,
Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a
significant gap -- the parallel data management system gap -- that
exists in the Big Data open source world. It is well-aligned with a
number of Apache projects; e.g., it has strong support for
accessing and indexing external data in HDFS, and it uses YARN for
basic cluster resource management. AsterixDB also seeks to
achieve an Apache-style development model; it is seeking a broader
community of contributors and users in order to achieve its full
potential and value to the Big Data community.

There are also a number of related Apache projects and dependencies
that are described below in the Relationships with Other Apache
Products section.


Known Risks

Orphaned Products

Given the current level of intellectual investment in AsterixDB, the
risk of the project being abandoned is very small. The UCI/UCR
faculty team leads are highly incentivized to continue development,
since the database groups at UC Irvine and UC Riverside both rely
on AsterixDB as a platform for long-term graduate research
projects. UC San Diego is also beginning to contribute to the code
base, and a collaboration involving public health applications is
forming with UCLA. The work on AsterixDB is managed via a mix of
mailing list discussions supplemented by weekly project status
meetings, which are summarized on the mailing list. Typical (local
plus Skype-in) attendance at the weekly status meetings runs at about
20 active contributors.
Inexperience with Open Source

AsterixDB and Hyracks were developed entirely as open source under
the ASL 2.0. The source code repositories, issue tracker, and mailing
lists are available on Google Code, and discussions and decisions
happen on the mailing lists (which is necessary due to the geographic
distribution of the current developers).

Also, a few of the initial committers have contributed to Apache
projects. Vinayak Borkar is a committer on the Apache Helix and
Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF
and an IPMC member. Preston Carman and Steven Jacobs are committers
on the Apache VXQuery project.


Relationships with Other Apache Products

Apache VXQuery is based on the Hyracks data-parallel runtime, which
is also included in the AsterixDB code base.

AsterixDB is closely related to Apache Hadoop. AsterixDB includes
support for accessing external data in HDFS (and Hive formats), and
its resource management and system administration features are in
the process of being migrated to YARN.

AsterixDB's AQL query facilities offer query power comparable to
Apache's Pig and Hive systems for big data analytics. AsterixDB
differs in storing and indexing data and thus being able to quickly
answer small and medium queries without large HDFS data scans,
thereby targeting a different class of use cases.

AsterixDB's data storage and indexing facilities are similar to those
of HBase, but AsterixDB differs in being a much more complete and
queryable BDMS (not just a key-value style store).

AsterixDB's target use cases are not in-memory processing or
iterative algorithm support, making AsterixDB complementary to the
Apache Spark platform.
(Spark interoperability is on our longer-term
wishlist.)


Homogeneous Developers

As mentioned before, the current community is already organizationally
and geographically distributed, and we would like to increase its
heterogeneity.


Reliance on Salaried Developers

Of the initial committers, only 3 are full-time UCI staff. The other
committers are a mix of students, alumni who continue to contribute
to the effort, and individuals working with permission part-time (or
in spare time) on this project.


An Excessive Fascination with the Apache Brand

We believe in the processes, systems, and framework Apache has put in
place. Apache is also known to foster a great community around its
projects and provide exposure. While brand is important, our
fascination with it is not excessive. We believe that the ASF is the
right home for AsterixDB and that having AsterixDB inside the ASF
will lead to a better long-term outcome for the Big Data community.


Documentation

Documentation and publications related to AsterixDB can be found at
http://asterixdb.ics.uci.edu/.


Initial Source

The current source resides on Google Code:
https://code.google.com/p/asterixdb/ (query language and upper system
layers) and https://code.google.com/p/hyracks/ (dataflow runtime
system and storage management libraries).
External Dependencies

AsterixDB depends on a number of Apache projects:

- Ant
- Avro
- ApacheDB JDO
- Commons
- Derby
- Hadoop
- Hive
- HTTPComponents
- Jakarta ORO
- Maven
- Tomcat
- Thrift
- Velocity
- Wicket
- Xerces

and other open source projects (organized by license):

-- ASL 2.0:
- Jackson
- Google Guava
- Google Guice
- JSON-simple
- BoneCP
- Microsoft Azure SDK
- Netty
- Rome
- JetS3t
- Groovy
- Jettison
- Plexus
- Datanucleus (JDO)
- Jetty
- Twitter4J
- Snappy-java

-- BSD:
- Antlr
- ObjectWeb ASM
- Protobuf
- JSCH
- JavaCC
- Paranamer
- JLine
- Stax
- StringTemplate
- xmlEnc

-- MIT:
- AppAssembler
- SimpleLog4J

-- CDDL 1.0:
- Java Activation Framework
- Java Transactions
- Java Servlet API
- Grizzly
- gmbal
- Glassfish

-- CDDL 1.1:
- Jersey
- JAXB Reference Implementation

-- JSON License:
- JSON

-- EPL 1.0:
- JUnit

-- JDOM License:
- JDOM

-- Public Domain:
- xz
- AOPAlliance

As all dependencies are managed using Apache Maven, none of the
external libraries need to be packaged in a source distribution.
Required Resources

Developer and user mailing lists

priv...@asterixdb.incubator.apache.org (with moderated subscriptions)
comm...@asterixdb.incubator.apache.org
d...@asterixdb.incubator.apache.org
us...@asterixdb.incubator.apache.org


A git repository

https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git


A JIRA issue tracker

https://issues.apache.org/jira/browse/ASTERIXDB


Initial Committers

The following is a list of the planned initial Apache committers (the
active subset of the committers for the current repository at Google
Code).

Abdullah Alamoudi (bamou...@gmail.com)
Cameron Samak (euf...@gmail.com)
Chen Li (che...@gmail.com)
Ian Maxon (ima...@uci.edu)
Ildar Absalyamov (ildar.absalya...@gmail.com)
Jianfeng Jia (jianfeng....@gmail.com)
Keren Ouaknine (ker...@gmail.com)
Markus Dreseler (apa...@dreseler.de)
Mike Carey (dtab...@apache.org)
Murtadha Hubail (hubail...@gmail.com)
Pouria Pirzadeh (pouria.pirza...@gmail.com)
Preston Carman (prest...@apache.org)
Raman Grover (ramangrove...@gmail.com)
Sattam Alsubaiee (salsuba...@gmail.com)
Steven Jacobs (sjaco...@apache.org)
Taewoo Kim (wangs...@gmail.com)
Till Westmann (ti...@apache.org)
Vinayak Borkar (vinay...@apache.org)
Yingyi Bu (buyin...@gmail.com)
Young-Seok Kim (kiss...@gmail.com)
Zach Heilbron (zheilb...@gmail.com)


Affiliations

UC Irvine
- Mike Carey
- Chen Li
- Ian Maxon
- Yingyi Bu
- Raman Grover
- Pouria Pirzadeh
- Young-Seok Kim
- Cameron Samak
- Taewoo Kim
- Jianfeng Jia
- Murtadha Hubail
- Markus Dreseler

UC Riverside
- Ildar Absalyamov
- Preston Carman
- Steven Jacobs

Hebrew University
- Keren Ouaknine

Oracle
- Till Westmann

X15 Software
- Vinayak Borkar
- Zach Heilbron

KACST Saudi Arabia
- Sattam Alsubaiee

Saudi Aramco
- Abdullah Alamoudi

Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
(UC Irvine) and UCR (UC Riverside) affiliates being students. The
non-UC committers are a mix of alumni who continue to contribute to
the effort and individuals working with permission part-time (or in
spare time) on this project.


Sponsors

Champion

Chris Mattmann (NASA/JPL)

Nominated Mentors

TBD

Sponsoring Entity

The Apache Incubator


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++