Re: [PROPOSAL] Apache AsterixDB Incubator

Mike Carey Mon, 19 Jan 2015 17:48:57 -0800

Ditto - thanks for the support!
Cheers,
Mike

On 1/19/15 5:39 PM, Till Westmann wrote:

On Jan 19, 2015, at 11:34 AM, jan i <j...@apache.org<mailto:j...@apache.org>> wrote:
Looks like a really challenging project, and the proposal looks as if it has already been through a couple of refinement rounds.
Count on my +1, when it comes to voting.


Will do!

Thanks,
Till


rgds
jan i

On 19 January 2015 at 19:26, Henry Saputra <henry.sapu...@gmail.com<mailto:henry.sapu...@gmail.com>> wrote:


    +1 This is GREAT News!

    Was watching and trying AsterixDB last year and looked in awesome
    shape.

    I have my plate full but would love to help mentor this project
    to get it going to ASF if needed!

    - Henry

    On Wed, Jan 14, 2015 at 6:21 PM, Mattmann, Chris A (3980)
    <chris.a.mattm...@jpl.nasa.gov
    <mailto:chris.a.mattm...@jpl.nasa.gov>> wrote:
    > Hi Folks,
    >
    > I am pleased to bring forth the Apache AsterixDB proposal to the
    > Apache Incubator as Champion, working in collaboration with the
    > team. Please find the wiki proposal here:
    >
    > https://wiki.apache.org/incubator/AsterixDBProposal
    >
    >
    > Full text of the proposal is below. Please discuss and enjoy. I’ll
    > leave the discussion open for a week, and then look to call a VOTE
    > hopefully end of next week if all is well.
    >
    > Cheers!
    > Chris Mattmann
    >
    > =============================================================
    > Apache AsterixDB Proposal
    >
    > Abstract
    >
    > Apache AsterixDB is a scalable big data management system (BDMS)
    > that provides storage, management, and query capabilities for large
    > collections of semi-structured data.
    >
    > Proposal
    >
    > AsterixDB is a big data management system (BDMS) with a feature set
    > that makes it well-suited to needs such as web data warehousing and
    > social data storage and analysis. AsterixDB has:
    >
    > * A NoSQL style data model (ADM) based on extending JSON with object
    >   database concepts.
    > * An expressive and declarative query language (AQL) for querying
    >   semi-structured data.
    > * A runtime query execution engine, Hyracks, for partitioned-parallel
    >   execution of query plans.
    > * Partitioned LSM-based data storage and indexing for efficient
    >   ingestion of newly arriving data.
    > * Support for querying and indexing external data (e.g., in HDFS) as
    >   well as data stored within AsterixDB.
    > * A rich set of primitive data types, including support for spatial,
    >   temporal, and textual data.
    > * Indexing options that include B+ trees, R trees, and inverted
    >   keyword index support.
    > * Basic transactional (concurrency and recovery) capabilities akin to
    >   those of a NoSQL store.
    >
    >
    > Background and Rationale
    >
    > In the world of relational databases, the need to tackle data volumes
    > that exceed the capabilities of a single server led to the
    > development of “shared-nothing” parallel database systems several
    > decades ago. These systems spread data over a cluster based on a
    > partitioning strategy, such as hash partitioning, and queries are
    > processed by employing partitioned-parallel divide-and-conquer
    > techniques. Since these systems are fronted by a high-level,
    > declarative language (SQL), their users are shielded from the
    > complexities of parallel programming. Parallel database systems have
    > been an extremely successful application of parallel computing, and
    > quite a number of commercial products exist today.
    >
    > In the distributed systems world, the Web brought a need to index and
    > query its huge content. SQL and relational databases were not the
    > answer, though shared-nothing clusters again emerged as the hardware
    > platform of choice. Google developed the Google File System (GFS) and
    > MapReduce programming model to allow programmers to store and process
    > Big Data by writing a few user-defined functions. The MapReduce
    > framework applies these functions in parallel to data instances in
    > distributed files (map) and to sorted groups of instances sharing a
    > common key (reduce) -- not unlike the partitioned parallelism in
    > parallel database systems. Apache's Hadoop MapReduce platform is the
    > most prominent implementation of this paradigm for the rest of the
    > Big Data community. On top of Hadoop and HDFS sit declarative
    > languages like Pig and Hive that each compile down to Hadoop
    > MapReduce jobs.
    >
    > The big Web companies were also challenged by extreme user bases
    > (100s of millions of users) and needed fast simple lookups and
    > updates to very large keyed data sets like user profiles. SQL
    > databases were deemed either too expensive or not scalable, so the
    > “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
    > popular key-value stores, in this space. MongoDB and Couchbase are
    > other open source alternatives (document stores).
    >
    > It is evident from the rapidly growing popularity of "NoSQL" stores,
    > as well as the strong demand for Big Data analytics engines today,
    > that there is a strong (and growing!) need to store, process, *and*
    > query large volumes of semi-structured data in many application
    > areas. Until very recently, developers have had to "choose" between
    > using big data analytics engines like Apache Hive or Apache Spark,
    > which can do complex query processing and analysis over HDFS-resident
    > files, and flexible but low-function data stores like MongoDB or
    > Apache HBase. (The Apache Phoenix project,
    > http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
    > aims to bridge between these choices.)
    >
    > AsterixDB is a highly scalable data management system that can store,
    > index, and manage semi-structured data, e.g., much like MongoDB, but
    > it also supports a full-power query language with the expressiveness
    > of SQL (and more). Unlike analytics engines like Hive or Spark, it
    > stores and manages data, so AsterixDB can exploit its knowledge of
    > data partitioning and the availability of indexes to avoid always
    > scanning data set(s) to process queries. Somewhat surprisingly, there
    > is no open source parallel database system (relational or otherwise)
    > available to developers today -- AsterixDB aims to fill this need.
    > Since Apache is where the majority of today's most important Big
    > Data technologies live, the ASF seems like the obvious home for a
    > system like AsterixDB.
    >
    > Current Status
    >
    > The current version of AsterixDB was co-developed by a team of
    > faculty, staff, and students at UC Irvine and UC Riverside. The
    > project was initiated as a large NSF-sponsored project in 2009, the
    > goal of which was to combine the best ideas from the parallel
    > database world, the then new Hadoop world, and the semi-structured
    > (e.g., XML/JSON) data world in order to create a next-generation
    > BDMS. A first informal open source release was made four years later,
    > in June of 2013, under the Apache Software License 2.0.
    >
    >
    > Meritocracy
    >
    > The current developers are familiar with meritocratic open source
    > development at Apache. Apache was chosen specifically because we want
    > to encourage this style of development for the project.
    >
    >
    > Community
    >
    > While AsterixDB started as a university project, it has developed into
    > a community. A number of the initial committers started contributing
    > in academia and continue to actively participate and contribute after
    > graduation. We also seek to further develop the developer and user
    > communities. One ongoing way to broaden the community is through
    > academic collaborations (currently with IIT Mumbai in India and TU
    > Berlin in Germany). During incubation we will also explicitly seek
    > increased industrial participation.
    >
    > Some indicators of the effort's development community and history can
    > be found at:
    >
    > https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo
    >
    > https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
    >
    >
    > Core Developers
    >
    > The core developers of the project are diverse, although initially UC
    > Irvine heavy (roughly 50%) due to the project's origins at UCI. The
    > other 50% are from other academic institutions (UC Riverside and the
    > Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
    > IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
    >
    >
    > Alignment
    >
    > Apache is, by far, the most natural home for taking the AsterixDB
    > project forward. A large fraction of today's top Big Data
    > technologies have their homes in Apache, including Hadoop, YARN, Pig,
    > Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a
    > significant gap -- the parallel data management system gap -- that
    > exists in the Big Data open source world. It is well-aligned with a
    > number of the Apache projects, e.g., it has strong support for
    > accessing and indexing external data in HDFS, and it uses YARN as an
    > answer to basic cluster resource management. AsterixDB also seeks to
    > achieve an Apache-style development model; it is seeking a broader
    > community of contributors and users in order to achieve its full
    > potential and value to the Big Data community.
    >
    > There are also a number of related Apache projects and dependencies
    > that will be mentioned below in the "Relationships with Other Apache
    > Products" section.
    >
    >
    > Known Risks
    >
    > Orphaned products
    >
    > Given the current level of intellectual investment in AsterixDB, the
    > risk of the project being abandoned is very small. The UCI/UCR
    > faculty team leads are highly incentivized to continue development
    > since the database groups at UC Irvine and UC Riverside are both
    > reliant on AsterixDB as a platform for long-term graduate research
    > projects. UC San Diego is also beginning to contribute to the code
    > base, and a collaboration involving public health applications is
    > forming with UCLA. The work on AsterixDB is managed via a mix of
    > mailing list discussions supplemented by weekly project status
    > meetings which are summarized on the mailing list. Typical (local
    > plus Skype-in) attendance at the weekly status meetings runs at about
    > 20 active contributors.
    >
    >
    > Inexperience with Open Source
    >
    > AsterixDB and Hyracks were completely developed in Open Source under
    > the ASL 2.0. The source code repositories, issue tracker, and mailing
    > lists are available on Google Code and discussions and decisions
    > happen on the mailing lists (which is necessary due to the geographic
    > distribution of the current developers).
    >
    > Also a few of the initial committers have contributed to Apache
    > projects. Vinayak Borkar is a committer on the Apache Helix and
    > Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF
    > and an IPMC member. Preston Carman and Steven Jacobs are committers
    > on the Apache VXQuery project.
    >
    >
    > Relationships with Other Apache Products
    >
    > Apache VXQuery is based on the Hyracks data-parallel runtime, which
    > is also included in the AsterixDB code base.
    >
    > AsterixDB is closely related to Apache Hadoop. Included in AsterixDB
    > is support for accessing external data in HDFS (and Hive formats),
    > and resource management and system administration features are in the
    > process of being migrated to YARN.
    >
    > AsterixDB's AQL query facilities offer comparable query power to
    > Apache's Pig and Hive systems for big data analytics. AsterixDB
    > differs in storing and indexing data and thus being able to quickly
    > answer small and medium queries without large HDFS data scans -
    > thereby targeting a different class of use cases.
    >
    > AsterixDB's data storage and indexing facilities are similar to those
    > of HBase, but AsterixDB differs in being a much more complete and
    > queryable BDMS (not just a key-value style store).
    >
    > AsterixDB's target use cases are not in-memory processing or
    > iterative algorithm support, making AsterixDB complementary to the
    > Apache Spark platform. (Spark interoperability is on our longer-term
    > to-do wishlist.)
    >
    >
    > Homogeneous Developers
    >
    > As mentioned before the current community is already organizationally
    > and geographically distributed - and we would like to increase the
    > heterogeneity.
    >
    >
    > Reliance on Salaried Developers
    >
    > Of the initial committers only 3 are full-time UCI staff. The other
    > committers are a mix of students, alumni who continue to contribute
    > to the effort, and individuals working with permission part-time (or
    > in spare time) on this project.
    >
    >
    > An Excessive Fascination with the Apache Brand
    >
    > We believe in the processes, systems, and framework Apache has put in
    > place. Apache is also known to foster a great community around their
    > projects and provide exposure. While brand is important, our
    > fascination with it is not excessive. We believe that the ASF is the
    > right home for AsterixDB and that having AsterixDB inside of the ASF
    > will lead to a better long-term outcome for the Big Data community.
    >
    >
    > Documentation
    >
    > Documentation and publications related to AsterixDB can be found at
    > http://asterixdb.ics.uci.edu/.
    >
    >
    > Initial Source
    >
    > Current source resides in Google code:
    > https://code.google.com/p/asterixdb/ (query language and upper system
    > layers) and https://code.google.com/p/hyracks/ (dataflow runtime
    > system and storage management libraries).
    >
    >
    > External Dependencies
    >
    > AsterixDB depends on a number of Apache projects:
    >
    > - Ant
    > - Avro
    > - ApacheDB JDO
    > - Commons
    > - Derby
    > - Hadoop
    > - Hive
    > - HTTPComponents
    > - Jakarta ORO
    > - Maven
    > - Tomcat
    > - Thrift
    > - Velocity
    > - Wicket
    > - Xerces
    >
    > and other open source projects (organized by license):
    >
    > -- ASL 2.0:
    >  - Jackson
    >  - Google Guava
    >  - Google Guice
    >  - JSON-simple
    >  - BoneCP
    >  - Microsoft Azure SDK
    >  - Netty
    >  - Rome
    >  - JetS3t
    >  - Groovy
    >  - Jettison
    >  - Plexus
    >  - Datanucleus (JDO)
    >  - Jetty
    >  - Twitter4J
    >  - Snappy-java
    >
    > -- BSD:
    >  - Antlr
    >  - ObjectWeb ASM
    >  - Protobuf
    >  - JSCH
    >  - JavaCC
    >  - Paranamer
    >  - JLine
    >  - Stax
    >  - StringTemplate
    >  - xmlEnc
    >
    > -- MIT
    >  - AppAssembler
    >  - SimpleLog4J
    >
    > -- CDDL 1.0
    >  - Java Activation Framework
    >  - Java Transactions
    >  - Java Servlet API
    >  - Grizzly
    >  - gmbal
    >  - Glassfish
    >
    > -- CDDL 1.1
    >  - Jersey
    >  - JAXB Reference Implementation
    >
    > -- JSON License
    >  - JSON
    >
    > -- EPL 1.0
    >  - JUnit
    >
    > -- JDOM License
    >  - JDOM
    >
    > -- Public Domain
    >  - xz
    >  - AOPAlliance
    >
    > As all dependencies are managed using Apache Maven, none of the
    > external libraries need to be packaged in a source distribution.
    >
    >
    > Required Resources
    >
    > Developer and user mailing lists
    >
    > priv...@asterixdb.incubator.apache.org (with moderated subscriptions)
    > comm...@asterixdb.incubator.apache.org
    > d...@asterixdb.incubator.apache.org
    > us...@asterixdb.incubator.apache.org
    >
    >
    > A git repository
    >
    > https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
    >
    >
    > A JIRA issue tracker
    >
    > https://issues.apache.org/jira/browse/ASTERIXDB
    >
    >
    > Initial Committers
    >
    > The following is a list of the planned initial Apache committers (the
    > active subset of the committers for the current repository at Google
    > code).
    >
    > Abdullah Alamoudi (bamou...@gmail.com)
    > Cameron Samak (euf...@gmail.com)
    > Chen Li (che...@gmail.com)
    > Ian Maxon (ima...@uci.edu)
    > Ildar Absalyamov (ildar.absalya...@gmail.com)
    > Jianfeng Jia (jianfeng....@gmail.com)
    > Keren Ouaknine (ker...@gmail.com)
    > Markus Dreseler (apa...@dreseler.de)
    > Mike Carey (dtab...@apache.org)
    > Murtadha Hubail (hubail...@gmail.com)
    > Pouria Pirzadeh (pouria.pirza...@gmail.com)
    > Preston Carman (prest...@apache.org)
    > Raman Grover (ramangrove...@gmail.com)
    > Sattam Alsubaiee (salsuba...@gmail.com)
    > Steven Jacobs (sjaco...@apache.org)
    > Taewoo Kim (wangs...@gmail.com)
    > Till Westmann (ti...@apache.org)
    > Vinayak Borkar (vinay...@apache.org)
    > Yingyi Bu (buyin...@gmail.com)
    > Young-Seok Kim (kiss...@gmail.com)
    > Zach Heilbron (zheilb...@gmail.com)
    >
    >
    > Affiliations
    >
    > UC Irvine
    > - Mike Carey
    > - Chen Li
    > - Ian Maxon
    > - Yingyi Bu
    > - Raman Grover
    > - Pouria Pirzadeh
    > - Young-Seok Kim
    > - Cameron Samak
    > - Taewoo Kim
    > - Jianfeng Jia
    > - Murtadha Hubail
    > - Markus Dreseler
    >
    > UC Riverside
    > - Ildar Absalyamov
    > - Preston Carman
    > - Steven Jacobs
    >
    > Hebrew University
    > - Keren Ouaknine
    >
    > Oracle
    > - Till Westmann
    >
    > X15 Software
    > - Vinayak Borkar
    > - Zach Heilbron
    >
    > KACST Saudi Arabia
    > - Sattam Alsubaiee
    >
    > Saudi Aramco
    > - Abdullah Alamoudi
    >
    > Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
    > (UC Irvine) and UCR (UC Riverside) affiliates being students. The
    > non-UC committers are a mix of alumni who continue to contribute to
    > the effort and individuals working with permission part-time (or in
    > spare time) on this project.
    >
    >
    > Sponsors
    >
    > Champion
    >
    > Chris Mattmann (NASA/JPL)
    >
    > Nominated Mentors
    >
    > TBD
    >
    > Sponsoring Entity
    >
    > The Apache Incubator
    >
    >
    >
    >
    >
    > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    > Chris Mattmann, Ph.D.
    > Chief Architect
    > Instrument Software and Science Data Systems Section (398)
    > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
    > Office: 168-519, Mailstop: 168-527
    > Email: chris.a.mattm...@nasa.gov
    > WWW: http://sunset.usc.edu/~mattmann/
    > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    > Adjunct Associate Professor, Computer Science Department
    > University of Southern California, Los Angeles, CA 90089 USA
    > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    >
    >
    >
    >

