+1 (non-binding)

Regards
Ram
-----Original Message-----
From: James Taylor [mailto:jamestay...@apache.org]
Sent: Monday, February 29, 2016 11:34 PM
To: general@incubator.apache.org
Subject: Re: [VOTE] Accept Mnemonic into the Apache Incubator

+1 (binding)

On Mon, Feb 29, 2016 at 10:03 AM, Phillip Rhodes <motley.crue....@gmail.com> wrote:

+1

On Feb 29, 2016 12:57, "Henry Saputra" <henry.sapu...@gmail.com> wrote:

+1 (Binding)

- Henry

On Mon, Feb 29, 2016 at 9:37 AM, Patrick Hunt <ph...@apache.org> wrote:

Hi folks,

OK, the discussion is now complete. Please VOTE to accept Mnemonic into the Apache Incubator. I’ll leave the VOTE open for at least the next 72 hours, with the hope of closing it Thursday, the 3rd of March, 2016 at 10am PT.
https://wiki.apache.org/incubator/MnemonicProposal

[ ] +1 Accept Mnemonic as an Apache Incubator podling.
[ ] +0 Abstain.
[ ] -1 Don’t accept Mnemonic as an Apache Incubator podling because...

Of course, I am +1 on this. Please note that VOTEs from Incubator PMC members are binding, but all are welcome to VOTE!

Regards,

Patrick

--------------------
= Mnemonic Proposal =

=== Abstract ===
Mnemonic is a Java-based non-volatile memory library for in-place structured data processing and computing. It is a solution for generic object and block persistence on heterogeneous block- and byte-addressable devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network storage.

=== Proposal ===
Mnemonic is a structured-data, in-memory, in-place persistence library for Java-based applications and frameworks. It provides unified interfaces for data manipulation on heterogeneous block/byte-addressable devices, such as DRAM, persistent memory, NVMe, SSD, and cloud network devices.

The design motivation for this project is to create a non-volatile programming paradigm for in-memory data object persistence, in-memory data object caching, and JNI-less IPC. Mnemonic simplifies data object caching, persistence, and JNI-less IPC for massive object-oriented structured datasets.

Mnemonic defines Non-Volatile Java objects that store their data fields in persistent memory and storage. At runtime, only methods and volatile fields are instantiated on the Java heap; non-volatile data fields are accessed directly via GET/SET operations to and from persistent memory and storage. Mnemonic avoids SerDes and significantly reduces the amount of garbage on the Java heap.
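To make the GET/SET access pattern concrete, here is a minimal, self-contained Java sketch of in-place field access on an off-heap buffer, using only standard java.nio APIs. It is a conceptual illustration, not Mnemonic's actual API; the PersonView class and its field offsets are hypothetical.

```java
import java.nio.ByteBuffer;

// Conceptual illustration only: field values live in an off-heap (direct)
// buffer and are read/written in place, so no Java object graph is built
// and no serialization/deserialization is needed for access.
public class PersonView {
    private static final int AGE_OFFSET = 0;    // int, 4 bytes (hypothetical layout)
    private static final int SALARY_OFFSET = 4; // double, 8 bytes

    private final ByteBuffer region; // backing memory region outside the Java heap

    public PersonView(ByteBuffer region) {
        this.region = region;
    }

    // GET: read the field directly from the backing memory region.
    public int getAge() {
        return region.getInt(AGE_OFFSET);
    }

    // SET: write the field directly into the backing memory region.
    public void setAge(int age) {
        region.putInt(AGE_OFFSET, age);
    }

    public double getSalary() {
        return region.getDouble(SALARY_OFFSET);
    }

    public void setSalary(double salary) {
        region.putDouble(SALARY_OFFSET, salary);
    }

    public static void main(String[] args) {
        // Off-heap allocation in volatile DRAM for demonstration purposes.
        PersonView p = new PersonView(ByteBuffer.allocateDirect(16));
        p.setAge(42);
        p.setSalary(123456.78);
        System.out.println(p.getAge() + " " + p.getSalary());
    }
}
```

Where a plain direct ByteBuffer is limited to volatile DRAM, the proposal's unified interfaces apply the same access pattern to persistent memory and other block/byte-addressable devices.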
Major features of Mnemonic:

* Provides an abstraction layer for using heterogeneous block/byte-addressable devices as a whole (e.g., DRAM, persistent memory, NVMe, SSD, HD, cloud network storage).

* Provides seamless support for object-oriented design and programming without the burden of transforming object data into different forms.

* Avoids object data serialization/de-serialization for data retrieval, caching, and storage.

* Reduces on-heap memory consumption, which in turn reduces and stabilizes Java Garbage Collection (GC) pauses for latency-sensitive applications.

* Overcomes current limitations of Java GC in managing much larger memory resources for massive dataset processing and computing.

* Eases migration of the data usage model from traditional NVMe/SSD/HD to non-volatile memory.

* Uses a lazy-loading mechanism to avoid unnecessary memory consumption when data is not immediately needed for computation.

* Bypasses JNI calls for interaction between a Java runtime application and its native code.

* Provides an allocation-aware auto-reclaim mechanism to prevent leaks of external memory resources.

=== Background ===
Big Data and Cloud applications increasingly require both high throughput and low latency. Java-based applications targeting the Big Data and Cloud space should be tuned for better throughput, lower latency, and more predictable response times. Several issues typically impact Big Data applications' performance and scalability:

1) The complexity of data transformation/organization: In most cases, applications use their own complicated data caching mechanisms during processing, serializing and deserializing data objects, spilling to different storage tiers, and evicting large amounts of data. Some data objects contain complex values and structures that make data organization much more difficult. Loading datasets from storage and then parsing/decoding them consumes substantial system resources and computation power.

2) Lack of caching, and bursts of temporary object creation/destruction, cause frequent long GC pauses: Big Data computation generates large numbers of temporary objects during processing, e.g. for lambdas, SerDes, and copying. This triggers frequent, long Java GC pauses to scan references, update reference lists, and blindly copy live objects from one memory location to another.

3) Unpredictable GC pauses: Latency-sensitive applications, such as databases, search engines, web queries, and real-time/streaming computing, require latency and request-response times to be kept under control. But the current Java GC does not provide predictable behavior when managing large amounts of on-heap memory.

4) High JNI invocation cost: High-performance applications usually try to leverage native code to improve performance, but JNI calls are expensive: they need to convert Java objects into something that C/C++ can understand. In addition, some native code needs to communicate back with the Java-based application, which causes frequent JNI calls along with stack marshalling.

The Mnemonic project addresses these issues and performance bottlenecks for structured data processing and computing. It also simplifies massive data handling with much reduced GC activity.
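As a rough illustration of issues 1) and 2), the conventional pattern below serializes an object to bytes for caching and deserializes it back on every access, allocating temporary buffers and object copies that become GC garbage; this is the round trip that in-place access on a byte-addressable device avoids. The SerDesRoundTrip and Record classes are hypothetical example types, not code from the proposal.

```java
import java.io.*;

// Conventional cache pattern: every put/get pays a full SerDes round trip
// and allocates short-lived byte arrays and object copies (GC pressure).
public class SerDesRoundTrip {
    static class Record implements Serializable {
        long id;
        double value;
        Record(long id, double value) { this.id = id; this.value = value; }
    }

    static byte[] serialize(Record r) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(r);               // encode the whole object graph
        }
        return bos.toByteArray();             // temporary on-heap copy
    }

    static Record deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Record) in.readObject();  // rebuilds a new heap object
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] cached = serialize(new Record(1L, 3.14)); // "store"
        Record copy = deserialize(cached);               // "load"
        System.out.println(copy.id + " " + copy.value);
    }
}
```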
=== Rationale ===
There is a strong need for a cohesive, easy-to-use non-volatile programming model for unified management and allocation of heterogeneous memory resources. The Mnemonic project provides a reusable and flexible framework that can accommodate other special types of memory/block devices for better performance without changing client code.

Most Big Data frameworks (e.g., Apache Spark™, Apache™ Hadoop®, Apache HBase™, Apache Flink™, Apache Kafka™, etc.) have their own complicated memory management modules for caching and checkpointing. These approaches increase complexity and make the code error-prone to maintain.

We have observed heavy overheads from data parsing, SerDes, pack/unpack, and encode/decode operations during data loading, storage, checkpointing, caching, marshalling, and transfer. Mnemonic provides a generic in-memory persistent object model to address those overheads for better performance. In addition, it manages its in-memory persistent objects and blocks much as the GC does, meaning their underlying memory resources can be reclaimed without being explicitly released.

Some existing Big Data applications suffer from poor Java GC behavior when processing massive unstructured datasets. That behavior either causes very long stop-the-world GC pauses or consumes significant system resources during computation, which hurts throughput and causes noticeable pauses in interactive analytics.

More and more compute-intensive Big Data applications rely on JNI to offload their computing tasks to native code, which dramatically increases the cost of JNI invocation and IPC. Mnemonic provides a mechanism to communicate with native code directly through in-place object data updates, avoiding complex object data type conversion and stack marshalling. In addition, the project can be extended to support various locking mechanisms shared between Java code and native code.

=== Initial Goals ===
Our initial goal is to bring Mnemonic into the ASF and transition its engineering and governance processes to the "Apache Way." We would like to foster a collaborative development model that closely aligns with current and future memory and storage technologies in the industry.

Another important goal is to encourage efforts to integrate the non-volatile programming model into data-centric processing/analytics frameworks and applications (e.g., Apache Spark™, Apache HBase™, Apache Flink™, Apache™ Hadoop®, Apache Cassandra™, etc.).

We expect the Mnemonic project to continuously develop new functionality in an open, community-driven way. We envision accelerating innovation under ASF governance in order to meet the requirements of a wide variety of use cases for in-memory non-volatile and volatile data caching.

=== Current Status ===
The Mnemonic project is available in Intel’s internal repository and is managed by its designers and developers. It is also temporarily hosted on GitHub for general viewing:
https://github.com/NonVolatileComputing/Mnemonic.git

We have integrated this project with Apache Spark™ 1.5.0 and measured a 2X performance improvement for the Spark™ MLlib k-means workload; in our experiments we observed the expected benefit of removing SerDes and a 40% reduction in total GC pause time.
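For readers who want to run a similar GC-pause comparison on their own workloads, a simple way to capture total GC time before and after a run is the standard java.lang.management API. This is a generic measurement sketch, not part of Mnemonic and not the instrumentation used in the experiments cited above; the GcTimeProbe class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sums cumulative GC pause time (ms) across all collectors; sample before
// and after a workload and compare the deltas of two runs (e.g., on-heap
// caching vs. off-heap/persistent storage of the same dataset).
public class GcTimeProbe {
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // ... run the workload under test here ...
        long after = totalGcMillis();
        System.out.println("GC time during workload: " + (after - before) + " ms");
    }
}
```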
==== Meritocracy ====
Mnemonic was originally created by Gang (Gary) Wang and Yanping Wang in early 2015. The initial committers are the current Mnemonic R&D team members from Intel's Big Data Technologies Group in the US, China, and India. This group will form the base for a much broader community to collaborate on this code base.

We intend to radically expand the initial developer and user community by running the project in accordance with the "Apache Way." Users and new contributors will be treated with respect and welcomed. By participating in the community and providing quality patches/support that move the project forward, they will earn merit. They will also be encouraged to provide non-code contributions (documentation, events, community management, etc.) and will gain merit for doing so. Those with a proven track record of support and quality will be encouraged to become committers.

==== Community ====
If Mnemonic is accepted for incubation, the primary initial goal is to transition the core community towards embracing the Apache Way of project governance. We will solicit major existing contributors to become committers on the project from the start.

==== Core Developers ====
Mnemonic's core developers are all skilled software developers and system performance engineers at Intel Corp. with years of experience in their fields. They have contributed substantial code to Apache projects. PMC members and experienced committers from Apache Spark™, Apache HBase™, Apache Phoenix™, and Apache™ Hadoop® have been working with us on this project's open source efforts.

=== Alignment ===
The initial code base targets data-centric processing and analytics in general. Mnemonic has been building connections and integrations with Apache projects and other projects.

We believe Mnemonic will evolve into a promising project for real-time processing, in-memory streaming analytics, and more, alongside current and future server platforms that use persistent memory as base storage devices.

=== Known Risks ===
==== Orphaned products ====
Intel's Big Data Technologies Group is actively working with the community on integrating this project into Big Data frameworks and applications. We are continuously adding new concepts and code to this project to support new use cases and features for the Apache Big Data ecosystem.

The project contributors are leading contributors to Hadoop-based technologies and have a long standing in the Hadoop community. As we are addressing major Big Data processing performance issues, there is minimal risk of this work becoming non-strategic and unsupported.

Our contributors are confident that a larger community will form within the project in a relatively short period of time.

==== Inexperience with Open Source ====
This project has long-standing, experienced mentors and interested contributors from Apache Spark™, Apache HBase™, Apache Phoenix™, and Apache™ Hadoop® to help us move through the open source process. We are actively working with experienced Apache community PMC members and committers to improve our project and its testing.

==== Homogeneous Developers ====
All initial committers and interested contributors are employed by Intel. As an infrastructure memory project, it is of interest to a wide range of Apache projects that want to take advantage of large persistent memory and storage devices.
Various Apache projects, such as Apache Spark™, Apache HBase™, Apache Phoenix™, Apache Flink™, Apache Cassandra™, etc., can take good advantage of this project to overcome serialization/de-serialization, Java GC, and caching issues. We expect broad interest once the project is open sourced at Apache.

==== Reliance on Salaried Developers ====
All developers are paid by their employers to contribute to this project. We welcome all others to contribute to this project after it is open sourced.

==== Relationships with Other Apache Products ====
Relationship with Apache™ Arrow:
Arrow's columnar data layout allows great use of CPU caches and SIMD. It places all data relevant to a column operation in a compact in-memory format.

Mnemonic puts whole business object graphs directly on external heterogeneous storage media, e.g. off-heap memory or SSD. It is not necessary to normalize the structure of object graphs for caching, checkpointing, or storage, and developers are not required to normalize their data object graphs. Compared to traditional approaches, Mnemonic applications can avoid indexing and joining datasets.

Mnemonic can leverage Arrow to transparently re-layout qualified data objects, or to create special containers able to hold those data records efficiently in columnar form, as one of its major performance optimizations.

Mnemonic can be integrated into various Big Data and Cloud frameworks and applications. We are currently working on several Apache projects with Mnemonic.

For Apache Spark™, we are integrating Mnemonic to improve:
a) Local checkpoints
b) Memory management for caching
c) Persistent memory dataset input
d) Non-volatile RDD operations

The best use case for Apache Spark™ computing is when the input data is stored in Mnemonic native storage, so that its row data does not need to be cached for iterative processing. Moreover, Spark applications can leverage Mnemonic to transform data in persistent or non-persistent memory without SerDes.

For Apache™ Hadoop®, we are integrating HDFS Caching with Mnemonic instead of mmap, to take advantage of persistent-memory-related features (see the mmap sketch below). We also plan to evaluate integrating the NameNode EditLog and FSImage persistent data into the Mnemonic persistent memory area.

For Apache HBase™, we are using Mnemonic for the BucketCache and evaluating the performance improvements.

We expect Mnemonic to be further developed and integrated into many Apache Big Data projects to enhance memory management for much improved performance and reliability.
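For context on the mmap-based caching path mentioned above, the snippet below shows the standard java.nio memory-mapping approach that such an integration would replace or complement. It is a generic, Java 8-compatible illustration with a hypothetical file path; it is not HDFS or Mnemonic code.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Classic mmap-style caching: map a file region into memory and access it in
// place. A persistent-memory-aware library can provide the same in-place
// access directly on byte-addressable devices.
public class MmapCacheExample {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/tmp/example-block.dat"); // hypothetical data file
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.CREATE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            map.putLong(0, 42L);                 // write in place
            System.out.println(map.getLong(0));  // read in place, no heap object copies
        }
    }
}
```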
==== An Excessive Fascination with the Apache Brand ====
While we expect the Apache brand to help attract more contributors, our interest in starting this project is based on the factors mentioned in the Rationale section. We would like Mnemonic to become an Apache project to further foster a healthy community of contributors and consumers in Big Data technology R&D. Since Mnemonic can directly benefit many Apache projects and solves major performance problems, we expect the Apache Software Foundation's interaction with the larger community to increase as well.

=== Documentation ===
The documentation is currently available at Intel and will be posted under:
https://mnemonic.incubator.apache.org/docs

=== Initial Source ===
The initial source code is temporarily hosted on GitHub for general viewing:
https://github.com/NonVolatileComputing/Mnemonic.git
It will be moved to Apache (http://git.apache.org/) once the podling is established.

The initial source is written in Java (88%), mixed with JNI C code (11%) and shell script (1%) for the underlying native allocation libraries.

=== Source and Intellectual Property Submission Plan ===
As soon as Mnemonic is approved to join the Incubator, the source code will be transitioned via the Software Grant Agreement onto ASF infrastructure and in turn made available under the Apache License, version 2.0.

=== External Dependencies ===
The required external dependencies all carry the Apache License or other compatible licenses.
Note: The runtime dependencies of Mnemonic are all under the Apache License 2.0; the GNU-licensed components are used only for Mnemonic build and deployment. The Mnemonic JNI libraries are built using the GNU tools.

* Maven and its plugins (http://maven.apache.org/) [Apache 2.0]
* JDK 8 or OpenJDK 8 (http://java.com/) [Oracle or OpenJDK license]
* NVML (http://pmem.io) [optional] [Open Source]
* PMalloc (https://github.com/bigdata-memory/pmalloc) [optional] [Apache 2.0]

Build and test dependencies:
* org.testng.testng v6.8.17 (http://testng.org) [Apache 2.0]
* org.flowcomputing.commons.commons-resgc v0.8.7 [Apache 2.0]
* org.flowcomputing.commons.commons-primitives v0.6.0 [Apache 2.0]
* com.squareup.javapoet v1.3.1-SNAPSHOT [Apache 2.0]
* JDK 8 or OpenJDK 8 (http://java.com/) [Oracle or OpenJDK license]

=== Cryptography ===
The Mnemonic project does not use cryptography itself; however, Hadoop projects use standard APIs and tools for SSH and SSL communication where necessary.
=== Required Resources ===
We request that the following resources be created for the project's use:

==== Mailing lists ====
* priv...@mnemonic.incubator.apache.org (moderated subscriptions)
* comm...@mnemonic.incubator.apache.org
* d...@mnemonic.incubator.apache.org

==== Git repository ====
https://github.com/apache/incubator-mnemonic

==== Documentation ====
https://mnemonic.incubator.apache.org/docs/

==== JIRA instance ====
https://issues.apache.org/jira/browse/mnemonic

=== Initial Committers ===
* Gang (Gary) Wang (gang1 dot wang at intel dot com)
* Yanping Wang (yanping dot wang at intel dot com)
* Uma Maheswara Rao G (umamahesh at apache dot org)
* Kai Zheng (drankye at apache dot org)
* Rakesh Radhakrishnan Potty (rakeshr at apache dot org)
* Sean Zhong (seanzhong at apache dot org)
* Henry Saputra (hsaputra at apache dot org)
* Hao Cheng (hao dot cheng at intel dot com)

=== Additional Interested Contributors ===
* Debo Dutta (dedutta at cisco dot com)
* Liang Chen (chenliang613 at huawei dot com)

=== Affiliations ===
* Gang (Gary) Wang, Intel
* Yanping Wang, Intel
* Uma Maheswara Rao G, Intel
* Kai Zheng, Intel
* Rakesh Radhakrishnan Potty, Intel
* Sean Zhong, Intel
* Henry Saputra, Independent
* Hao Cheng, Intel

=== Sponsors ===
==== Champion ====
Patrick Hunt

==== Nominated Mentors ====
* Patrick Hunt <phunt at apache dot org> - Apache IPMC member
* Andrew Purtell <apurtell at apache dot org> - Apache IPMC member
* James Taylor <jamestaylor at apache dot org> - Apache IPMC member
* Henry Saputra <hsaputra at apache dot org> - Apache IPMC member

==== Sponsoring Entity ====
Apache Incubator PMC