Proposal looks very good and clear. This project should be a good addition.
+1 (non-binding) Regards, Uma On 3/17/16, 1:17 PM, "Daniel Dai" <dai...@gmail.com> wrote: >Hi, > >I would like to propose Omid as an Apache Incubator project: > >https://wiki.apache.org/incubator/OmidProposal > >I've posted posted the text of the proposal below: > >Thanks, >Daniel > >= Omid Proposal = > >=== Abstract === > >Omid is a flexible, reliable, high performant and scalable ACID >transactional framework that allows client applications to execute >transactions on top of MVCC key/value-based NoSQL datastores >(currently Apache HBase) providing Snapshot Isolation guarantees on >the accessed data. > > >=== Proposal === > >Omid is a flexible open-source transactional framework that provides >ACID transactions with Snapshot Isolation guarantees on top of NoSQL >datastores. In particular, the current codebase brings the concept of >transactions to the popular Apache HBase datastore. Omid offers great >performance, it is highly available, and scalable. Omid's current >version is able to scale to thousands of clients triggering concurrent >transactions on application data stored in HBase. Omid can scale >beyond 100K transactions per second on mid-range hardware while >incurring in a minimal impact on the speed of data access in the >datastore. We¹re currently experimenting with a prototype version that >can improve the performance up to ~380K TPS. > > >Omid has been publicly available as an open-source project in Github >under Apache License Version 2.0 since 2011 [1]. During these years, >it has generated certain interest in the open source community, >especially since the public presentation of the first version in >Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars and >93 forks. Yahoo Inc. submits this proposal to the Apache Software >Foundation with the aim to transfer the Omid project -including its >source code and documentation- to Apache in order to start the build >of a stable open source community around it. > > >[1] https://github.com/yahoo/omid > >[2] Omid presentation at Hadoop Summit 2013: >https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyq >LU464Nxz4aQe7EPBus > > >=== Background === > >An Omid prototype was first released as an open-source project back in >2011. Inspired by Google Percolator [1], it offered a lock-free >approach to transactions in NoSQL datastores (See [2]). However, >during these years, the design of Omid has evolved significantly. >Whilst the current open-sourced version maintains many aspects of the >original implementation, it is the result of a major redesign of the >first prototype released in 2011. > > >Omid has now a more decentralized design that does not sacrifice the >consistency and performance of the original version. The current >design also enables Omid to scale to thousands of clients executing >transactions concurrently on application data stored in HBase. >Internally, Omid still utilizes a lock-free approach to support >multiple concurrent clients. Its design also relies on a centralized >conflict detection component, the TSO, which now resolves in an >efficient manner writeset collisions among concurrent transactions >without having to piggyback commit information to the clients. Another >important benefit of Omid is that it doesn't require any modification >of the underlying key-value datastore, HBase in this case. Moreover, >the recently added high availability algorithm allows to eliminate the >single point of failure represented by the TSO in those system >deployments requiring a higher degree of dependability. Last but not >least, the provided user API is very simple, mimicking transaction >managers in the relational world: begin, commit, rollback. > > >Omid is used internally at Yahoo. Sieve, Yahoo¹s web-scale content >management platform powering some of next-generation search and >personalization products is using Omid as a transaction manager in its >processing pipeline. Sieve essentially acts as a huge processing hub >between content feeds and serving systems. It provides an environment >for highly customizable, real-time, streamed information processing, >with typical discovery-to-service latencies of just a few seconds. In >terms of scale and availability, Omid¹s new design was largely driven >by Sieve¹s requirements. > > >At Yahoo, we are also making an effort to disseminate the current >status of the project through blog entries (See [3], [4] and [5]) and >submissions to technical and academic conferences such as ATC 2016, >Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also >appeared in a TechCrunch article in the last quarter of 2015 (See [6]) > > >[1] D. Peng and F. Dabek, Large-scale Incremental Processing Using >Distributed Transactions and Notifications. USENIX Symposium on >Operating Systems Design and Implementation, 2010 > >[2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh. >Omid: Lock-free transactional support for distributed data stores. In >Proc. of ICDE, 2013. > >[3] >http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transacti >on-processing-for > >[4] >http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-prot >ocol > >[5] >http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid > >[6] >http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-sc >alable-transaction-processing-to-hbase/ > > >=== Rationale === > >Programming with ACID (Atomicity, Consistency, Isolation, Durability) >transactions is very popular and it is featured in relational >databases. However, in the Big Data ecosystem, applications typically >use NoSQL datastores, which do not provide ACID transactions. Such >NoSQL datastores used to give up transactional support for greater >agility and scalability. However, while early NoSQL data store >implementations did not include transaction support, the need for >transactions soon emerged in Big Data applications when accessing >shared data; for example, transactions are very important for >modern, scalable systems that process content incrementally. > > >NoSQL datastores -including HBase- don¹t provide transactional >frameworks to coordinate the access to the underlying data for >preserving consistency. By using Omid, Big Data applications that need >to bundle multiple read and write operations on HBase into logically >indivisible units of work can execute transactions with ACID >properties, just as they would use transactions in the relational >database world. Omid extends the HBase key-value access APl with >transaction semantics. It can be exercised either directly, or via >higher level data management API¹s. For example, Apache Phoenix >(SQL-on-top-of-HBase) might use Omid as its transaction management >component. > > >The following features make Omid an attractive choice for system >designers and other projects in the Apache community: > > >* Semantics. Omid implements Snapshot Isolation (SI,) supported by >major SQL and NoSQL technologies (e.g. Google Percolator). > > >* Performance and Scalability. Omid provides a highly scalable, >lock-free implementation of SI. To the best of our knowledge, it is >also one of the few open source NoSQL transactional platforms that can >execute more than 100K transactions per second [1]. A new prototype >still in development can go even further, up to ~380K TPS. > > >* Reliability. Omid has a high-availability (HA) mode, in which the >core service performing writeset conflict resolution operates as >primary-backup process pair with automatic failover. The HA support >has zero overhead on the mainstream operation. > > >* Adaptability. Omid current version provides transactions on data >stored in Apache HBase. However, Omid¹s components are generic enough >to be adapted to any other key-value NoSQL datasource that supports >MVCC. > > >* Development. Omid provides a very simple interface that mimics >standard HBase APIs, making it developer friendly. Only minimal >extensions to the standard interfaces have been introduced to enable >transactions. > > >* Simplicity. Omid leverages the HBase infrastructure for managing its >own metadata. It entails no additional services apart from those >provided and used by HBase. > > >* Track Record. As we have mentioned, Omid is already in use by >very-large-scale production systems at Yahoo. Also, Hortonworks is >integrating Omid in a metastore implementation for Hive based on >HBase. > >[1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance > > >=== Current Status === >Current Omid implementation is available in both, Yahoo¹s internal >Github repository for internal use at Yahoo as well as in Yahoo¹s >Github public repository (https://github.com/yahoo/omid.git). Both >repositories are managed by Omid¹s current developers at Yahoo. > >As it is mentioned above, Yahoo is currently using Omid for providing >transactions in Sieve, a web-scale content management platform that >powers Yahoo¹s next-generation search and personalization products. > > >==== Meritocracy ==== >The first version of Omid was originally created in 2011 by Maysam >Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio >Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain. > > >During the years after its inception, Omid has matured to operate at >Web scale and has been used internally by strategic projects at Yahoo >such as Sieve. The current base of committers belong to the Yahoo team >that took over the initial Omid prototype and rewrote it to meet the >high availability and scalability requirements of the Sieve project. >This base of committers has recently incorporated Hortonworks members >that helped in the Omid adaptation to HBase 1.x versions. > > >With this initial committer base, we aim to form a larger community >that can collaborate with new ideas over the current code base. This >new community will run the project following the "Apache Way" >(http://apache.org/foundation/governance/). Users and new contributors >will be treated with respect and welcomed. To grow the community, we >will encourage contributors to provide patches, review code, propose >new features improvements, talk at conferences such as Hadoop Summit, >HBaseCon, ApacheCon, etc. Committership and PMC membership will be >offered according to meritocracy. > >==== Community ==== > >The public Yahoo Omid repository at Github currently has 241 Stars and >93 forks, which means that there is an important interest for the >project in the open-source community, at least compared with other >similar projects (See https://github.com/yahoo/omid.git). > > >Recently, Hortonworks contributors to the Apache Hive project which >are working on storing Hive metadata in HBase (Apache Jira HIVE-9452) >manifested interest in using Omid. We started with them a fruitful >collaboration that resulted in Omid supporting HBase 1.x versions. > > >Salesforce is also interested in collaborating in doing a Proof of >Concept for integrating Omid as a pluggable transaction manager in >Apache Phoenix. > > >Yahoo, Hortonworks and Salesforce participants will constitute the >initial set of committers and mentors for the proposal. > >==== Core Developers ==== >The core developers of Omid are all skilled software developers and >research engineers at Yahoo Inc. and Hortonworks with years of >experiences in their fields. At this moment, developers are >distributed across U.S. and Israel. The aim is to incorporate more >committers from different organizations and locations over time. > > >The current set of developers include experienced committers from >Apache HBase, Hive and Hadoop projects that have been working with us >in the current codebase found in Github. > >Finally, some of the core developers are currently NOT affiliated with >the ASF and would require new ICLAs to be filed. > > >=== Alignment === >Omid enhances with transactions the already successful Apache HBase >datastore project. We have collaborated with other developers inside >and outside Yahoo which are involved in the Apache HBase community, so >we have had reliable feedback from them. > >Although Omid brings value into HBase, the design of the current >version provides a general transaction scheme that can potentially be >adapted to other MVCC key-value datastores such as Apache Cassandra. > > >Apache Phoenix is also a potential target. Phoenix is a SQL layer on >top of HBase that can potentially integrate Omid in order to provide >the well-know concept of transactions to Phoenix-based applications. > > >=== Known Risks === >==== Orphaned products ==== >Yahoo¹s Research and Search organizations have been taking care of >Omid development since the first prototype creation in 2011. Yahoo has >a long history participating in open-source projects, and has been >also a long time contributor to the Apache community. For example, in >Apache, Yahoo is an important contributor in many projects in the >Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also >open-sourced other well-known projects outside Hadoop, such as >Zookeeper or Bookkeeper. So it is in the best interest of Yahoo make >Omid also a successful open-source Apache product. If this happens, we >are sure that a larger community will be formed around the project in >a relatively short period of time, contributing to the diversification >and stabilization of the base of committers. > > >==== Inexperience with Open Source ==== >This project has long standing experienced mentors and interested >contributors from Apache HBase, Hive and Phoenix to help us moving >through the open source process. We are actively working with >experienced Apache community members to improve our project and >further testing. > >==== Homogeneous Developers ==== >Omid has been supported by Yahoo since its inception in 2011. However, >all current committers are employed by their respective companies >shown in the Affiliations section. > > >==== Reliance on Salaried Developers ==== > >All the current developers are paid by their employers to contribute >to this project. Yahoo developers will also continuing maintaining the >internal Omid repository at their company. > >Of course, other developers are welcomed to contribute to this project >after it is open sourced in Apache. > >==== Relationships with Other Apache Product ==== > >Current Omid incarnation serves transactional contexts to applications >storing their data in HBase. However Omid design potentially allows to >be adapted to serve transactions on top of other MVCC-based key-value >datastores in Apache community such as Cassandra. > > >As a transactional framework, many other Apache projects such as >Apache Spark, Apache Phoenix, Apache Storm, Apache Flink could >potentially benefit from Omid to get transactional contexts. In >particular, Apache Phoenix -a SQL layer on top of HBase- might use >Omid as its transaction management component. Once we open source Omid >as an Apache project, we expect to generate more interest in the >surrounded communities. > > >Very recently, a new incubator proposal for a similar project called >Tephra, has been submitted to the ASF. We think this is good for the >Apache community, and we believe that there¹s room for both proposals >as the design of each of them is based on different principles (e.g. >Omid does not require to maintain the state of ongoing transactions on >the server-side component) and due to the fact that both -Tephra and >Omid- have also gained certain traction in the open-source community. > > >With regard to the Apache projects that Omid uses, apart from HBase, >Omid relies on Apache Zookeeper and Curator projects in order to >coordinate the (re)connection of transaction managers (acting as >clients) to the conflict resolution component for transactions (server >side.) They¹re also used in order to coordinate the master and backup >replicas in high availability scenarios. > > >==== An Excessive Fascination with the Apache Brand ==== > >We are applying to the Incubator process because we think that it is >the logical next step for the Omid project after we open-sourced the >code in Github some years ago. Yahoo has a long-standing history of >contributing to Apache projects. The developers and contributors >understand the implications of making it an Apache project, and >strongly believe that the growing community can benefit from the >Apache environment, ecosystem, and infrastrastructure. > > >=== Documentation === >Current documentation about the project is available in the wiki of >Omid¹s Github repository: https://github.com/yahoo/omid/wiki . It will >be moved under https://omid.incubator.apache.org/docs if the project >is accepted as an Apache Incubator. > >=== Initial Source === >Initial source code is currently hosted in Github for general viewing >and contribution: > >https://github.com/yahoo/omid.git > > >Omid source code is written in Java code (99%) mixed with some shell >script (1%) in order to configure and trigger the execution of main >components. > > >The code will be moved to Apache http://git.apache.org/ if accepted as >an Incubator project. > >=== Source and Intellectual Property Submission Plan === > >The current Omid License for the code published in Github is Apache >2.0. If Omid fulfills and passes the conditions for being an Incubator >project in the ASF, the source code will be transitioned via the >Software Grant Agreement onto the ASF infrastructure and in turn made >available under the Apache License, version 2.0. > >=== External Dependencies === > > >The required external dependencies that are not Apache projects are >all Apache licenses or other compatible Licenses: > >Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0] > >JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License] > >Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0] > >Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0] > >Testng v6.8.8 (http://testng.org) [Apache 2.0] > >SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License] > >Netty (http://netty.io) v3.2.6.Final [Apache 2.0] > >Google Protocol Buffers v2.5.0 >(https://developers.google.com/protocol-buffers/) [BSD License] > >Mockito (http://mockito.org/) v1.9.5 [MIT License] > >LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/) >[Apache 2.0] > >Coda Hale/Yammer.com Dropwizard Metrics v3.0.1 >(http://metrics.dropwizard.io/3.1.0/) [Apache 2.0] > >C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0] > >Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License] > > >=== Cryptography === >Omid project does not use cryptography itself. However, Apache HBase >-the datastore on top of which Omid works in its current version- uses >standard APIs and tools for SSH and SSL communication where necessary. > >=== Required Resources === >We request that following resources be created for the project to use: > >==== Mailing lists ==== > >omid-private (moderated subscriptions) > >omid-commits (commit notification) >omid-dev (technical discussions) > >==== Git repository ==== >https://github.com/apache/incubator-omid > >==== Documentation ==== >https://omid.incubator.apache.org/docs/ > >==== JIRA instance ==== >https://issues.apache.org/jira/browse/omid > >=== Initial Committers === > >* Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com) > > >* Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com) > > >* Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org) > > >* Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org) > > >* Igor Katkov (katkovi<AT>yahoo-inc<DOT>com) > > >* Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com) > >* Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com) > > >* Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com) > > >* Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com) > > >* Ohad Shacham (ohads<AT>yahoo-inc<DOT>com) > >* James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>) > > >=== Additional Interested Contributors === >* Ivan Kelly (ivank<AT>apache<DOT>org) > >* Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com) > > >=== Affiliations === > >* Edward Bortnikov, Yahoo Inc. > > >* Daniel Dai, Hortonworks > > >* Flavio P. Junqueira, Confluent > > >* Igor Katkov, Yahoo Inc. > > >* Ivan Kelly, Midokura > > >* Francis C. Liu, Yahoo Inc. > > >* Sameer Paranjpye, Arimo > >* Francisco Perez-Sorrosal, Yahoo Inc. > > >* Ohad Shacham, Yahoo Inc. > > >* Maysam Yabandeh, Dropbox Inc. > > >=== Sponsors === > >==== Champion ==== > >Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com) > >==== Nominated Mentors ==== > >Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com) > >Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org) > >Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org) > >Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com) > >James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>) > > >==== Sponsoring Entity ==== >Apache Incubator PMC > >--------------------------------------------------------------------- >To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org >For additional commands, e-mail: general-h...@incubator.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org