Thanks for clarifying. +1 (binding)

Julian

On Fri, Oct 9, 2015 at 2:09 PM, Atri Sharma <atri.j...@gmail.com> wrote:
> Hi,
>
> Please find answers below:
>
> 1) The main source code on Github hasn't been updated for a while. However,
> the original main core was written in 2013 and has been open source since
> then. As we discussed earlier, the current code base is only a starting
> point for complete development; it will first be integrated with the work
> done independently in silos and then used as the starting implementation.
>
> 2) The JNI native API, when optimized, can provide great performance (I have
> written an application using it, and it has been on production systems for
> many years). I think we can still provide a high performance API to the C++
> core, and that is something I am personally working on right now.
> On 10 Oct 2015 02:31, "Julian Hyde" <jh...@apache.org> wrote:
>
>> I have agreed to be a mentor to Concerted and I think it is an
>> interesting idea. I am inclined to vote for it entering the incubator.
>>
>> However, since the project has not released any source code yet, there
>> are a couple of questions I'd like to get answered for the record:
>>
>> 1. How many lines of existing code are there? What is their approximate
>> age?
>>
>> 2. Concerted is in C/C++ but you mention interfacing with JVM-based
>> products like Hive. How would you interface with other languages? Is
>> it a goal of the project to create APIs to other languages such as
>> Java? Would access from those languages be as efficient as native
>> access?
>>
>> I apologize that I didn't bring these up in the discussion thread.
>>
>> Julian
>>
>>
>> On Fri, Oct 9, 2015 at 11:53 AM, Ayrton Gomesz <com.ayr...@gmail.com>
>> wrote:
>> > +1
>> > @henry.saputra thanks man
>> > On Oct 9, 2015 5:50 PM, "Henry Saputra" <henry.sapu...@gmail.com> wrote:
>> >
>> >> +1 (binding)
>> >> Good luck guys!
>> >>
>> >> On Fri, Oct 9, 2015 at 8:55 AM, Atri Sharma <a...@apache.org> wrote:
>> >> > Hi all,
>> >> >
>> >> > Following the discussion about Concerted, I would like to call a vote
>> >> > for accepting Concerted as a new incubator project.
>> >> >
>> >> > The proposal text is included below, and available on the wiki:
>> >> >
>> >> > https://wiki.apache.org/incubator/ConcertedProposal
>> >> >
>> >> > The vote is open for 72 hours:
>> >> >
>> >> > [ ] +1 accept Concerted in the Incubator
>> >> > [ ] ±0
>> >> > [ ] -1 (please give reason)
>> >> >
>> >> > Regards,
>> >> >
>> >> > Atri
>> >> >
>> >> > = Abstract =
>> >> >
>> >> > Concerted is an in-memory, write-less read-more engine that aims to
>> >> > provide extreme read performance with a very high degree of concurrency
>> >> > and scalability, with a focus on minimizing its own resource footprint.
>> >> >
>> >> > = Proposal =
>> >> > Concerted is built on the principle that a new type of workload is
>> >> > dominating the scene and now needs to be supported. These are the large
>> >> > data set analytical workloads being analyzed or used on large clusters
>> >> > or high-powered machines. Large analytical workloads depend on the
>> >> > ability to query large data sets efficiently and with high concurrency
>> >> > while maintaining semantics such as immediate consistency. An in-memory
>> >> > engine designed to support extreme read queries while providing support
>> >> > for aggregation through various features (such as multidimensional
>> >> > representation of tuples) will accelerate many use cases around
>> >> > large-scale analytics.
>> >> >
>> >> > Concerted believes that the best understanding of a user application
>> >> > lies with the user application developer. Massive read scaling should
>> >> > be available on demand and should be flexible enough that the user can
>> >> > decide which representation and access of data suits his/her current
>> >> > requirements.
>> >> > Hence, Concerted is not built on a traditional client/server model.
>> >> > Concerted provides users with an API which can be used to load, read,
>> >> > update and delete data. The user chooses which data structure to use
>> >> > for the current requirements. All API access is covered by Concerted's
>> >> > internal systems, such as the lock manager, transaction manager and
>> >> > cache manager, which ensure that reads scale to a high level in every
>> >> > API call.
>> >> >
>> >> > Concerted is a do-it-yourself in-memory platform for making in-memory
>> >> > supporting engines. The use case we have in mind is supporting big data
>> >> > warehouses like Hive, but there are endless use cases for a custom,
>> >> > highly scalable in-memory platform.
>> >> >
>> >> > The goal of this proposal is to leverage an existing code base
>> >> > available on Github and licensed under the Apache License 2.0 to build
>> >> > a community around the project. Currently the community consists of
>> >> > existing hackers of Concerted, people who have been following and
>> >> > associated with the project for a while, and database experts who are
>> >> > excited about building a project like this. We are hoping that entering
>> >> > Apache would help us attract more contributors as well as connect with
>> >> > existing big data projects like Apache Hive, Apache HAWQ, Apache Storm,
>> >> > Apache Tajo, Apache Spark and Apache Geode, to leverage their community
>> >> > base while assisting in their use cases with Concerted. We had a
>> >> > discussion with the founders of Apache Tajo and they showed interest in
>> >> > using Concerted for some of their use cases.
>> >> >
>> >> > = Background =
>> >> > Relational databases were built with the cost of physical memory in
>> >> > mind. That cost is no longer very relevant and physical memory is now
>> >> > available on demand.
>> >> > Another driving factor behind Concerted is the paradigm shift that has
>> >> > come with big data. Disk IO speeds are more of a bottleneck than ever
>> >> > before. Combining the read dominance of analytical workloads with the
>> >> > speed of in-memory structures, Concerted fits the current scene. Also,
>> >> > supporting OLAP workloads with in-memory support for faster read
>> >> > constant queries and joins will be useful.
>> >> >
>> >> > = Rationale =
>> >> > As explained above, large analytical workloads need an in-memory
>> >> > lightweight engine which supports massive read concurrency,
>> >> > ground-level support for aggregations and analytics, extreme
>> >> > scalability and high read performance, along with the engine being very
>> >> > light itself. Concerted aims to meet these needs. Concerted is designed
>> >> > and built with three goals as objectives:
>> >> >
>> >> >
>> >> > Performance
>> >> >   To provide high performance access to data from a large number of
>> >> >   rows, Concerted uses efficient representation and in-memory indexing
>> >> >   of data coupled with high performance transactions, custom
>> >> >   transactions, lightweight locking and lockless techniques, and an
>> >> >   intelligent locking manager.
>> >> >
>> >> > Scalability
>> >> >   Concerted is built with extreme concurrency and scalability in mind.
>> >> >
>> >> > Efficiency
>> >> >   Concerted aims to give expected performance under a vast variety of
>> >> >   workloads and aims to have as low a footprint as possible.
>> >> >
>> >> > = Initial Goals =
>> >> > The initial goal is to leverage an existing code base and invest in
>> >> > building a community around the project. We anticipate a lot of initial
>> >> > restructuring of the existing code so that it becomes easier to include
>> >> > new contributors and minimize ramp-up time.
>> >> > We plan to approach this refactoring in a fully transparent,
>> >> > community-driven way, thus starting to practice the "Apache Way"
>> >> > governance model from the get-go.
>> >> >
>> >> > Various contributors are getting individual changes into branches in
>> >> > the github repository, and our initial major goal will be to merge all
>> >> > those changes into the master repository.
>> >> >
>> >> > = Current Status =
>> >> > Concerted is currently being restructured to suit the needs of an open
>> >> > source project. Current source is available at
>> >> > https://github.com/atris/Concerted (please note that the updated
>> >> > codebase is not yet present on github). Concerted is currently licensed
>> >> > under Apache License 2.0. Most of the code base is implemented in C and
>> >> > C++ and has external dependencies listed later.
>> >> >
>> >> > == Meritocracy ==
>> >> >
>> >> > We plan to drive the technical roadmap and implementation in a fully
>> >> > transparent, community-driven way, soliciting feedback from all of the
>> >> > community members and building a consensus-driven approach to evolving
>> >> > the code base and the community itself. Users and new contributors will
>> >> > be treated with respect and welcomed. By participating in the community
>> >> > and providing quality patches/support that move the project forward,
>> >> > contributors will earn merit. They also will be encouraged to provide
>> >> > non-code contributions (documentation, events, community management,
>> >> > etc.) and will gain merit for doing so. Those with a proven support and
>> >> > quality track record will be encouraged to become committers.
>> >> >
>> >> > == Community ==
>> >> > In-memory is the new cutting-edge thing, and a new community around
>> >> > performance-oriented systems and enhancing relational database
>> >> > performance by having complete in-memory OLTP engines will greatly
>> >> > benefit performance.
>> >> > So we expect interest from data warehousing projects and communities
>> >> > as well as projects and companies looking for high OLTP performance. In
>> >> > addition, Ingenium Data Systems is building products around Concerted
>> >> > and will have salaried developers contribute to the project as part of
>> >> > their job responsibility.
>> >> >
>> >> > == Core Developers ==
>> >> > Core developers are a diverse group of developers, many of whom are
>> >> > very experienced in open source and the Apache Hadoop ecosystem.
>> >> > Specifically, Atri is an Apache Apex committer, and Atri and Pavel are
>> >> > major contributors to the PostgreSQL project. Atri is also a committer
>> >> > for other open source projects.
>> >> >
>> >> > * Amrish <amrishs AT ingeniumsys DOT com>
>> >> > * Nupur S <nupurs AT ingeniumsys DOT com>
>> >> > * Pavel Stehule <pavel DOT stehule AT gmail.com>
>> >> > * Atri Sharma <atri AT apache DOT org>
>> >> > * Nishith Singhal <nishsinghal AT gmail DOT com>
>> >> > * Michael Down <michael AT dowuk DOT com>
>> >> > * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>> >> > * Wang Albert <albertwang87 AT gmail DOT com>
>> >> > * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>> >> > * Kris Popat <krispopat AT apache DOT org>
>> >> > * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>> >> >
>> >> > == Alignment ==
>> >> > Concerted will be helpful to systems like Tajo, which can benefit from
>> >> > in-memory structures optimized for heavy reads and joins (dimension
>> >> > tables). In addition, Concerted will benefit projects looking for an
>> >> > in-memory relational database as a metadata store, which is the case
>> >> > for most of the Apache Big Data projects. We expect Apache HAWQ
>> >> > (incubating), Apache Hive, Apache Storm and Apache Tajo to be utilizing
>> >> > Concerted as a supporting engine.
>> >> > For example, a data warehouse built on HAWQ, Hive or Tajo can utilize
>> >> > Concerted as an in-memory engine for querying and joining dimensional
>> >> > tables.
>> >> >
>> >> > = Known Risks =
>> >> >
>> >> > == Orphaned Products ==
>> >> > Most of the code is developed by a small group of core developers, and
>> >> > this may be a risk for an orphaned product. However, the code base is
>> >> > simple compared to other open source projects, and the interest level
>> >> > in Concerted has risen exponentially over the years, with many computer
>> >> > professionals expressing interest in the project and doing some use
>> >> > cases of the same. Specifically, there were some projects done around
>> >> > Concerted at JIIT, Noida (an engineering school), and Wang is a student
>> >> > at Lehigh University who has been following Concerted's progress over
>> >> > many years. The core developers are aligned with this project, and
>> >> > since the code base is simple, future committers will have a quick
>> >> > ramp-up and the risk shall be mitigated. Besides, Ingenium Data Systems
>> >> > is launching a product based on Concerted and will be having all its
>> >> > salaried developers contribute to Concerted as a part of their job
>> >> > functions.
>> >> >
>> >> > == Inexperience with Open Source ==
>> >> > Most of the initial committers have experience working on open source
>> >> > projects. In particular, Atri is an active member of many open source
>> >> > projects.
>> >> >
>> >> > == Homogeneous Developers ==
>> >> > Although the initial core developers were based out of India, the
>> >> > community now consists of computer professionals from various parts of
>> >> > the world, hence diversity should not be an issue. In addition, we will
>> >> > be documenting internals of the project in public-facing documents,
>> >> > which shall allow more contributors to join in.
>> >> >
>> >> > == Reliance on Salaried Developers ==
>> >> > It is expected that Concerted development will occur on both salaried
>> >> > time and on volunteer time. Nupur and Amrish belong to Ingenium and are
>> >> > committed to building this project along with their team. Atri, as the
>> >> > originator of this project, will be actively working on the project and
>> >> > is now pushing Concerted into major data warehousing projects, since he
>> >> > is involved in architecture of data platforms. Developers are expected
>> >> > to be contributing in their volunteer time. In addition, we will be
>> >> > working with various open source projects which will benefit from
>> >> > Concerted and will be involving those communities in Concerted's
>> >> > development as well. For example, Apache Tajo has shown interest and
>> >> > will be supporting development of the project.
>> >> >
>> >> > == Relationships with Other Apache Products ==
>> >> > Concerted has some overlapping function with Apache Geode (Incubating).
>> >> > However, Geode is an in-memory key value store whereas Concerted is a
>> >> > write-less read-many engine. Concerted will complement Geode and
>> >> > increase the use cases Geode can support with Concerted's help.
>> >> >
>> >> > A major objective for Concerted is supporting OLAP workloads and data
>> >> > warehouses with in-memory performance and highly performant reads and
>> >> > joins. Concerted will be collaborating with many open source projects
>> >> > such as Apache HAWQ (incubating), Apache Hive, Apache Tajo etc. to
>> >> > support their OLAP workloads, hence enabling them to support a larger
>> >> > set of use cases with better throughput. For example, a star schema in
>> >> > Hive will benefit from having dimension tables in Concerted, with
>> >> > highly efficient and scalable reads, and joins will be very fast. The
>> >> > same applies to a similar workload for Tajo.
>> >> >
>> >> > Concerted will fit in many other use cases in the Apache spectrum as
>> >> > well. For example, Concerted can be used with Apache Geode for
>> >> > in-memory aggregation indexing. Concerted can also be used with Apache
>> >> > Flink for streaming real-time data into memory, performing in-memory
>> >> > aggregation and then performing batch processing for efficiency.
>> >> >
>> >> >
>> >> > == An Excessive Fascination with the Apache Brand ==
>> >> > We believe that the "Apache Way" governance model will provide
>> >> > additional help to us in finding contributors and growing the
>> >> > community. The community and development process will make this project
>> >> > more stable and help establish ubiquitous APIs. In addition, Concerted
>> >> > is looking to support multiple Apache projects in their use cases and
>> >> > accelerate their performance while soliciting their support in
>> >> > development of the project. We will not be using the Apache brand for
>> >> > excessive branding or with any commercial aspects of Concerted. The
>> >> > Apache brand will primarily be used for community building.
>> >> >
>> >> > = Documentation =
>> >> > Public documents are currently in development and will be published
>> >> > soon.
>> >> >
>> >> > = Initial Source =
>> >> > The initial source is written in C++ and is heavily in development. It
>> >> > will be restructured and released publicly.
>> >> > We understand that there might be concerns around the github source
>> >> > being developed by only a single person and development not happening
>> >> > after 2013. The source on github is only the source initially developed
>> >> > as an independent project, hence the limitation. However, because the
>> >> > project has been present on github for a while now, it has attracted
>> >> > attention and people have been using and developing it locally.
>> >> > For example, Ingenium Data Systems took an interest in the project,
>> >> > developed it locally and used it in an upcoming product they are going
>> >> > to release soon. The project now wants to accumulate all independent
>> >> > development efforts and help attract people to grow the community and
>> >> > project. We are currently in the process of updating the github
>> >> > repository and making branches for all local development efforts.
>> >> >
>> >> > = Source and Intellectual Property Submission Plan =
>> >> >
>> >> > We intend the entire code base to be licensed under the Apache License,
>> >> > Version 2.0.
>> >> >
>> >> > = External Dependencies =
>> >> > Currently, Concerted only depends on the g++ compiler and pthreads.
>> >> > pthreads will be replaced by Boost in the next release.
>> >> >
>> >> > = Cryptography =
>> >> >
>> >> > N/A
>> >> >
>> >> > = Required Resources =
>> >> > == Mailing List ==
>> >> > * priv...@concerted.incubator.apache.org (moderated subscriptions)
>> >> > * comm...@concerted.incubator.apache.org
>> >> > * d...@concerted.incubator.apache.org
>> >> > * iss...@concerted.incubator.apache.org
>> >> >
>> >> > == Git Repository ==
>> >> >
>> >> > https://git-wip-us.apache.org/repos/asf/incubator-concerted.git
>> >> >
>> >> > == Issue Tracking ==
>> >> > Jira Concerted (CONCERTED)
>> >> >
>> >> > == Other Resources ==
>> >> > * Continuous Integration
>> >> > * Jenkins
>> >> > * Wiki
>> >> > * cwiki.apache.org/confluence/display/CONCERTED
>> >> >
>> >> > = Initial Committers =
>> >> > * Roman Shaposhnik <rvs AT apache DOT org>
>> >> > * Daniel Dai <daijy AT apache DOT org>
>> >> > * Jake Farrell <jfarrell AT apache DOT org>
>> >> > * Lars Hofhansl <larsh AT apache DOT org>
>> >> > * Julian Hyde <jhyde AT apache DOT org>
>> >> > * Chris Nauroth <cnauroth AT hortonworks DOT com>
>> >> > * Pavel Stehule <pavel DOT stehule AT gmail.com>
>> >> > * Amrish <amrishs AT ingeniumsys DOT com>
>> >> > * Nupur S <nupurs AT ingeniumsys DOT com>
>> >> > * Atri Sharma <atri AT apache DOT org>
>> >> > * Nishith Singhal <nishsinghal AT gmail DOT com>
>> >> > * Michael Down <michael AT dowuk DOT com>
>> >> > * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>> >> > * Wang Albert <albertwang87 AT gmail DOT com>
>> >> > * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>> >> > * Kris Popat <krispopat AT apache DOT org>
>> >> > * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>> >> >
>> >> > = Affiliations =
>> >> > * Roman Shaposhnik (Pivotal)
>> >> > * Daniel Dai (HortonWorks)
>> >> > * Jake Farrell (Acquia)
>> >> > * Lars Hofhansl (Salesforce)
>> >> > * Julian Hyde (HortonWorks)
>> >> > * Chris Nauroth (HortonWorks)
>> >> > * Pavel Stehule (GoodData)
>> >> > * Amrish (Ingenium Data Systems)
>> >> > * Nupur S (Ingenium Data Systems)
>> >> > * Atri Sharma (Barclays)
>> >> > * Nishith Singhal (Wipro)
>> >> > * Michael Down (Barclays)
>> >> > * Vijayakumar Ramdoss (EMC)
>> >> > * Wang Albert (Lehigh University)
>> >> > * Hans-Jurgen Schonig (CyberTec)
>> >> > * Kris Popat (CETIS LLP)
>> >> > * Ayrton Gomesz (IQLabs)
>> >> >
>> >> > The nominated mentors are employees of HortonWorks, Acquia, and
>> >> > Salesforce.
>> >> >
>> >> > * Daniel Dai (HortonWorks)
>> >> > * Jake Farrell (Acquia)
>> >> > * Lars Hofhansl (Salesforce)
>> >> > * Julian Hyde (HortonWorks)
>> >> > * Chris Nauroth (HortonWorks)
>> >> >
>> >> > = Sponsors =
>> >> >
>> >> > == Champion ==
>> >> >
>> >> > * Roman Shaposhnik (rvs AT apache DOT org)
>> >> >
>> >> > == Nominated Mentors ==
>> >> >
>> >> > * Daniel Dai <daijy AT apache DOT org>
>> >> > * Jake Farrell <jfarrell AT apache DOT org>
>> >> > * Lars Hofhansl <larsh AT apache DOT org>
>> >> > * Julian Hyde <jhyde AT apache DOT org>
>> >> > * Chris Nauroth <cnauroth AT hortonworks DOT com>
>> >> >
>> >> > == Sponsoring Entity ==
>> >> > Apache Incubator
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> >> For additional commands, e-mail: general-h...@incubator.apache.org