I just want to say, I'm very excited about this project, and am happy to contribute any way I can. I've been thinking a lot lately about how to build a scalable RDF system using something like Spark, so this definitely intrigues me.
Phil This message optimized for indexing by NSA PRISM On Thu, Sep 3, 2015 at 9:03 AM, Adina Crainiceanu <ad...@usna.edu> wrote: > Hi, > > We would like to start a discussion on accepting Rya, a scalable RDF data > management system built on top of Accumulo. into Apache Incubator. > > The proposal is available online at > https://wiki.apache.org/incubator/RyaProposal and also at the end of this > email. > > We are looking for additional mentors to help us with the project. Any > advice and help will be appreciated. > > Thank you very much, > Adina > > > > = Rya Proposal = > > == Abstract == > > Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that > supports SPARQL queries. > > == Proposal == > > Rya is a scalable RDF data management system built on top of Accumulo. Rya > uses novel storage methods, indexing schemes, and query processing > techniques that scale to billions of triples across multiple nodes. Rya > provides fast and easy access to the data through SPARQL, a conventional > query mechanism for RDF data. > > == Background == > > RDF is a World Wide Web Consortium (W3C) standard used in describing > resources on the Web. The smallest data unit is a triple consisting of > subject, predicate, and object. Using this framework, it is very easy to > describe any resource, not just Web related. For example, if you want to > say that Alice is a professor, you can represent this as an RDF triple like > (Alice, rdf:type, Professor). In general, RDF is an open world framework > that allows anyone to make any statement about any resource, which makes it > a popular choice for expressing a large variety of data. > > RDF is used in conjunction with the Web Ontology Language (OWL). OWL is a > framework for describing models or ontologies for RDF. It defines concepts, > relationships, and/or structure of RDF documents. These models can be used > to 'reason/infer' information about entities within a given domain. For > example, you can express that a Professor is a sub class of Faculty, > (Professor, rdfs:subClassOf, Faculty) and knowing that (Alice, rdf:type, > Professor), it can be inferred that (Alice, rdf:type, Faculty). > > SPARQL is an RDF query language. Similar with SQL, SPARQL has SELECT and > WHERE clauses; however, it is based on querying and retrieving RDF triples. > > Work on Rya, a large scale distributed system for storing and querying RDF > data, started in 2010. > > == Rationale == > > With the increase in data size, there is a need for scalable systems for > storing and retrieving RDF data in a cluster of nodes. We believe that Rya > can fulfil that role. We expect that communities within government, health > care, finance, and others who generate large amounts of RDF data will be > most interested in this project. > > From its inception, the project operated with an Apache-style license, but > it was open to mostly US government-related projects only. We believe that > having the project and the development open for all will benefit both the > project and the interested communities. > > == Current Status == > > The project source code and documentation are currently hosted in a private > repository on Github. New users are added to the repository upon request. > > === Meritocracy === > > Meritocracy is the model that we currently follow, and we want to build a > larger and more diverse developer community by becoming an Apache project. > > === Community === > > Rya has being building a community of users and developers for the past 3 > years. There is currently an active workgroup with monthly meetings and the > number of participants in the meeting is increasing. > > === Core Developers === > > The core developers are a diverse group of people who are either government > employees or former / current government contractors from different > companies. > > === Alignment === > > Rya is built on top of Accumulo, an Apache project. > > == Known Risks == > > === Orphaned Products === > > There is a very small risk of becoming orphaned. The current contributors > are strongly committed to the project, there is a large enough number of > developers interested in contributing to the project, and we believe that > the support for the project will continue to grow from the interested > communities. > > === Inexperience with Open Source === > > The initial committers have various degrees of experience with open source > projects - from very new to experienced. This project was open source > within government from the beginning. We do not expect to have difficulties > in operating under Apache's development process. > > === Homogenous Developers === > > The current list of developers form a heterogeneous group, with people for > academia, government, and industry, collaborating from distributed > geographic locations. We aim to expand the list of contributors with the > help of the Apache incubation process. > > === Reliance on Salaried Developers === > > Many but not all of the developers working on the project are salaried > employees, paid to work on this project. They will continue to contribute > to the open source project. Some of the initial committers continued as > volunteers even if no longer employed to work on this project and they plan > to continue supporting the project. > > === Relationships with Other Apache Products === > > Rya uses Apache Accumulo, Hadoop, Zookeeper, Maven. > > === Apache Brand === > > Rya has generated interest in the government. It also generated interest > within academia and industry. We believe that everyone could benefit from > having Rya as an open source project. Due to its strong ties to Accumulo, > an Apache project, and due to the values of the Apache Foundation, we > believe that Apache incubator is the right place for Rya. > > == Documentation == > > Two peer-reviewed publications [1,2] about Rya were published in 2012 and > 2015. More documentation is available in the code. > > [1] Roshan Punnoose, Adina Crainiceanu, David Rapp. Rya: A Scalable RDF > Triple Store for the Clouds. Proceedings of the 1st International Workshop > on Cloud Intelligence, Pages 4:1-4:8, August 2012 > > [2] Roshan Punnoose, Adina Crainiceanu, David Rapp. SPARQL in the Clouds > Using Rya. Information Systems, Volume 48, Pages 181-195, March 2015 > (Available online 23 July 2013) > > == Initial Source == > > The code is currently available in a private Github repository. > https://github.com/LAS-NCSU/rya > > == Source and Intellectual Property Submission Plan == > > The source code has been released under the Apache License, Version 2. > Software grant, and CCLAs have been submitted. ICLAs for initial committers > have been submitted or are in progress. > > == External Dependencies == > > * Open RDF (BSD license) > * GeoMesa (Apache License, Version 2.0) > * Accumulo (Apache License, Version 2.0) > * Hadoop (Apache License, Version 2.0) > * TinkerPop (Apache License, Version 2.0) > * IndexingSail (Apache License, Version 2.0) > > == Cryptography == > > The proposal does not involve any cryptographic code. > > == Required Resources == > > === Mailing lists === > > * priv...@rya.incubator.apache.org > * d...@rya.incubator.apache.org > * comm...@rya.incubator.apache.org > > === Git Repository === > > https://git-wip-us.apache.org/repos/asf/incubator-rya.git > > === Issue Tracking === > > JIRA Rya > > == Initial Committers == > > * Roshan Punnoose, roshanp at gmail dot com > * David Rapp, dnrapp at ncsu dot edu > * Adina Crainiceanu, adinancr at gmail dot com > * Aaron Mihalik, aaron.mihalik at gmail dot com > * Puja Valiyil, pujav65 at gmail dot com > * Jennifer Brown, jennifer.brown at parsons dot com > * Steve Wagner, steve.r.wagner at gmail dot com > > == Affiliations == > > * Roshan Punnoose, Enlighten IT Consulting > * David Rapp, North Carolina State University > * Adina Crainiceanu, US Naval Academy > * Aaron Mihalik, Parsons > * Puja Valiyil, Parsons > * Jennifer Brown, Parsons > * Steve Wagner, Enlighten IT Consulting > > == Sponsors == > > === Champion === > > Adam Fuchs, ASF Member, afuchs at apache dot org > > === Nominated Mentors === > > Josh Elser josh dot elser at gmail dot com > > We are seeking additional mentors > > === Sponsoring Entity === > > Apache Incubator >