Re: [PROPOSAL] Blur for the Apache Incubator

Aaron McCurry Wed, 18 Jul 2012 11:09:28 -0700

Is there any more feedback before I call the vote?

Thanks!


Aaron

On Sat, Jul 14, 2012 at 11:48 AM, Dave Fisher <dave2w...@comcast.net> wrote:
>
> On Jul 13, 2012, at 2:23 PM, Aaron McCurry wrote:
>
>> Hello!
>>
>> I would like to propose Blur to be an Apache Incubator project.  Blur
>> is a distributed search platform built for low latency searches over
>> large amounts of data.  Blur is scalable and fault tolerant through
>> the use of Hadoop and ZooKeeper.  Thrift is used as the RPC library
>> and the underlying search implementation uses Lucene and the Lucene
>> query syntax.
>>
>> The proposal can be found here:
>> http://wiki.apache.org/incubator/BlurProposal
>>
>> I have included the contents of the proposal below.
>
> Very cool!
>
> Regards,
> Dave
>
>
>>
>> Thanks!
>> Aaron
>>
>> = Blur Proposal =
>>
>> == Abstract ==
>> Blur is a search platform capable of searching massive amounts of data
>> in a cloud computing environment. Blur leverages several existing
>> Apache projects, including Apache Lucene, Apache Hadoop, Apache
>> !ZooKeeper and Apache Thrift.  Both bulk and near real time (NRT)
>> updates are possible with Blur.  Bulk updates are accomplished using
>> Hadoop Map/Reduce and NRT updates are performed through direct Thrift calls.
>>
>> == Proposal ==
>> Blur is an open source search platform capable of querying massive
>> amounts of data at incredible speeds. Rather than using the flat,
>> document-like data model used by most search solutions, Blur allows
>> you to build rich data models and search them in a semi-relational
>> manner similar to joins while querying a relational database. Using
>> Blur, you can get precise search results against terabytes of data at
>> Google-like speeds.  Blur leverages multiple open source projects
>> including Hadoop, Lucene, Thrift and !ZooKeeper to create an
>> environment where structured data can be transformed into an index
>> that runs on a Hadoop cluster.  Blur uses the power of Map/Reduce for
>> bulk indexing into Blur.  Server failures are handled automatically by
>> using !ZooKeeper for cluster state and HDFS for index storage.
>>
>> == Background ==
>> Blur was created by Aaron !McCurry in 2010. Blur was developed to
>> search quantities of data too large for traditional RDBMS solutions
>> to cope with, while still providing JOIN-like capabilities to query
>> the data.  Several other
>> open source projects have implemented aspects of this design including
>> elasticsearch, Katta and Apache Solr.
>>
>> == Rationale ==
>> There is a need for a distributed search capability within the Hadoop
>> ecosystem. Currently, there are no other search solutions that
>> natively leverage HDFS and the failover features of Hadoop in the same
>> manner as the Blur project. The communities we expect to be most
>> interested in such a project are government, health care, and other
>> industries where scalability is a concern. We have made much progress
>> in developing this project over the past 2 years and believe both the
>> project and the interested communities would benefit from this work
>> being openly available and having open development.  In future
>> versions of Blur the API will more closely follow the APIs provided
>> in Lucene so that systems that already use Lucene can more easily
>> scale with Blur. Blur can be viewed as a query execution engine that
>> Lucene based solutions can utilize when scale becomes an issue.
>>
>> == Initial Goals ==
>> The initial goals of the project are:
>> * To migrate the Blur codebase, issue tracking and wiki from
>> github.com and integrate the project with the ASF infrastructure.
>> * Add new committers to the project and grow the community in "The Apache 
>> Way".
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>> Blur was initially developed by Aaron !McCurry in June 2010.  Since
>> then Blur has continued to evolve with the support of a small
>> development team at Near Infinity.  As a part of the Apache Software
>> Foundation, the Apache Blur team intends to strongly encourage the
>> community to help with and contribute to the project.  Apache Blur
>> will actively seek potential committers and help them become familiar
>> with the codebase.
>>
>> === Community ===
>> A small community has developed around Blur and several project teams
>> are currently using Blur for their big data search capability. The
>> source code is currently available on GitHub and there is a dedicated
>> website (blur.io) that provides an overview of the project. Blur has
>> been shared with several members of the Apache community and has been
>> presented at the Bay Area HUG (see
>> http://www.meetup.com/hadoop/events/20109471/).
>>
>> === Core Developers ===
>> The current developers are employed by Near Infinity Corporation, but
>> we anticipate interest developing among other companies.
>>
>> === Alignment ===
>> Blur is built on top of a number of Apache projects: Hadoop, Lucene,
>> !ZooKeeper, and Thrift. It builds with Maven.  During the course of
>> Blur development, a couple of patches have been committed back to the
>> Lucene project, including LUCENE-2205 and LUCENE-2215.  Due to the
>> strong relationship with the aforementioned Apache projects, the
>> incubator is a good match for Blur.
>>
>> == Known Risks ==
>>
>> === Orphaned Products ===
>> There is only a small risk of being orphaned. The customers that
>> currently use Blur are committed to improving the codebase of the
>> project because it fulfills needs not addressed by any other
>> software. In addition, one customer is providing financial support to
>> further develop Blur given its importance on mission-critical
>> projects.
>>
>> === Inexperience with Open Source ===
>> The codebase has been treated internally as an open source project
>> since its beginning, and Near Infinity has extensive experience
>> developing and releasing open source projects
>> (http://www.nearinfinity.com/products/open_source). We do not
>> anticipate difficulty in operating under the Apache Way.
>>
>> === Homogeneous Developers ===
>> Current developers are all employed by Near Infinity but we are
>> actively seeking contributors from different companies and would
>> welcome their participation.
>>
>> === Reliance on Salaried Developers ===
>> Blur was originally created by Aaron !McCurry as a personal project
>> and he remains the primary contributor.  Currently, Aaron’s employer
>> (Near Infinity) fully supports his continued participation with paid,
>> dedicated time to work on Blur. All other current developers are paid
>> by Near Infinity to work on Blur as well.
>>
>> === Relationships with Other Apache Products ===
>> Blur dependencies:
>>
>> * Apache Hadoop
>> * Apache Lucene
>> * Apache !ZooKeeper
>> * Apache Thrift
>> * Apache log4j
>>
>> === Apache Brand ===
>> Our interest in releasing this code as an Apache project is due to its
>> strong relationship with other Apache projects (Blur depends on
>> Hadoop, Lucene, !ZooKeeper, and Thrift) and its uniqueness within the
>> Hadoop ecosystem.
>>
>> == Documentation ==
>> Current documentation can be found at http://blur.io and
>> https://github.com/nearinfinity/blur.
>>
>> == Initial Source ==
>> Blur has been in development since summer 2010. The core codebase
>> consists of about 29,000 lines of code, mainly Java (about 10,000 if
>> the generated RPC code is not included).
>>
>> == Source and Intellectual Property Submission Plan ==
>> Blur core code, examples, documentation, and training materials will
>> be submitted by Near Infinity Corporation.
>>
>> == External Dependencies ==
>> * concurrentlinkedhashmap - Apache 2.0 License -
>> http://code.google.com/p/concurrentlinkedhashmap/
>>
>> == Cryptography ==
>> none
>>
>> == Required Resources ==
>> * Mailing Lists
>>   * blur-private
>>   * blur-dev
>>   * blur-commits
>>   * blur-user
>> * Git Repository
>>   * https://git-wip-us.apache.org/repos/asf/blur.git
>> * Issue Tracking
>>   * JIRA
>> * Continuous Integration
>>   * Jenkins
>> * Web
>>   * http://incubator.apache.org/blur/wiki at http://wiki.apache.org
>> or http://cwiki.apache.org
>>
>> == Initial Committers ==
>> * Aaron !McCurry (aaron.mccurry at nearinfinity dot com)
>> * Scott Leberknight (scott.leberknight at nearinfinity dot com)
>> * Ryan Gimmy (ryan.gimmy at nearinfinity dot com)
>> * Tim Williams (twilliams at apache dot org)
>> * Patrick Hunt (phunt at apache dot org)
>> * Doug Cutting (cutting at apache dot org)
>>
>> == Affiliations ==
>> * Aaron !McCurry, Near Infinity
>> * Scott Leberknight, Near Infinity
>> * Ryan Gimmy, Near Infinity
>> * Patrick Hunt, Cloudera
>> * Doug Cutting, Cloudera
>>
>> == Sponsors ==
>> * Champion: Patrick Hunt
>>
>> == Nominated Mentors ==
>> * Tim Williams  (twilliams at apache dot org)
>> * Doug Cutting (cutting at apache dot org)
>> * Patrick Hunt (phunt at apache dot org)
>>
>> == Sponsoring Entity ==
>> * Apache Incubator
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>
>

