[ 
https://issues.apache.org/jira/browse/LUCENE-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891677#action_12891677
 ] 

Yonik Seeley commented on LUCENE-2456:
--------------------------------------

It seems like integrations such as this would be best run as separate projects 
(e.g. a Google Code project or something similar).  There are so many possible 
integrations, and it would put too much of a burden on core developers to 
maintain them all.

> A Column-Oriented Cassandra-Based Lucene Directory
> --------------------------------------------------
>
>                 Key: LUCENE-2456
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2456
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*, Store
>    Affects Versions: 3.0.1
>            Reporter: Karthick Sankarachary
>         Attachments: LUCENE-2456.patch, LUCENE-2456.zip
>
>
> Herein, we describe a type of Lucene directory that stores its files in a 
> Cassandra server, which makes for a scalable and robust store for Lucene 
> indices.
> In brief, the CassandraDirectory maps the concept of a Lucene directory to a 
> column family that belongs to a certain keyspace located in a given Cassandra 
> server. Further, it stores each file under this directory as a row in that 
> column family.
> Specifically, its files are broken down into blocks (whose sizes are capped), 
> where each block (see FileBlock) is stored as the value of a column in the 
> corresponding row. As per 
> http://wiki.apache.org/cassandra/CassandraLimitations, this is the 
> recommended approach for dealing with large objects, which Lucene files tend 
> to be. In addition, a descriptor of the file (see FileDescriptor) that 
> outlines a map of blocks therein is stored as one of the columns in that row 
> as well. Think of this descriptor as an inode for Cassandra-based files.
> The exhaustive mapping of a Lucene directory (file) to a Cassandra column 
> family (row) is captured in the ColumnOrientedDirectory (ColumnOrientedFile) 
> inner-class. Specifically, it interprets Cassandra's data model in terms of 
> Lucene's, and vice versa. More importantly, these are the only two 
> inner-classes that have a foot in both the Lucene and Cassandra camps.
> All writes to a file in this directory occur through a CassandraIndexOutput, 
> which puts the data flushed from a write-behind buffer into the fitting set 
> of blocks. By the same token, all reads from a file in this directory occur 
> through a CassandraIndexInput, which gets the data needed by a read-ahead 
> buffer from the right set of blocks.
> The last (but not least) inner-class, CassandraClient, acts as a facade 
> over a Thrift-based Cassandra client. In short, it provides operations to 
> get/put rows/columns in the column family and keyspace associated with this 
> directory.
> Unlike Lucandra, which attempts to bridge the gap between Lucene and 
> Cassandra at the document-level, the CassandraDirectory is self-sufficient in 
> the sense that it does not require a re-write of any other component in the 
> Lucene stack. In other words, one may use the CassandraDirectory in 
> conjunction with the Lucene IndexWriter and IndexReader, as you would any 
> other kind of Lucene Directory. Moreover, given that the data unit 
> transferred to and from Cassandra is a large block, one may expect fewer 
> round trips, and hence better throughput, from the CassandraDirectory.
> In conclusion, this directory attempts to marry the rich search-based query 
> language of Lucene with the distributed fault-tolerant database that is 
> Cassandra. By delegating the responsibilities of replication, durability and 
> elasticity to the directory, we free the layers above from such 
> non-functional concerns. Our hope is that users will choose to make their 
> large-scale indices instantly scalable by seamlessly migrating them to this 
> type of directory (using Directory#copyTo(Directory)).
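
To make the block layout in the description above concrete, here is a 
minimal sketch of a file stored as a row of size-capped blocks plus a 
descriptor, written through a write-behind buffer. It is not taken from the 
attached patch: the class name BlockedFileSketch, the "block-N" column names 
and the in-memory map standing in for a Cassandra row are all assumptions 
made for illustration, and no Cassandra or Lucene API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of one file stored as a row of capped blocks. */
final class BlockedFileSketch {
    // Stand-in for the Cassandra row: column name -> column value (assumption).
    private final Map<String, byte[]> columns = new HashMap<>();
    // Stand-in for the file descriptor: ordered block names plus the file length.
    private final List<String> blockNames = new ArrayList<>();
    private long length = 0;

    // Write-behind buffer; its capacity is the block size cap.
    private final byte[] buffer;
    private int buffered = 0;

    BlockedFileSketch(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.buffer = new byte[blockSize];
    }

    /** Buffers data and flushes one block each time the buffer fills up. */
    void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            int n = Math.min(buffer.length - buffered, data.length - offset);
            System.arraycopy(data, offset, buffer, buffered, n);
            buffered += n;
            offset += n;
            if (buffered == buffer.length) {
                flush();
            }
        }
    }

    /** Stores the buffered bytes as a new block column and records it in the descriptor. */
    void flush() {
        if (buffered == 0) {
            return;
        }
        String name = "block-" + blockNames.size();
        columns.put(name, Arrays.copyOf(buffer, buffered));
        blockNames.add(name);
        length += buffered;
        buffered = 0;
    }

    /** Reads len flushed bytes starting at pos by walking the descriptor's block list. */
    byte[] read(long pos, int len) {
        if (pos < 0 || len < 0 || pos + len > length) {
            throw new IndexOutOfBoundsException("range outside flushed data");
        }
        byte[] out = new byte[len];
        int copied = 0;
        long blockStart = 0;
        for (String name : blockNames) {
            byte[] block = columns.get(name);
            long blockEnd = blockStart + block.length;
            if (copied < len && pos + copied < blockEnd) {
                int from = (int) (pos + copied - blockStart);
                int n = Math.min(block.length - from, len - copied);
                System.arraycopy(block, from, out, copied, n);
                copied += n;
            }
            blockStart = blockEnd;
        }
        return out;
    }

    int blockCount() {
        return blockNames.size();
    }

    long length() {
        return length;
    }
}
```

In this sketch only flushed bytes are readable, and the reader walks the 
descriptor rather than assuming equal-sized blocks, so an early flush() that 
leaves a short block does not break later reads. A real directory would 
issue one column read or write per block against the server instead of 
touching a local map.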

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


