[
https://issues.apache.org/jira/browse/LUCENE-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891677#action_12891677
]
Yonik Seeley commented on LUCENE-2456:
--------------------------------------
It seems like integrations such as this would be best run as separate projects
(e.g. a Google Code project or something). There are so many possible
integrations, and it would add too much burden to core developers to maintain
them all.
> A Column-Oriented Cassandra-Based Lucene Directory
> --------------------------------------------------
>
> Key: LUCENE-2456
> URL: https://issues.apache.org/jira/browse/LUCENE-2456
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*, Store
> Affects Versions: 3.0.1
> Reporter: Karthick Sankarachary
> Attachments: LUCENE-2456.patch, LUCENE-2456.zip
>
>
> Herein, we describe a type of Lucene directory that stores its files in a
> Cassandra server, which makes for a scalable and robust store for Lucene
> indices.
> In brief, the CassandraDirectory maps the concept of a Lucene directory to a
> column family that belongs to a certain keyspace located in a given Cassandra
> server. Further, it stores each file under this directory as a row in that
> column family.
> Specifically, its files are broken down into blocks (whose sizes are capped),
> where each block (see FileBlock) is stored as the value of a column in the
> corresponding row. As per
> http://wiki.apache.org/cassandra/CassandraLimitations, this is the
> recommended approach for dealing with large objects, which Lucene files tend
> to be. In addition, a descriptor of the file (see FileDescriptor) that
> outlines a map of blocks therein is stored as one of the columns in that row
> as well. Think of this descriptor as an inode for Cassandra-based files.
> The exhaustive mapping of a Lucene directory (file) to a Cassandra column
> family (row) is captured in the ColumnOrientedDirectory (ColumnOrientedFile)
> inner-class. Specifically, it interprets Cassandra's data model in terms of
> Lucene's, and vice versa. More importantly, these are the only two
> inner-classes that have a foot in both the Lucene and Cassandra camps.
> All writes to a file in this directory occur through a CassandraIndexOutput,
> which puts the data flushed from a write-behind buffer into the fitting set
> of blocks. By the same token, all reads from a file in this directory occur
> through a CassandraIndexInput, which gets the data needed by a read-ahead
> buffer from the right set of blocks.
> The last (but not the least) inner-class, CassandraClient, acts as a facade
> over a Thrift-based Cassandra client. In short, it provides operations to
> get/put rows/columns in the column family and keyspace associated with this
> directory.
> Unlike Lucandra, which attempts to bridge the gap between Lucene and
> Cassandra at the document-level, the CassandraDirectory is self-sufficient in
> the sense that it does not require a re-write of any other component in the
> Lucene stack. In other words, one may use the CassandraDirectory in
> conjunction with the Lucene IndexWriter and IndexReader, as you would any
> other kind of Lucene Directory. Moreover, given that the data unit
> transferred to and from Cassandra is a large block, one may expect
> fewer round trips, and hence better throughput, from the CassandraDirectory.
> In conclusion, this directory attempts to marry the rich search-based query
> language of Lucene with the distributed fault-tolerant database that is
> Cassandra. By delegating the responsibilities of replication, durability and
> elasticity to the directory, we free the layers above from such
> non-functional concerns. Our hope is that users will choose to make their
> large-scale indices instantly scalable by seamlessly migrating them to this
> type of directory (using Directory#copyTo(Directory)).
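
For readers skimming the description above, the block layout it outlines (a file split into size-capped blocks, one column per block, plus a descriptor that maps out the blocks) might look roughly like the following. This is only a minimal in-memory sketch and not the code in the attached patch: the class name BlockedFileSketch, the "block-N" column names, and the use of a plain map in place of a Cassandra row are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for one row (one file) of the column family. */
final class BlockedFileSketch {
    /** Column name -> column value; a real directory would keep these in Cassandra. */
    private final Map<String, byte[]> columns = new LinkedHashMap<>();
    /** Ordered block column names; plays the role of the descriptor ("inode"). */
    private final List<String> descriptor = new ArrayList<>();
    private final int blockSize;
    private long length;

    BlockedFileSketch(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Splits data into blocks of at most blockSize bytes, one column per block. */
    void write(byte[] data) {
        columns.clear();
        descriptor.clear();
        for (int offset = 0; offset < data.length; offset += blockSize) {
            int end = Math.min(offset + blockSize, data.length);
            String name = "block-" + descriptor.size(); // assumed naming scheme
            columns.put(name, Arrays.copyOfRange(data, offset, end));
            descriptor.add(name);
        }
        length = data.length;
    }

    /** Reads len bytes starting at position, touching only the blocks that overlap. */
    byte[] read(long position, int len) {
        if (position < 0 || len < 0 || position + len > length) {
            throw new IndexOutOfBoundsException("range outside file");
        }
        byte[] out = new byte[len];
        int copied = 0;
        while (copied < len) {
            long pos = position + copied;
            int blockIndex = (int) (pos / blockSize);
            int within = (int) (pos % blockSize);
            byte[] block = columns.get(descriptor.get(blockIndex));
            int n = Math.min(len - copied, block.length - within);
            System.arraycopy(block, within, out, copied, n);
            copied += n;
        }
        return out;
    }

    /** Number of blocks currently listed in the descriptor. */
    int blockCount() {
        return descriptor.size();
    }
}
```

With a block size of 4, a 10-byte file ends up as three columns (4, 4 and 2 bytes), and a read that spans a block boundary is assembled from the overlapping blocks only. A real implementation would presumably fetch just those columns from the server instead of looking them up in a local map.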