Re: BinaryMemtable and collisions

2010-05-07 Thread Jake Luciani
Got it. I'm working on making term vectors optional and storing just the frequency in this case. Just FYI. On Sat, May 8, 2010 at 1:17 AM, Tobias Jungen wrote: > Without going into too much depth: Our retrieval model is a bit more > structured than standard Lucene retrieval, and I'm trying to leverag

Re: BinaryMemtable and collisions

2010-05-07 Thread Tobias Jungen
Without going into too much depth: Our retrieval model is a bit more structured than standard Lucene retrieval, and I'm trying to leverage that structure. Some of the terms we're going to retrieve against have high occurrence, and because of that I'm worried about getting killed by processing large

Re: BinaryMemtable and collisions

2010-05-07 Thread Jake Luciani
Any reason why you aren't using Lucandra directly? On Fri, May 7, 2010 at 8:21 PM, Tobias Jungen wrote: > Greetings, > > Started getting my feet wet with Cassandra in earnest this week. I'm > building a custom inverted index of sorts on top of Cassandra, in part > inspired by the work of Jake Luc

Re: BinaryMemtable and collisions

2010-05-07 Thread Tobias Jungen
> Yes. When you flush from BMT, it's like any other SSTable. Cassandra will > merge them through compaction. > > That's good news, thanks for clarifying! A few more related questions: Are there any problems with issuing the flush command directly from code at the end of a bulk insert? The BMT exam
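
To illustrate the point quoted above (flushed BMT data becomes ordinary SSTables that compaction later merges), here is a rough Python sketch of how columns for the same row in several SSTables could be reconciled. It is a toy model, not Cassandra's code: the `SSTable` type, the `compact` function, and the sample row and column names are all hypothetical, and it assumes the usual rule that the column with the highest timestamp wins.

```python
from typing import Dict, List, Tuple

# (value, write timestamp) for one column; hypothetical simplification.
Column = Tuple[bytes, int]
# Toy stand-in for a flushed SSTable: row key -> column name -> Column.
SSTable = Dict[str, Dict[str, Column]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several toy SSTables into one, keeping the newest column version."""
    merged: SSTable = {}
    for table in sstables:
        for row_key, columns in table.items():
            row = merged.setdefault(row_key, {})
            for name, (value, ts) in columns.items():
                # Keep the version with the higher timestamp.
                # Tie handling is simplified here: the first one seen is kept.
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)
    return merged


# Two bulk loads that touch the same row of an inverted index.
first_load: SSTable = {"term:cassandra": {"doc1": (b"3", 100), "doc2": (b"1", 100)}}
second_load: SSTable = {"term:cassandra": {"doc2": (b"5", 200), "doc3": (b"2", 200)}}

merged = compact([first_load, second_load])
```

In this model, a second bulk load adds new columns to an existing row and overrides older versions of columns it rewrites, which is the behavior the question was asking about.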

Re: BinaryMemtable and collisions

2010-05-07 Thread Chris Goffinet
> > So my question is: If I properly flush every node after performing a larger > bulk insert, can Cassandra merge multiple writes on a single row & column > family when using the BMT interface? Or is using BMT only feasible for > loading data on rows that don't exist yet? > Yes. When you flu

BinaryMemtable and collisions

2010-05-07 Thread Tobias Jungen
Greetings, Started getting my feet wet with Cassandra in earnest this week. I'm building a custom inverted index of sorts on top of Cassandra, in part inspired by the work of Jake Luciani in Lucandra. I've successfully loaded nearly a million documents over a 3-node cluster, and initial query test