> Yes. When you flush from BMT, it's like any other SSTable. Cassandra will
> merge them through compaction.
>
> That's good news, thanks for clarifying!

A few more related questions:

Are there any problems with issuing the flush command directly from code at the end of a bulk insert? The BMT example mentions running nodetool, but poking around the Cassandra source seems to indicate it should be doable programmatically.

Also, in my BMT prototype I've noticed that the JVM won't exit after completion, so I have to hard kill it (ctrl-c). A thread dump shows that some of Cassandra's network threads are still alive, keeping the JVM from exiting. Some digging revealed that Cassandra isn't designed with a "clean" shutdown in mind, so perhaps such behavior is expected. Still, it is a bit unsettling, since the cluster nodes log an error after I kill the client node. Is calling StorageService.stopClient enough to ensure that any client-side buffers are flushed and writes are completed?

Finally, the wiki page on BMT (http://wiki.apache.org/cassandra/BinaryMemtable) suggests using StorageProxy, but the example in contrib does not. Under the hood, both StorageProxy and the contrib example call MessagingService.sendOneWay. The additional code in StorageProxy seems mostly related to the extra bookkeeping associated with hinted handoff and waiting on write acks. Perhaps that extra work isn't necessary for a bulk load operation?

That should be enough questions from me for a while. :)

-Toby
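
For reference, the thread-dump observation above can be approximated in plain Java. This is only a minimal sketch using standard JDK calls, with nothing Cassandra-specific in it: the class name `NonDaemonThreads`, the method `liveNonDaemonThreadNames`, and the thread name `fake-network-thread` are made up for illustration. It lists the live non-daemon threads, which are the ones that keep a JVM from exiting once `main` returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of Cassandra: lists the threads that would
// keep the JVM alive after main() returns.
class NonDaemonThreads {

    /** Names of live, non-daemon threads other than the calling thread. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a map keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch stop = new CountDownLatch(1);

        // Stand-in for a lingering network thread (assumption for the demo).
        Thread worker = new Thread(() -> {
            try {
                stop.await();
            } catch (InterruptedException ignored) {
                // Exit quietly if interrupted.
            }
        }, "fake-network-thread");
        worker.start();

        // While the worker is blocked, it shows up in the list.
        System.out.println("Blocking exit: " + liveNonDaemonThreadNames());

        // Release the worker so this demo itself can exit cleanly.
        stop.countDown();
        worker.join();
    }
}
```

Calling something like `liveNonDaemonThreadNames()` just before the end of a bulk-load client's `main` would show which threads are still holding the process open. Whether those threads can be stopped safely is a separate question that this sketch does not answer.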