I have spent the last few days playing with Cassandra and I have attempted to create a simple "Java->Thrift->Cassandra" Discussion Group Server (because the world needs another one) to teach myself the data model and try everything out.
With all the great blog posts on Cassandra out there, I am now able to read/write/delete/modify a nested discussion server. YEA!!!

I decided to have two simple ColumnFamilies. One called Posts:

    Post = {
        '7561a442-24e2-11df-8924-001ff3591711': {                     // UUID
            'id': '7561a442-24e2-11df-8924-001ff3591711',             // ID == UUID
            'parent_id': '89da3178-24e2-11df-8924-001ff3591711',      // Parent Post UUID
            'author': 'a4a70900-24e1-11df-8924-001ff3591711',         // Users UUID
            'subject': 'This is a forum post',                        // Subject
            'body': 'Forum post body. This is awesome!',              // Body
            '_ts': '89da3178-24e2-11df-8924-001ff3596713',            // TimeUUID
        },
    }

Here the key is a simple UUID and the columns hold the Forum/Post/Reply data. A Forum has a hardcoded Parent UUID which I store in Java, while the Posts and Replies are tied to their parent posts/forums/etc. by the parent_id. I sort by UTF8Type, but it really doesn't matter in this case, as I always go into this map by the key and always get all columns (6 of them).

All queries go through the second ColumnFamily, called Threads:

    Thread = {
        '7561a442-24e2-11df-8924-001ff3591711': {    // Parent thread UUID
            // timestamp of post: post UUID
            '89da3178-24e2-11df-8924-001ff3596713': '7561a442-24e2-11df-8924-001ff3591711',    // TimeUUID column name -> post UUID value
        },
    }

With a Parent UUID I can go into Threads, which gives me the list of Posts/Replies at that level sorted by TimeUUID. The column name is the post TimeUUID and the value is the Post UUID. This ColumnFamily is sorted by TimeUUID. Thus I can walk the tree (of any depth) of Forum/Post/Replies with the Thread table.

I have this all working on a single Cassandra node and it works great. Inserts go to both tables, while deletes need to use the Thread ColumnFamily to recursively delete all child posts, the column in the parent key of Thread, and all associated data in Post.

Any comments on whether this is a good/terrible data model, etc. so far are welcome.
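The tree walk and recursive delete described above can be sketched in memory as follows. This is only an illustration of the idea, not the Thrift code: the class name ThreadTreeSketch, its methods, the String IDs, and the plain long standing in for a TimeUUID are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Threads ColumnFamily (not the Thrift API):
// row key = parent UUID, column name = post time, column value = post UUID.
class ThreadTreeSketch {
    // TreeMap keeps each row's columns ordered by time, mimicking the TimeUUID sort.
    private final Map<String, TreeMap<Long, String>> threads = new HashMap<>();

    // Mirrors the insert path: add one column under the parent's row.
    void addPost(String parentId, long postTime, String postId) {
        threads.computeIfAbsent(parentId, k -> new TreeMap<>()).put(postTime, postId);
    }

    // Depth-first walk: each post is followed by its replies, oldest first.
    List<String> descendants(String parentId) {
        List<String> out = new ArrayList<>();
        collect(parentId, out);
        return out;
    }

    private void collect(String parentId, List<String> out) {
        TreeMap<Long, String> children = threads.get(parentId);
        if (children == null) {
            return; // leaf: nothing was ever posted under this ID
        }
        for (String child : children.values()) {
            out.add(child);
            collect(child, out);
        }
    }

    // Recursive delete: drop the post's column from its parent's row, then every
    // row below it. Returns the post IDs whose rows in Posts would also need removing.
    List<String> delete(String parentId, String postId) {
        List<String> doomed = new ArrayList<>();
        doomed.add(postId);
        doomed.addAll(descendants(postId));
        TreeMap<Long, String> siblings = threads.get(parentId);
        if (siblings != null) {
            siblings.values().remove(postId);
        }
        for (String id : doomed) {
            threads.remove(id);
        }
        return doomed;
    }

    public static void main(String[] args) {
        ThreadTreeSketch t = new ThreadTreeSketch();
        t.addPost("forum", 1L, "post-1");
        t.addPost("forum", 2L, "post-2");
        t.addPost("post-1", 3L, "reply-1");
        System.out.println(t.descendants("forum"));
        System.out.println(t.delete("forum", "post-1"));
        System.out.println(t.descendants("forum"));
    }
}
```

Against the real cluster, each step would become a get_slice or remove call, and a failure halfway through the loop would leave exactly the kind of partial data mentioned below.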
My question comes from the fact that during this process I have written/read/deleted many "key->Columns" to these ColumnFamilies (many of which failed half-way through), so I decided to write a "clean" script to remove all data from these ColumnFamilies (much like a truncate table command in SQL).

Using the following Java code:

    //get the ID column for each KEY we find
    List<byte[]> l_columns = new ArrayList<byte[]>();
    l_columns.add(Transcoder.encode(ID));
    SlicePredicate l_slicePredicate = new SlicePredicate();
    l_slicePredicate.setColumn_names(l_columns);

    //get 100 keys at a time
    KeyRange keyRange = new KeyRange(100);
    keyRange.setStart_key("");
    keyRange.setEnd_key("");
    List<KeySlice> l_keySlices = p_context.getClient().get_range_slices("Discussions",
        new ColumnParent("Posts"), l_slicePredicate, keyRange, ConsistencyLevel.ONE);

I get ALL of the KEYS I ever wrote to the server. Most of them have no Columns associated with them. In fact, if I query the same key with

    SlicePredicate l_slicePredicate = new SlicePredicate();
    SliceRange l_sliceRange = new SliceRange();
    l_sliceRange.setStart(new byte[] {});
    l_sliceRange.setFinish(new byte[] {});
    l_slicePredicate.setSlice_range(l_sliceRange);
    List<ColumnOrSuperColumn> l_result = p_context.getClient().get_slice("Discussions",
        <KEY FROM GET_RANGE_SLICES>, new ColumnParent("Posts"), l_slicePredicate, ConsistencyLevel.ONE);

it returns an empty array list (the same as if I give it a KEY it has never seen).

It is OK with me if get_range_slices returns keys with no columns (although it makes it a little harder to explain to others -- is there garbage collection that will clean these out in the future?). However, I am stuck on how to simply truncate the table without looping through all the keys, looking for ones that still have a Column associated with them, and then deleting each of those.

It is possible I am not deleting correctly as well.
To delete a post I simply do:

    p_context.getClient().remove("Discussions", p_postUUID.toString(),
        new ColumnPath("Posts"), l_rightNow, ConsistencyLevel.ALL);

I am just trying to understand what I am getting and compare it against what I expected. I am also still trying to write a simple "clean" command.

If you read this far, thanks. If you can add some clarity it would help me. I have tried to find an answer in the archives and blog posts, but I didn't see anything.

Thanks,
Kevin
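P.S. As a rough sketch of the loop-and-filter "clean" pass I would like to avoid: the step that picks out which keys from one page of results are still worth deleting might look like the code below. The class CleanPassSketch, the method keysToRemove, and the plain Map standing in for the list of KeySlice results are all made up for illustration; this is not the Thrift API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one page of get_range_slices results:
// row key -> column names returned for that key (empty list = key with no columns).
class CleanPassSketch {
    // Returns only the keys that still have columns, i.e. the ones worth calling remove() on.
    static List<String> keysToRemove(Map<String, List<String>> page) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : page.entrySet()) {
            if (!row.getValue().isEmpty()) {
                live.add(row.getKey());
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, List<String>> page = new LinkedHashMap<>();
        page.put("key-a", Arrays.asList("id"));
        page.put("key-b", Collections.<String>emptyList()); // already-deleted key
        page.put("key-c", Arrays.asList("id"));
        System.out.println(keysToRemove(page));
    }
}
```

A real pass would repeat this for each page of 100 keys, restarting the KeyRange from the last key seen, which is the part that feels heavier than a truncate should be.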