http://wiki.apache.org/cassandra/FAQ#range_ghosts
On Sun, Apr 11, 2010 at 5:12 PM, Kevin Wiggen <kwig...@xythos.com> wrote:
>
> I have spent the last few days playing with Cassandra and I have attempted
> to create a simple "Java->Thrift->Cassandra" Discussion Group Server
> (because the world needs another one) to teach myself the data model and try
> everything out.
>
> With all the great blog posts on Cassandra out there, I am now able to
> read/write/delete/modify a nested discussion server. YEA!!!
>
> I decided to have two simple ColumnFamilies. The first is called Posts:
>
> Post = {
>     '7561a442-24e2-11df-8924-001ff3591711': {                 // UUID
>         'id':        '7561a442-24e2-11df-8924-001ff3591711',  // ID == UUID
>         'parent_id': '89da3178-24e2-11df-8924-001ff3591711',  // Parent Post UUID
>         'author':    'a4a70900-24e1-11df-8924-001ff3591711',  // User's UUID
>         'subject':   'This is a forum post',                  // Subject
>         'body':      'Forum post body. This is awesome!',     // Body
>         '_ts':       '89da3178-24e2-11df-8924-001ff3596713',  // TimeUUID
>     },
> }
>
> The key is a simple UUID and the columns hold the Forum/Post/Reply data.
> A Forum has a hardcoded parent UUID which I store in Java, while Posts and
> Replies are tied to their parent post/forum by parent_id. I sort by
> UTF8Type, but that doesn't really matter here, since I always access this
> map by key and always fetch all of its columns (6 of them).
>
> All queries go through the second ColumnFamily, called Threads:
>
> Thread = {
>     '7561a442-24e2-11df-8924-001ff3591711': {        // Parent thread UUID
>         // timestamp of post: post UUID
>         '89da3178-24e2-11df-8924-001ff3596713':      // TimeUUID column name
>             '7561a442-24e2-11df-8924-001ff3591711',  // -> post UUID value
>     },
> }
>
> With a parent UUID I can look up its row in Threads, which gives me the list
> of Posts/Replies at that level sorted by TimeUUID: the column name is the
> post's TimeUUID and the value is the post's UUID (this ColumnFamily uses the
> TimeUUID comparator). Thus I can walk the tree (of any depth) of
> Forums/Posts/Replies with the Threads ColumnFamily.
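To illustrate why that tree walk works: each Threads row is a sorted list of one parent's children, so a depth-first recursion over Threads rows visits the whole tree in posting order. This is a hypothetical in-memory sketch, not Cassandra client code. `DiscussionModel`, `addPost` and `walk` are made-up names, a `long` timestamp stands in for the TimeUUID column name, and `HashMap`/`TreeMap` stand in for the two ColumnFamilies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Posts and Threads ColumnFamilies.
// A long timestamp replaces the TimeUUID column name; no Cassandra API is used.
class DiscussionModel {
    // Posts: row key = post id -> columns (id, parent_id, subject, ...)
    final Map<String, Map<String, String>> posts = new HashMap<>();
    // Threads: row key = parent id -> (timestamp -> child post id), kept sorted
    // by column name the way a TimeUUID comparator would order them
    final Map<String, TreeMap<Long, String>> threads = new HashMap<>();

    // An insert writes to both ColumnFamilies, as in the design above.
    void addPost(String id, String parentId, long timestamp, String subject) {
        Map<String, String> columns = new HashMap<>();
        columns.put("id", id);
        columns.put("parent_id", parentId);
        columns.put("subject", subject);
        posts.put(id, columns);
        threads.computeIfAbsent(parentId, k -> new TreeMap<>()).put(timestamp, id);
    }

    // Depth-first walk: each Threads row lists one parent's children in
    // timestamp order; recursion descends into each child's own row.
    void walk(String parentId, int depth, List<String> out) {
        TreeMap<Long, String> children = threads.get(parentId);
        if (children == null) {
            return; // leaf: no Threads row for this id
        }
        for (String childId : children.values()) {
            StringBuilder indent = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                indent.append("  ");
            }
            out.add(indent + posts.get(childId).get("subject"));
            walk(childId, depth + 1, out);
        }
    }
}
```

With two top-level posts under a forum id and one reply to the earlier post, `walk` yields the earlier post, its reply indented one level, then the later post. Against a real cluster the same recursion costs one Threads read per post visited, which is worth keeping in mind for very large threads.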
>
> I have this all working on a single Cassandra node and it works great.
> Inserts go to both ColumnFamilies, while deletes use the Threads
> ColumnFamily to recursively delete all child posts, the post's column in its
> parent's Threads row, and all associated data in Posts.
>
> Any comments on whether this is a good or terrible data model so far are
> welcome. :)
>
> My question comes from the fact that during this process I have
> written/read/deleted many key->columns entries in these ColumnFamilies (many
> of which failed half-way through), so I decided to write a "clean" script to
> remove all data from them (much like a TRUNCATE TABLE command in SQL).
> Using the following Java code:
>
>     // get the ID column for each KEY we find
>     List<byte[]> l_columns = new ArrayList<byte[]>();
>     l_columns.add(Transcoder.encode(ID));
>     SlicePredicate l_slicePredicate = new SlicePredicate();
>     l_slicePredicate.setColumn_names(l_columns);
>
>     // get 100 keys at a time
>     KeyRange keyRange = new KeyRange(100);
>     keyRange.setStart_key("");
>     keyRange.setEnd_key("");
>     List<KeySlice> l_keySlices =
>         p_context.getClient().get_range_slices("Discussions",
>             new ColumnParent("Posts"), l_slicePredicate, keyRange,
>             ConsistencyLevel.ONE);
>
> I get ALL of the keys I ever wrote to the server, and most of them have no
> columns associated with them. In fact, if I query one of those keys with
>
>     SlicePredicate l_slicePredicate = new SlicePredicate();
>     SliceRange l_sliceRange = new SliceRange();
>     l_sliceRange.setStart(new byte[] {});
>     l_sliceRange.setFinish(new byte[] {});
>     l_slicePredicate.setSlice_range(l_sliceRange);
>     List<ColumnOrSuperColumn> l_result =
>         p_context.getClient().get_slice("Discussions",
>             <KEY FROM GET_RANGE_SLICES>, new ColumnParent("Posts"),
>             l_slicePredicate, ConsistencyLevel.ONE);
>
> it returns an empty list (the same as when I give it a key it has never
> seen).
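Those empty keys are what the FAQ entry linked above calls range ghosts: a removed row is not erased immediately but replaced by a tombstone, and until the tombstone is purged (after GCGraceSeconds has elapsed and compaction runs) a range scan still returns the key, just with no columns. In the meantime the usual client-side approach is to skip keys whose column list is empty. The sketch below simulates that paging loop without a Cassandra dependency: the `SortedMap` stands in for the ColumnFamily, `fetchPage` stands in for one `get_range_slices` call with an inclusive start key, and `GhostFilter`/`liveKeys` are hypothetical names. With the real Thrift classes the equivalent test would be on each `KeySlice`'s column list (assuming the generated `getColumns()` getter).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical simulation of paging with get_range_slices. The SortedMap plays
// the ColumnFamily (row key -> column names); a row with an empty column list
// represents a range ghost. No Cassandra API is used here.
class GhostFilter {
    // Up to `count` rows with key >= startKey, like one KeyRange page.
    static List<Map.Entry<String, List<String>>> fetchPage(
            SortedMap<String, List<String>> cf, String startKey, int count) {
        List<Map.Entry<String, List<String>>> page = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : cf.tailMap(startKey).entrySet()) {
            if (page.size() == count) {
                break;
            }
            page.add(e);
        }
        return page;
    }

    // Pages through every row and keeps only keys that still have columns.
    static List<String> liveKeys(SortedMap<String, List<String>> cf, int pageSize) {
        if (pageSize < 2) {
            // With an inclusive start key, a one-row page would only ever
            // return the row it started from, so the loop could not advance.
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> live = new ArrayList<>();
        String start = "";
        boolean first = true;
        while (true) {
            List<Map.Entry<String, List<String>>> page = fetchPage(cf, start, pageSize);
            for (Map.Entry<String, List<String>> e : page) {
                // Every page after the first repeats the previous last key.
                if (!first && e.getKey().equals(start)) {
                    continue;
                }
                // Range ghost: the key came back with no columns; skip it.
                if (e.getValue().isEmpty()) {
                    continue;
                }
                live.add(e.getKey());
            }
            if (page.size() < pageSize) {
                break; // short page: nothing left to scan
            }
            start = page.get(page.size() - 1).getKey();
            first = false;
        }
        return live;
    }
}
```

With rows `a`, `c`, `e` holding columns and `b`, `d` deleted, paging two keys at a time returns `a`, `c`, `e`. The repeated boundary key is the price of an inclusive start key; skipping it is cheaper than trying to compute the "next" key yourself.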
>
> It is OK with me if get_range_slices returns keys with no columns (although
> it makes it a little harder to explain to others -- is there garbage
> collection that will clean these out in the future?), however I am stuck on
> how to simply truncate the table without looping through all the keys,
> looking for ones that still have columns, and then deleting each of those
> key->columns entries.
>
> It is possible I am not deleting correctly as well. For that I simply do:
>
>     p_context.getClient().remove("Discussions", p_postUUID.toString(),
>         new ColumnPath("Posts"), l_rightNow, ConsistencyLevel.ALL);
>
> I am just trying to understand what I am getting and compare it against
> what I expected. I am also still trying to write a simple "clean" command.
>
> If you read this far, thanks... If you can add some clarity it would help
> me. I have tried to find it in the archives and blog posts, but I didn't see
> anything.
>
> Thanks,
> Kevin
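On the delete itself: a row-level `remove` with a bare `ColumnPath("Posts")` writes a tombstone at the timestamp you pass (`l_rightNow` here), and that tombstone only hides columns whose own write timestamp is less than or equal to it. If the inserts and the delete use different clocks or units (say microseconds for writes and milliseconds for the delete), the delete silently has no effect. The original doesn't show how `l_rightNow` is computed, so this is only one possible explanation. The model below is a simplified, hypothetical illustration of that rule and of a "clean" that issues one row delete per key; `TombstoneModel`, `removeRow` and `clean` are made-up names, not Cassandra APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of timestamp reconciliation for a row-level
// remove(): a row tombstone at time T hides every column written at <= T.
// This is an illustration, not the Cassandra implementation.
class TombstoneModel {
    // row key -> (column name -> write timestamp)
    final Map<String, Map<String, Long>> rows = new TreeMap<>();
    // row key -> timestamp of the latest row-level delete
    final Map<String, Long> rowTombstones = new HashMap<>();

    void insert(String key, String column, long timestamp) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).put(column, timestamp);
    }

    // Stand-in for remove(keyspace, key, new ColumnPath("Posts"), timestamp, ...):
    // only the newest tombstone timestamp matters.
    void removeRow(String key, long timestamp) {
        rowTombstones.merge(key, timestamp, Math::max);
    }

    // Number of columns a read would still return for this key.
    int liveColumnCount(String key) {
        long deletedAt = rowTombstones.getOrDefault(key, Long.MIN_VALUE);
        int live = 0;
        for (long ts : rows.getOrDefault(key, new HashMap<>()).values()) {
            if (ts > deletedAt) {
                live++;
            }
        }
        return live;
    }

    // A "clean" that issues one row-level delete per key seen in a scan.
    // The keys themselves stay in `rows`, just as tombstoned rows keep
    // showing up in range scans until they are purged.
    void clean(long timestamp) {
        for (String key : rows.keySet()) {
            removeRow(key, timestamp);
        }
    }
}
```

Writing two rows at a microsecond-scale timestamp and then calling `clean` with a millisecond-scale value leaves both rows readable; calling it again with a later timestamp in the same unit hides them, while both keys remain present in the model.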