I have spent the last few days playing with Cassandra and I have attempted to 
create a simple "Java->Thrift->Cassandra" Discussion Group Server (because the 
world needs another one) to teach myself the data model and try everything out.

With all the great blog posts on Cassandra out there, I am now able to 
read/write/delete/modify a nested discussion server.  YEA!!!

I decided to have two simple ColumnFamilies.

The first is called Posts:

Post = {
    '7561a442-24e2-11df-8924-001ff3591711': {                 //UUID
        'id': '7561a442-24e2-11df-8924-001ff3591711',         //ID == UUID
        'parent_id': '89da3178-24e2-11df-8924-001ff3591711',  //Parent Post UUID
        'author': 'a4a70900-24e1-11df-8924-001ff3591711',     //User's UUID
        'subject': 'This is a forum post',                    //Subject
        'body': 'Forum post body. This is awesome!',          //Body
        '_ts': '89da3178-24e2-11df-8924-001ff3596713',        //TimeUUID
    },
}

Here the row key is a simple UUID and each row holds one Forum, Post or Reply.  A 
Forum has a hardcoded parent UUID which I store in Java, while Posts and 
Replies are tied to their parent post/forum by the parent_id column.  I sort by 
UTF8Type, but it doesn't really matter in this case because I always read this 
ColumnFamily by key and always fetch all of its columns (6 of them).
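To make the Posts layout concrete, here is a small in-memory Java sketch of the same shape: a map from row key to a sorted map of column name to value. The class PostsFamily and its methods are hypothetical illustrations, not part of the Thrift API; a TreeMap of Strings stands in for UTF8Type ordering, which coincides with UTF-8 byte order for ASCII column names like these.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory stand-in for the Posts ColumnFamily:
// row key (post UUID) -> sorted map of column name -> value.
class PostsFamily {
    private final Map<UUID, SortedMap<String, String>> rows = new HashMap<>();

    // Writes one row with the six columns from the Post example.
    void insert(UUID id, UUID parentId, UUID author, String subject, String body, UUID ts) {
        // TreeMap keeps column names sorted; for ASCII names this matches UTF8Type order.
        SortedMap<String, String> cols = new TreeMap<>();
        cols.put("id", id.toString());
        cols.put("parent_id", parentId.toString());
        cols.put("author", author.toString());
        cols.put("subject", subject);
        cols.put("body", body);
        cols.put("_ts", ts.toString());
        rows.put(id, cols);
    }

    // Reads are always by key and return every column (empty if the key is unknown).
    SortedMap<String, String> get(UUID id) {
        SortedMap<String, String> cols = rows.get(id);
        if (cols == null) {
            return new TreeMap<>();
        }
        return new TreeMap<>(cols);
    }
}
```

Because every read is by key and fetches all six columns, the column ordering never changes what comes back, which is why the comparator choice doesn't matter for this ColumnFamily.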

All queries drive into the second ColumnFamily, called Threads:

Thread = {
    '7561a442-24e2-11df-8924-001ff3591711': {                 //Parent thread UUID
        //timestamp of post: post UUID
        '89da3178-24e2-11df-8924-001ff3596713': '7561a442-24e2-11df-8924-001ff3591711',  //TimeUUID column name -> post UUID value
    },
}

With a parent UUID I can drive into Threads, which gives me the list of 
Posts/Replies at that level sorted by TimeUUID.  The column name is the post's 
TimeUUID and the value is the post UUID; this ColumnFamily is sorted by 
TimeUUIDType.
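The ordering that TimeUUIDType gives a Thread row can be imitated in plain Java. One subtlety: UUID.compareTo is not chronological for version-1 UUIDs, because it compares the raw bits and the most significant part of those is time_low, so the sketch below compares UUID.timestamp() first. The ThreadRow class and the tie-break on compareTo are my own assumptions for illustration, not Cassandra's exact comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical sketch of one Thread row: TimeUUID column name -> child post UUID value.
class ThreadRow {
    // Approximates TimeUUIDType: order by the embedded 60-bit timestamp first.
    // UUID.timestamp() only works for version-1 (time-based) UUIDs and throws
    // UnsupportedOperationException otherwise. The tie-break is an assumption.
    static final Comparator<UUID> TIME_UUID_ORDER = (a, b) -> {
        int byTime = Long.compare(a.timestamp(), b.timestamp());
        return byTime != 0 ? byTime : a.compareTo(b);
    };

    private final TreeMap<UUID, UUID> columns = new TreeMap<>(TIME_UUID_ORDER);

    void add(UUID postTimeUuid, UUID postId) {
        columns.put(postTimeUuid, postId);
    }

    // Child post UUIDs at this level, oldest first.
    List<UUID> children() {
        return new ArrayList<>(columns.values());
    }
}
```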

Thus I can walk the tree (of any depth) of Forum/Post/Replies with the Thread 
table.
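Walking the tree is a depth-first traversal over Thread rows. The sketch below assumes a hypothetical in-memory map in place of the Threads ColumnFamily, where children(parent) plays the role of one get_slice on the parent's row and each list is assumed to be in TimeUUID order already.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the Threads ColumnFamily: parent UUID -> child post UUIDs,
// assumed to be in TimeUUID order already (here: insertion order).
class ThreadWalker {
    private final Map<UUID, List<UUID>> threads = new HashMap<>();

    void addChild(UUID parent, UUID child) {
        threads.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Plays the role of one get_slice on the parent's Thread row.
    List<UUID> children(UUID parent) {
        return threads.getOrDefault(parent, Collections.emptyList());
    }

    // Depth-first, pre-order walk: records "depth uuid" for each post before its replies.
    void walk(UUID parent, int depth, List<String> out) {
        for (UUID child : children(parent)) {
            out.add(depth + " " + child);
            walk(child, depth + 1, out);
        }
    }
}
```

Each level costs one read per post, so a very deep or wide tree means many round trips; that is the price of keeping each Thread row to a single level.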

I have this all working on a single Cassandra node and it works great.  Inserts 
go to both ColumnFamilies, while deletes use the Thread ColumnFamily to 
recursively delete all child posts, the column in the parent's Thread row, and 
all associated data in Posts.
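The delete order described above (children first, then the column in the parent's Thread row, then the Posts row) can be sketched against in-memory maps. Everything here is hypothetical: Discussions, insert and delete are illustration names, a LinkedHashMap's insertion order stands in for TimeUUID ordering, and the post's _ts column is used to locate its column in the parent's Thread row, matching the example data where _ts equals that column name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-ins for the Posts and Threads ColumnFamilies.
class Discussions {
    final Map<UUID, Map<String, String>> posts = new HashMap<>();
    // parent UUID -> (post TimeUUID -> post UUID); insertion order stands in for TimeUUID order
    final Map<UUID, Map<UUID, UUID>> threads = new HashMap<>();

    // Inserts go to both ColumnFamilies.
    void insert(UUID parent, UUID postId, UUID timeUuid, String subject) {
        Map<String, String> row = new HashMap<>();
        row.put("id", postId.toString());
        row.put("parent_id", parent.toString());
        row.put("_ts", timeUuid.toString());
        row.put("subject", subject);
        posts.put(postId, row);
        threads.computeIfAbsent(parent, k -> new LinkedHashMap<>()).put(timeUuid, postId);
    }

    // Deletes a post and every reply beneath it.
    void delete(UUID postId) {
        Map<String, String> row = posts.get(postId);
        if (row == null) {
            return;
        }
        // 1. Recurse through the post's own Thread row (copy first: delete() mutates it).
        Map<UUID, UUID> replies = threads.getOrDefault(postId, Collections.emptyMap());
        for (UUID child : new ArrayList<>(replies.values())) {
            delete(child);
        }
        threads.remove(postId);
        // 2. Remove this post's column from its parent's Thread row.
        Map<UUID, UUID> siblings = threads.get(UUID.fromString(row.get("parent_id")));
        if (siblings != null) {
            siblings.remove(UUID.fromString(row.get("_ts")));
        }
        // 3. Remove the post's row from Posts.
        posts.remove(postId);
    }
}
```

Deleting children before the parent means a crash half-way leaves orphans that are still reachable from the parent, which is easier to retry than the other way round.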

Any comments on whether this is a good or terrible data model so far are 
welcome.  :)

My question comes from the fact that during this process I have 
written/read/deleted many "key->columns" entries in these ColumnFamilies (many of 
which failed half-way through), so I decided to write a "clean" script to remove 
all data from these ColumnFamilies (much like a TRUNCATE TABLE command in SQL).

Using the following Java code:

      //get the ID column for each KEY we find
      List<byte[]> l_columns = new ArrayList<byte[]>();
      l_columns.add(Transcoder.encode(ID));
      SlicePredicate l_slicePredicate = new SlicePredicate();
      l_slicePredicate.setColumn_names(l_columns);
      //get 100 keys at a time
      KeyRange keyRange = new KeyRange(100);
      keyRange.setStart_key("");
      keyRange.setEnd_key("");

      List<KeySlice> l_keySlices =
          p_context.getClient().get_range_slices("Discussions", new ColumnParent("Posts"),
                                                 l_slicePredicate, keyRange, ConsistencyLevel.ONE);

I get ALL of the KEYS I ever wrote to the server.  Most of them have no columns 
associated with them.  In fact, if I query one of those keys with:

      SlicePredicate l_slicePredicate =  new SlicePredicate();
      SliceRange l_sliceRange = new SliceRange();
      l_sliceRange.setStart(new byte[] {});
      l_sliceRange.setFinish(new byte[] {});
      l_slicePredicate.setSlice_range(l_sliceRange);
      List<ColumnOrSuperColumn> l_result =
        p_context.getClient().get_slice("Discussions", <KEY FROM GET_RANGE_SLICES>,
                                        new ColumnParent("Posts"),
                                        l_slicePredicate, ConsistencyLevel.ONE);

it returns an empty list (the same as if I give it a key it has never seen).

It is OK with me if get_range_slices returns keys with no columns (although it 
makes it a little harder to explain to others; is there garbage collection 
that will clean these out in the future?).  However, I am stuck on how to simply 
truncate the table without looping through all the keys, looking for the ones 
that still have columns, and deleting each of those rows.
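As far as I understand, without a truncate command the usual approach is exactly that loop: page through get_range_slices using the last key of each page as the start key of the next, skip keys that come back with no columns, and remove the rest. (Keys with no columns are, I believe, deleted rows whose tombstones are only purged by compaction after GCGraceSeconds, 10 days by default.) The RowStore interface below is a hypothetical wrapper around get_range_slices and remove so the paging logic can be shown without a running cluster; InMemoryRowStore is a test double that, like the behaviour described above, keeps a removed key visible with zero columns.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper around get_range_slices and remove; not a Thrift interface.
interface RowStore {
    // Up to `count` keys starting at startKey (inclusive), each paired with how many
    // columns it still has (0 for a key that only survives as a deleted "ghost").
    List<Map.Entry<String, Integer>> rangeSlice(String startKey, int count);

    // Deletes every column in the row, like remove() with ColumnPath("Posts").
    void removeRow(String key);
}

class Cleaner {
    // Pages through all keys and removes each row that still has columns.
    // Returns how many rows were removed.
    static int truncate(RowStore store, int pageSize) {
        if (pageSize < 2) {
            // With an inclusive start key, a page of 1 would return the same key forever.
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        int removed = 0;
        String start = "";
        while (true) {
            List<Map.Entry<String, Integer>> page = store.rangeSlice(start, pageSize);
            for (Map.Entry<String, Integer> row : page) {
                if (row.getKey().equals(start)) continue; // handled on the previous page
                if (row.getValue() > 0) {                 // skip empty "ghost" keys
                    store.removeRow(row.getKey());
                    removed++;
                }
            }
            if (page.size() < pageSize) return removed;   // short page: end of the range
            start = page.get(page.size() - 1).getKey();
        }
    }
}

// Test double: keys stay visible with 0 columns after removal.
class InMemoryRowStore implements RowStore {
    final TreeMap<String, Integer> rows = new TreeMap<>();

    public List<Map.Entry<String, Integer>> rangeSlice(String startKey, int count) {
        List<Map.Entry<String, Integer>> page = new ArrayList<>();
        for (Map.Entry<String, Integer> e : rows.tailMap(startKey, true).entrySet()) {
            if (page.size() == count) break;
            page.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
        }
        return page;
    }

    public void removeRow(String key) {
        rows.put(key, 0);
    }
}
```

Because the start key is inclusive, each page after the first repeats the previous page's last key, which the loop skips by comparing against `start`.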

It is also possible that I am not deleting correctly.  For that I simply do:

p_context.getClient().remove("Discussions", p_postUUID.toString(),
                             new ColumnPath("Posts"), l_rightNow,
                             ConsistencyLevel.ALL);
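One thing worth checking, since the insert code isn't shown: as I understand Cassandra's conflict resolution, a remove only hides columns whose timestamps are no newer than the delete's timestamp, so l_rightNow has to be in the same units as the insert timestamps (many clients use microseconds). The WriteClock class below is a hypothetical helper that hands out one increasing microsecond timestamp for both inserts and removes; deleteWins just spells out that comparison.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single source of write timestamps, shared by insert() and remove().
final class WriteClock {
    private final AtomicLong last = new AtomicLong();

    // Microseconds since the epoch, forced to be strictly increasing.
    long next() {
        long now = System.currentTimeMillis() * 1000L;
        return last.updateAndGet(prev -> Math.max(prev + 1, now));
    }

    // Assumed rule: a tombstone shadows a column if its timestamp is >= the column's,
    // so a millisecond-based delete loses to an earlier microsecond-based insert.
    static boolean deleteWins(long deleteTs, long columnTs) {
        return deleteTs >= columnTs;
    }
}
```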

I am just trying to understand what I am getting and compare it against what I 
expected.  I am also still trying to write a simple "clean" command.

If you read this far, thanks....  If you can add some clarity it would help me.  
I have tried to find this in the archives and blog posts, but I didn't see anything.

Thanks,
Kevin




This email and any attachments may contain confidential and proprietary 
information of Xythos that is for the sole use of the intended recipient. If 
you are not the intended recipient, disclosure, copying, re-distribution or 
other use of any of this information is strictly prohibited. Please immediately 
notify the sender and delete this transmission if you received this email in 
error.