http://wiki.apache.org/cassandra/FAQ#range_ghosts

On Sun, Apr 11, 2010 at 5:12 PM, Kevin Wiggen <kwig...@xythos.com> wrote:
>
> I have spent the last few days playing with Cassandra and I have attempted
> to create a simple "Java->Thrift->Cassandra" Discussion Group Server
> (because the world needs another one) to teach myself the data model and try
> everything out.
> With all the great blog posts on cassandra out there, I am now able to
> read/write/delete/modify a nested discussion server.  YEA!!!
> I decided to have two simple ColumnFamilies.
> One called Posts
> Post = {
>     '7561a442-24e2-11df-8924-001ff3591711': {                    //UUID
>         'id': '7561a442-24e2-11df-8924-001ff3591711',            //ID == UUID
>         'parent_id': '89da3178-24e2-11df-8924-001ff3591711',     //Parent Post UUID
>         'author': 'a4a70900-24e1-11df-8924-001ff3591711',        //Users UUID
>         'subject': 'This is a forum post',                       //Subject
>         'body': 'Forum post body. This is awesome!',             //Body
>         '_ts': '89da3178-24e2-11df-8924-001ff3596713',           //TimeUUID
>     },
> }
> Where the key is a simple UUID and the columns are the Forum/Post/Replies.
>  A Forum has a hardcoded Parent UUID which I store in Java, while the Posts
> and Replies are tied to their parent posts/forums/etc by  the parent_id.  I
> sort by UTF8Type, but it really doesn't matter in this case as I drive into
> this map always by the Key and always get all columns (6 of them).
> All queries drive into the second ColumnFamily called Threads
> Thread = {
>     '7561a442-24e2-11df-8924-001ff3591711': {                    //Parent thread UUID
>         #timestamp of post: post UUID
>         //TimeUUID column name -> post UUID value
>         '89da3178-24e2-11df-8924-001ff3596713': '7561a442-24e2-11df-8924-001ff3591711',
>     },
> }
> With a Parent UUID I can drive into Threads which will give me the list of
> Posts/Replies at that level sorted by TimeUUID.  Column name is the post
> TimeUUID and the value is the Post UUID.  This ColumnFamily is sorted by
> TimeUUID.
> Thus I can walk the tree (of any depth) of Forum/Post/Replies with the
> Thread table.
> I have this all working on a single cassandra node and it works great.
>  Inserts go to both tables while deletes need to use the Thread ColumnFamily
> to recursively delete all child posts, the Column in the Parent key of
> Thread and all associated data in Post.
> Any comments on whether this is a good/terrible data model, etc so far are
> welcome.  :)
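
To make the insert and recursive-delete bookkeeping described above concrete, here is a minimal in-memory sketch. It makes no Thrift calls: ForumModel, insert, children and delete are hypothetical names, plain maps stand in for the Posts and Threads ColumnFamilies, and a long stands in for the TimeUUID column name. Treat it as a way to reason about the logic, not as client code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the two ColumnFamilies (hypothetical, no Cassandra calls).
class ForumModel {
    // Posts: row key (post id) -> column name -> value.
    final Map<String, Map<String, String>> posts = new HashMap<>();
    // Threads: row key (parent id) -> time-ordered column name -> child post id.
    // A TreeMap keyed by long stands in for TimeUUID column ordering. Unlike a
    // TimeUUID, two posts with the same time key under one parent would collide here.
    final Map<String, TreeMap<Long, String>> threads = new HashMap<>();

    // An insert writes to both structures.
    void insert(String id, String parentId, long timeKey, String subject) {
        Map<String, String> columns = new HashMap<>();
        columns.put("id", id);
        columns.put("parent_id", parentId);
        columns.put("subject", subject);
        posts.put(id, columns);
        threads.computeIfAbsent(parentId, k -> new TreeMap<>()).put(timeKey, id);
    }

    // Direct children of a parent, oldest first. Returns a copy.
    List<String> children(String parentId) {
        TreeMap<Long, String> row = threads.get(parentId);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.values());
    }

    // Removes the post's column from its parent's Threads row,
    // then removes the post and every descendant.
    void delete(String id) {
        Map<String, String> columns = posts.get(id);
        if (columns == null) {
            return; // unknown or already deleted
        }
        TreeMap<Long, String> parentRow = threads.get(columns.get("parent_id"));
        if (parentRow != null) {
            parentRow.values().remove(id);
        }
        deleteSubtree(id);
    }

    // Depth-first: delete children before the post's own Threads and Posts rows.
    private void deleteSubtree(String id) {
        for (String child : children(id)) {
            deleteSubtree(child);
        }
        threads.remove(id);
        posts.remove(id);
    }
}
```

Against a real cluster each of those three removals is a separate write, so a failure part-way through can leave orphaned rows, which matches the half-finished operations mentioned below.
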
> My question comes from the fact that during this process I have
> written/read/deleted many "key->Columns" to these ColumnFamilies (many of
> which failed half-way through) so I decided to write a "clean" script to
> remove all data from these ColumnFamilies (much like a truncate table
> command in SQL).
> Using the following Java code
>       //get the ID column for each KEY we find
>       List<byte[]> l_columns = new ArrayList<byte[]>();
>       l_columns.add(Transcoder.encode(ID));
>       SlicePredicate l_slicePredicate = new SlicePredicate();
>       l_slicePredicate.setColumn_names(l_columns);
>       //get 100 keys at a time
>       KeyRange keyRange = new KeyRange(100);
>       keyRange.setStart_key("");
>       keyRange.setEnd_key("");
>       List<KeySlice> l_keySlices =
>         p_context.getClient().get_range_slices("Discussions", new ColumnParent("Posts"),
>                                                l_slicePredicate, keyRange,
>                                                ConsistencyLevel.ONE);
> I get ALL of the KEYS I ever wrote to the server.  Most of them have no
> Columns associated with them.  In fact if I query the same key with
>       SlicePredicate l_slicePredicate =  new SlicePredicate();
>       SliceRange l_sliceRange = new SliceRange();
>       l_sliceRange.setStart(new byte[] {});
>       l_sliceRange.setFinish(new byte[] {});
>       l_slicePredicate.setSlice_range(l_sliceRange);
>       List<ColumnOrSuperColumn> l_result =
>         p_context.getClient().get_slice("Discussions", <KEY FROM GET_RANGE_SLICES>,
>                                         new ColumnParent("Posts"), l_slicePredicate,
>                                         ConsistencyLevel.ONE);
> it returns an empty array list (the same as if I give it a KEY it has never
> seen).
> It is OK with me if get_range_slices returns keys with no columns (although
> it makes it a little harder to explain to others -- is there garbage
> collection that will clean these out in the future?), however I am stuck on
> how to simply truncate the table without looping through all the values
> looking for something that has a Column associated with it and then deleting
> that key->value.
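
The FAQ entry linked at the top covers these column-less rows ("range ghosts"). As far as I know, a deleted row is kept as a tombstone and only dropped by compaction after the GC grace period, so in the meantime a range scan can still hand back its key with an empty column list, and the usual client-side answer is to skip those rows. Below is a minimal in-memory sketch of that, combined with paging. RangeGhostScan, rangeSlice and liveKeys are hypothetical names; rangeSlice only imitates an inclusive start key over a sorted map and is not the Thrift call. Note that a real cluster using the random partitioner returns rows in token order rather than key order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Paging over a key range while skipping rows that come back with no columns.
class RangeGhostScan {
    // Hypothetical stand-in for a range query: up to `count` rows with
    // key >= startKey (inclusive), in key order.
    static List<Map.Entry<String, List<String>>> rangeSlice(
            NavigableMap<String, List<String>> store, String startKey, int count) {
        List<Map.Entry<String, List<String>>> page = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : store.tailMap(startKey, true).entrySet()) {
            if (page.size() == count) {
                break;
            }
            page.add(e);
        }
        return page;
    }

    // Collects the keys that still have at least one column.
    static List<String> liveKeys(NavigableMap<String, List<String>> store, int pageSize) {
        if (pageSize < 2) {
            // With an inclusive start key, a page of 1 would never advance.
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> live = new ArrayList<>();
        String start = "";   // empty string sorts before every other key
        boolean firstPage = true;
        while (true) {
            List<Map.Entry<String, List<String>>> page = rangeSlice(store, start, pageSize);
            // After the first page, row 0 repeats the previous page's last key.
            for (int i = firstPage ? 0 : 1; i < page.size(); i++) {
                if (!page.get(i).getValue().isEmpty()) {
                    live.add(page.get(i).getKey());
                }
            }
            if (page.size() < pageSize) {
                return live; // a short page means the range is exhausted
            }
            start = page.get(page.size() - 1).getKey();
            firstPage = false;
        }
    }
}
```

A "clean" script along these lines would then issue a remove only for the keys that liveKeys returns, instead of for every key the scan reports.
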
> It is possible I am not deleting correctly as well.  For that I simply do:
> p_context.getClient().remove("Discussions", p_postUUID.toString(),
>                              new ColumnPath("Posts"), l_rightNow,
>                              ConsistencyLevel.ALL);
> Just trying to understand what I am getting and compare it against what I
> expected.  I am also still trying to write a simple "clean" command.
> If you read this far, thanks....  If you can add some clarity it would help
> me.  I have tried to find it in archives and blog posts, but I didn't see
> anything.
> Thanks,
> Kevin
