> I have heard it best to try and avoid the use of super columns for now.

Yup.

Your model makes sense. If you are creating the CF using the cassandra-cli, you will probably want to reverse-order the column names so that the newest photos come first; see http://thelastpickle.com/2011/10/03/Reverse-Comparators/

If you want to use CQL 3 you could do something like this:

    CREATE TABLE InstagramPhotos (
        user_name text,
        photo_seq timestamp,
        meta_1 text,
        meta_2 text,
        PRIMARY KEY (user_name, photo_seq)
    );

That's pretty much the same. user_name is the row key, and photo_seq will be used as part of a composite column name internally. (You can do the same thing without CQL; just look up composite columns.)

You can do something similar for the annotations.

Depending on your use case, I would use UNIX epoch time if possible rather than a TimeUUID.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/12/2012, at 4:35 AM, Adam Venturella <aventure...@gmail.com> wrote:

> My use case is capturing some information about Instagram photos from the
> API. I have 2 use cases. One, I need to capture all of the media data for an
> account and two, I need to be able to privately annotate that data. There is
> some nuance in this, multiple HTTP queries for example, but ignoring that,
> and assuming I have obtained all of the data surrounding an account's photos,
> here is how I was thinking of storing that information for use case 1.
>
> ColumnFamily: InstagramPhotos
>
> Row Key: <account_username>
>
> Columns:
> Column Name: <date_posted_timestamp>
> Column Value: JSON representing the data for the individual photo (filter,
> comments, likes etc, not the binary photo data).
>
>
>
> So the idea would be to keep adding columns to the row that contain that
> serialized data (in JSON) with their timestamps as the name. Timestamps as
> the column names, I figure, should help to perform range queries, where
> I make the 1st column inserted the earliest timestamp and the last column
> inserted the most recent.
> I could probably also use TimeUUIDs here as well
> since I will have things ordered prior to inserting.
>
> The question here is: does this approach make sense? Is it common to store JSON
> in columns like this? I know there are super columns as well, so I could use
> those I suppose instead of JSON. The extra level of indexing would probably
> be useful to query specific photos for use case 2. I have heard it best to
> try and avoid the use of super columns for now. I have no information to back
> that claim up other than some time spent in the IRC. So feel free to debunk
> that statement if it is false.
>
> So that is use case one; use case two covers the private annotations.
>
> I figured here:
>
> ColumnFamily: InstagramAnnotations
> Row Key: Canonical Media Id
>
> Column Name: TimeUUID
> Column Value: JSON representing an annotation/internal comment
>
>
> Writing out the above I can actually see where I might need to tighten some
> things up around how I store the photos. I am clearly missing an obvious
> connection between the InstagramPhotos and the InstagramAnnotations; maybe
> super columns would help with the photos instead of JSON? Otherwise I would
> need to build an index row where I tie the canonical photo id to a
> timestamp (column name) in the InstagramPhotos. I could also try to figure
> out how to make a TimeUUID of my own that can double as the media's canonical
> id, or further look at Instagram's canonical id for photos and see if it
> already counts up. In which case I could use that in place of a timestamp.
>
> Anyway, I figured I would see if anyone might help flesh out other potential
> pitfalls in the above. I am definitely new to Cassandra and I am using this
> project as a way to learn some more about assembling systems using it.
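
The layout discussed in this thread (one row per account, one column per photo named by its posting time in UNIX epoch seconds, read back newest first) can be illustrated with a small in-memory sketch. This is only a model of the access pattern in plain Python, not Cassandra client code: the class PhotoRow, its methods insert and slice_newest_first, the username "example_user" and all sample values are hypothetical names made up for the illustration.

```python
import bisect
import json


class PhotoRow:
    """In-memory stand-in for one wide row (all photos of one account)."""

    def __init__(self):
        self._names = []    # column names: epoch seconds, sorted ascending
        self._values = {}   # column name -> JSON string (the column value)

    def insert(self, posted_at, photo):
        """Add or overwrite the column named by the posting timestamp."""
        if posted_at not in self._values:
            bisect.insort(self._names, posted_at)
        self._values[posted_at] = json.dumps(photo)

    def slice_newest_first(self, start, end, limit=100):
        """Return up to `limit` (name, photo) pairs with start <= name <= end,
        newest first, which is the order a reversed comparator would give."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        names = self._names[lo:hi][::-1][:limit]
        return [(n, json.loads(self._values[n])) for n in names]


# Hypothetical data: the row key is the account username.
photos = {"example_user": PhotoRow()}
row = photos["example_user"]
row.insert(1355700000, {"filter": "Normal", "likes": 3})
row.insert(1355800000, {"filter": "Sierra", "likes": 7})
row.insert(1355600000, {"filter": "Lo-fi", "likes": 1})

# Range query over column names: only the two newer photos fall in range.
recent = row.slice_newest_first(1355650000, 1355900000)
```

Insertion order does not matter here because the columns are kept sorted by name, which is also why integer epoch timestamps work as column names for range queries.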