Re: Data Model Review

aaron morton Tue, 18 Dec 2012 20:14:09 -0800

> I have heard it best to try and avoid the use of super columns for now. 
Yup.


Your model makes sense. If you are creating the CF using the cassandra-cli, you 
will probably want to reverse the order of the column names so the most recent 
photos are read first; see 
http://thelastpickle.com/2011/10/03/Reverse-Comparators/
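To make the ordering point concrete, here is a minimal Python sketch. It is an in-memory model, not Cassandra client code: `WideRow`, `first` and `between` are hypothetical names, and column names are assumed to be epoch-millisecond integers. Columns stay sorted by name, and a reversed comparator simply flips that sort, so a "latest N photos" read becomes a plain forward slice from the start of the row. Range reads over timestamps work in either order.

```python
import bisect
from typing import Dict, List, Tuple


class WideRow:
    """Hypothetical in-memory model of one row whose columns (epoch-ms names)
    are kept sorted. reversed_order=True mimics a reversed comparator."""

    def __init__(self, reversed_order: bool = False) -> None:
        self._sign = -1 if reversed_order else 1
        self._keys: List[int] = []         # sort keys, always ascending
        self._values: Dict[int, str] = {}  # column name -> column value

    def insert(self, name: int, value: str) -> None:
        if name not in self._values:       # re-inserting a name overwrites it
            bisect.insort(self._keys, self._sign * name)
        self._values[name] = value

    def _cells(self, keys: List[int]) -> List[Tuple[int, str]]:
        names = [self._sign * k for k in keys]
        return [(n, self._values[n]) for n in names]

    def first(self, count: int) -> List[Tuple[int, str]]:
        """Forward slice: the first `count` columns in storage order."""
        return self._cells(self._keys[:count])

    def between(self, lo: int, hi: int) -> List[Tuple[int, str]]:
        """Columns whose names fall in [lo, hi], returned in storage order."""
        a, b = sorted((self._sign * lo, self._sign * hi))
        i = bisect.bisect_left(self._keys, a)
        j = bisect.bisect_right(self._keys, b)
        return self._cells(self._keys[i:j])
```

With `reversed_order=True`, `first(20)` returns the 20 most recent photos as a forward read, which is the access pattern the reversed comparator is meant to serve.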

If you want to use CQL 3 you could do something like this:

CREATE TABLE InstagramPhotos (
        user_name text,
        photo_seq timestamp,
        meta_1 text,
        meta_2 text,
        PRIMARY KEY (user_name, photo_seq)
);

That is essentially the same model as yours: user_name is the row key, and 
photo_seq is used internally as part of a composite column name. 
(You can do the same thing without CQL; just look up composite columns.)
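A rough Python sketch of what "part of a composite column name internally" means for the InstagramPhotos table. It is a simplified model, under stated assumptions: it ignores any internal marker cells Cassandra may also write, uses epoch-millisecond integers in place of `timestamp`, and `to_storage_rows` is a hypothetical helper. Each CQL row becomes one cell per non-key column inside the `user_name` storage row, named by the composite `(photo_seq, column_name)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A storage cell: ((photo_seq, column_name), value)
Cell = Tuple[Tuple[int, str], object]

NON_KEY_COLUMNS = ("meta_1", "meta_2")


def to_storage_rows(rows: Iterable[Dict[str, object]]) -> Dict[str, List[Cell]]:
    """Group CQL rows by partition key (user_name) and turn each non-key
    column into one cell named by the composite (photo_seq, column_name)."""
    storage: Dict[str, List[Cell]] = defaultdict(list)
    for row in rows:
        for column in NON_KEY_COLUMNS:
            name = (row["photo_seq"], column)
            storage[row["user_name"]].append((name, row[column]))
    for cells in storage.values():
        # Composite names compare component by component:
        # photo_seq first, then the column name.
        cells.sort(key=lambda cell: cell[0])
    return dict(storage)
```

Because composite names sort component by component, all cells for one photo sit next to each other and photos stay in `photo_seq` order, which is what makes a per-account range query by time possible. Each metadata field also gets its own cell instead of living inside one JSON blob.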

You can do something similar for the annotations. 
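One way to apply the same pattern to the annotations, and to close the gap between the two column families that the quoted question below points out, is an extra index keyed by canonical media id. The Python sketch is a hypothetical in-memory model (`PhotoStore` and its methods are illustrative names, not a Cassandra API): photos are stored as JSON under the account row, the index maps media id to `(user_name, posted_ms)`, and annotations are keyed by media id with TimeUUID column names.

```python
import json
import uuid
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class PhotoStore:
    """Hypothetical in-memory model of InstagramPhotos, a media-id index,
    and InstagramAnnotations. Not a Cassandra client API."""

    def __init__(self) -> None:
        # InstagramPhotos: user_name -> {posted epoch ms -> photo JSON}
        self.photos: Dict[str, Dict[int, str]] = defaultdict(dict)
        # Index: canonical media id -> (user_name, posted epoch ms)
        self.media_index: Dict[str, Tuple[str, int]] = {}
        # InstagramAnnotations: canonical media id -> {TimeUUID -> note}
        self.annotations: Dict[str, Dict[uuid.UUID, str]] = defaultdict(dict)

    def store_photo(self, media_id: str, user_name: str,
                    posted_ms: int, data: Dict[str, Any]) -> None:
        self.photos[user_name][posted_ms] = json.dumps(data, sort_keys=True)
        # Write the index entry with the photo so the two stay in step.
        self.media_index[media_id] = (user_name, posted_ms)

    def annotate(self, media_id: str, note: str) -> uuid.UUID:
        column_name = uuid.uuid1()  # version-1 (time-based) UUID
        self.annotations[media_id][column_name] = note
        return column_name

    def photo_with_notes(self, media_id: str) -> Tuple[Optional[Dict[str, Any]], List[str]]:
        user_name, posted_ms = self.media_index[media_id]
        raw = self.photos[user_name].get(posted_ms)
        photo = json.loads(raw) if raw is not None else None
        # Sort notes by the timestamp embedded in each TimeUUID column name.
        ordered = sorted(self.annotations[media_id].items(),
                         key=lambda item: item[0].time)
        return photo, [note for _, note in ordered]
```

In Cassandra the photo write and the index write are separate column writes, so the application is responsible for keeping them consistent; the sketch simply does both in `store_photo`.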

Depending on your use case, I would use UNIX epoch time rather than a 
TimeUUID if possible.
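If TimeUUIDs are used anyway, the embedded time can be converted back to UNIX epoch time. A minimal Python sketch using the standard `uuid` module (the helper name `timeuuid_to_epoch_ms` is hypothetical): a version-1 UUID stores a count of 100-nanosecond intervals since 1582-10-15, so subtracting the offset to 1970-01-01 and dividing by 10,000 gives milliseconds.

```python
import uuid

# Number of 100-ns intervals between 1582-10-15 (the UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_epoch_ms(u: uuid.UUID) -> int:
    """Recover the UNIX epoch milliseconds embedded in a version-1 (time) UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is the 60-bit timestamp field in 100-ns units.
    return (u.time - UUID_EPOCH_OFFSET) // 10_000
```

The trade-off behind the advice: epoch milliseconds are compact and readable, but two photos for the same account with the same timestamp would map to the same column name, and the second write would overwrite the first. TimeUUIDs stay unique at the cost of being opaque.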

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/12/2012, at 4:35 AM, Adam Venturella <aventure...@gmail.com> wrote:

> My use case is capturing some information about Instagram photos from the 
> API. I have 2 use cases. One, I need to capture all of the media data for an 
> account, and two, I need to be able to privately annotate that data. There is 
> some nuance in this, multiple HTTP queries for example, but ignoring that, 
> and assuming I have obtained all of the data surrounding an account's photos, 
> here is how I was thinking of storing that information for use case 1. 
> 
> ColumnFamily: InstagramPhotos
> 
> Row Key: <account_username>
> 
> Columns:   
> Column Name: <date_posted_timestamp>
> Column Value: JSON representing the data for the individual photo (filter, 
> comments, likes etc., not the binary photo data).
> 
> So the idea would be to keep adding columns to the row that contain that 
> serialized data (in JSON) with their timestamps as the name. Timestamps as 
> the column names, I figure, should help to perform range queries, where 
> I make the 1st column inserted the earliest timestamp and the last column 
> inserted the most recent. I could probably also use TimeUUIDs here as well 
> since I will have things ordered prior to inserting.
> 
> The question here, does this approach make sense? Is it common to store JSON 
> in columns like this? I know there are super columns as well, so I could use 
> those I suppose instead of JSON. The extra level of indexing would probably 
> be useful to query specific photos for use case 2. I have heard it best to 
> try and avoid the use of super columns for now. I have no information to back 
> that claim up other than some time spent on IRC. So feel free to debunk 
> that statement if it is false.
> 
> So that is use case one, use case two covers the private annotations.
> 
> I figured here:
> 
> ColumnFamily: InstagramAnnotations
> Row Key: Canonical Media Id
> 
> Column Name: TimeUUID
> Column Value: JSON representing an annotation/internal comment
> 
> 
> Writing out the above I can actually see where I might need to tighten some 
> things up around how I store the photos. I am clearly missing an obvious 
> connection between the InstagramPhotos and the InstagramAnnotations, maybe 
> super columns would help with the photos instead of JSON? Otherwise I would 
> need to build an index row where I tie the canonical photo id to a 
> timestamp (column name) in the InstagramPhotos. I could also try to figure 
> out how to make a TimeUUID of my own that can double as the media's canonical 
> id or further look at Instagram's canonical id for photos and see if it 
> already counts up. In which case I could use that in place of a timestamp.
> 
> Anyway, I figured I would see if anyone might help flush out other potential 
> pitfalls in the above. I am definitely new to Cassandra and I am using this 
> project as a way to learn some more about assembling systems using it.