Dave Tyler's answer already covers CFs etc.
We are using Cassandra to store user profile data for exactly the sort of use case you describe. We don't yet store _all_ the data in Cassandra; currently we are focusing on the stuff we need available for real-time access. We use Hadoop to analyse the profiles from within Cassandra.

Dave

On 23 February 2011 23:21, Dave Viner <davevi...@gmail.com> wrote:

> Hi all,
>
> I'm wondering if anyone has used cassandra as a datastore for a
> user-profile service. I'm thinking of applications like behavioral
> targeting, where there are lots & lots of users (10s to 100s of millions),
> and lots & lots of data about them intermixed in, say, weblogs (probably TBs
> worth). The idea would be to use Cassandra as a datastore for distributed
> parallel processing of the TBs of files (say on hadoop). Then the resulting
> user-profiles would be query-able quickly.
>
> Anyone know of that sort of application of Cassandra? I'm trying to puzzle
> out just what the column family might look like. Seems like a mix of
> time-oriented information (user x visits site y at time z), location
> information (user x appeared from ip x.y.z.a which is geo-location 31.20309,
> 120.10923), and derived information (because user x visited site y 15 times
> within a 10 day window, user x must be interested in buying a car).
>
> I don't have specifics as yet... just some general thoughts. But this
> feels like a Cassandra type problem. (User profile can have lots of columns
> per user, but the exact columns might differ from user to user... very
> scalable, etc)
>
> Thanks
> Dave Viner