Dave

Tyler's answer already covers CFs etc.

We are using Cassandra to store user profile data for exactly the sort of
use case you describe. We don't yet store _all_ the data in Cassandra;
currently we are focusing on the stuff we need available for real-time
access. We use Hadoop to analyse the profiles directly in Cassandra.
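
For illustration only, and not a description of our actual schema, here is
one way to picture the "one wide row per user" shape you describe. It is a
plain Python sketch: a dict stands in for the column family, no Cassandra
client is involved, and the column-name prefixes (visit:, geo:, derived:)
and all function names are hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for a column family: row key -> {column name -> value}.
# Illustrative only; nothing here talks to Cassandra.
profiles = defaultdict(dict)


def record_visit(user_id, site, ts):
    """Time-oriented column: zero-padded timestamp keeps names in time order."""
    profiles[user_id][f"visit:{ts:010d}:{site}"] = ""


def record_location(user_id, ip, lat, lon):
    """Location column: one column per IP, value holds the geo-location."""
    profiles[user_id][f"geo:{ip}"] = f"{lat},{lon}"


def derive_interest(user_id, site, threshold, window_start, window_end, label):
    """Derived column: written only if the user visited `site` often enough."""
    count = 0
    for name in profiles[user_id]:
        if not name.startswith("visit:"):
            continue
        _, ts, visited = name.split(":", 2)
        if visited == site and window_start <= int(ts) <= window_end:
            count += 1
    if count >= threshold:
        profiles[user_id][f"derived:interest:{label}"] = str(count)
    return count
```

Each user row ends up with only the columns that apply to that user, which
is the property you mention in your mail. In a real deployment the derived
columns would more likely be written by a batch job than inline like this.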

Dave

On 23 February 2011 23:21, Dave Viner <davevi...@gmail.com> wrote:

> Hi all,
>
> I'm wondering if anyone has used cassandra as a datastore for a
> user-profile service.  I'm thinking of applications like behavioral
> targeting, where there are lots & lots of users (10s to 100s of millions),
> and lots & lots of data about them intermixed in, say, weblogs (probably TBs
> worth).  The idea would be to use Cassandra as a datastore for distributed
> parallel processing of the TBs of files (say on hadoop).  Then the resulting
> user-profiles would be query-able quickly.
>
> Anyone know of that sort of application of Cassandra?  I'm trying to puzzle
> out just what the column family might look like.  Seems like a mix of
> time-oriented information (user x visits site y at time z), location
> information (user x appeared from ip x.y.z.a which is geo-location 31.20309,
> 120.10923), and derived information (because user x visited site y 15 times
> within a 10 day window, user x must be interested in buying a car).
>
> I don't have specifics as yet... just some general thoughts.  But this
> feels like a Cassandra type problem.  (User profile can have lots of columns
> per user, but the exact columns might differ from user to user... very
> scalable, etc)
>
> Thanks
> Dave Viner
>
>
