Re: R on Cassandra

Paul Brown Wed, 09 Nov 2011 11:57:35 -0800

Hi, Brian --

A little late to reply, but I'm slowly catching up.

You'll be better off, IMHO, pulling the data out of Cassandra with a
tool like Pig (probably with a bit of aggregation and filtering) and then
operating on it in R as a static delimited file.  If you need automation
or batching on top of the cleaning and aggregation, the export step itself
can be scripted.  Some of this depends on your modeling workflow, but you
will very likely want to return to exactly the same dataset and repeat some
processes as you refine your approach.  That is difficult, if not
impossible, to do against live data.
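
To make the "exactly the same dataset" point concrete, here is a minimal
sketch, assuming the export has already been written to a tab-delimited
file.  It is in Python purely for illustration; the function names and the
idea of recording a checksum next to the export are my own assumptions, not
part of Pig, Cassandra, or R.  The same check could run before the file is
read into R.

```python
import csv
import hashlib


def fingerprint(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large exports do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_snapshot(path, expected_digest=None, delimiter="\t"):
    """Load a delimited export, refusing to proceed if the file has changed.

    Hypothetical helper: pass the digest recorded on an earlier run to
    confirm that the analysis is being repeated on identical data.
    """
    actual = fingerprint(path)
    if expected_digest is not None and actual != expected_digest:
        raise ValueError(
            f"{path} changed: expected {expected_digest}, got {actual}"
        )
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter=delimiter))
    return rows, actual
```

On the first run, call load_snapshot(path) and keep the returned digest with
your notes; on later runs, pass it back as expected_digest so that a
re-exported or modified file is caught before any model is refit.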

-- Paul

On Nov 1, 2011, at 2:02 PM, Brian O'Neill wrote:

> I saw a mention of R on Cassandra:
> http://comments.gmane.org/gmane.comp.db.cassandra.user/5681
> 
> Does anyone know if this has traction somewhere?
> 
> -brian
> 
> -- 
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
>
