Re: Machine Learning With Cassandra

2014-08-30 Thread Adaryl "Bob" Wakefield, MBA
Isn’t it a bit overkill to use both Storm and Spark in the architecture? You say load it “into” Spark. Is Spark separate storage? B. From: Alex Kamil Sent: Friday, August 29, 2014 10:46 PM To: user@cassandra.apache.org Subject: Re: Machine Learning With Cassandra Adaryl, most ML algorithms are bas

Re: Machine Learning With Cassandra

2014-08-30 Thread Shahab Yunus
Spark is not storage; rather, it is a streaming framework meant to run on a distributed, big data architecture (a very high-level intro/definition). It provides a batched version of in-memory, map/reduce-like jobs. It is not completely streaming like Storm, but rather batches collections of tuples an
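A minimal sketch of the micro-batching idea described above: an incoming stream of tuples is grouped into small batches, and each batch is processed as a unit. It assumes count-based batches for simplicity (Spark Streaming itself cuts batches by time interval), and `micro_batches` is a hypothetical helper, not a Spark or Storm API.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream into fixed-size batches (hypothetical, count-based)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        # Pull up to batch_size items; an empty batch means the stream ended.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Each batch is handled like a small map/reduce job: map x -> x*x, reduce by sum.
for batch in micro_batches(range(7), 3):
    print(sum(x * x for x in batch))  # prints 5, then 50, then 36
```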

Re: Machine Learning With Cassandra

2014-08-30 Thread Adaryl "Bob" Wakefield, MBA
Yes, I remember this conversation. That was when I was first stepping into this stuff. My current understanding is: Storm = stream and micro-batch; Spark = batch and micro-batch. Micro-batching is what gets you exactly-once processing semantics. I’m clear on that. What I’m not clear on is

Re: Machine Learning With Cassandra

2014-08-30 Thread James Horey
If you want distributed machine learning, you can use either Mahout (runs on Hadoop) or Spark (MLlib). If you choose the Hadoop route, DataStax provides a connector (CFS) to interact with data stored in Cassandra. Otherwise, you can try to use the Cassandra InputFormat (not as simple, but plenty

Re: Machine Learning With Cassandra

2014-08-30 Thread Adaryl "Bob" Wakefield, MBA
Ahh thanks. Yeah my searches for “machine learning with Cassandra” were not turning up much useful stuff. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: James Horey Sent: Saturday, August 30, 2014 3:34 PM

Re: Machine Learning With Cassandra

2014-08-30 Thread Peter Lin
There are other machine learning frameworks that scale better than Hadoop + Mahout: http://hunch.net/~vw/. If the kind of machine learning you're doing is really large and speed matters, take a look at Vowpal Wabbit. On Sat, Aug 30, 2014 at 4:58 PM, Adaryl "Bob" Wakefield, MBA < adaryl.wakefi...

Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-30 Thread Todd Nine
Hi all, I'm working on transferring our Thrift DAOs over to CQL. It's going well, except for two cases that both use multiget. The use case is very simple. It is a narrow row, by design, with only a few columns. When I perform a multiget, I need to get up to 1k rows at a time. I do not want t
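One approach sometimes used when porting a Thrift multiget to CQL is to split the key list into chunks and issue one `SELECT ... WHERE key IN (...)` per chunk. The sketch below only builds the statements; the table `my_table`, its columns `key` and `value`, the chunk size of 100, and the helper `chunked_in_queries` are all hypothetical. The `?` placeholders are CQL positional bind markers, as used in prepared statements.

```python
from typing import List, Sequence, Tuple

# Hypothetical schema: narrow rows in "my_table" keyed by partition key "key".
TABLE = "my_table"


def chunked_in_queries(
    keys: Sequence[str], chunk_size: int = 100
) -> List[Tuple[str, List[str]]]:
    """Split a multiget key list into (CQL statement, bound keys) pairs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    queries = []
    for start in range(0, len(keys), chunk_size):
        chunk = list(keys[start:start + chunk_size])
        # One "?" bind marker per key in this chunk.
        markers = ", ".join("?" for _ in chunk)
        cql = f"SELECT key, value FROM {TABLE} WHERE key IN ({markers})"
        queries.append((cql, chunk))
    return queries


# 1,000 row keys become 10 statements of 100 keys each.
keys = [f"row-{i}" for i in range(1000)]
statements = chunked_in_queries(keys)
print(len(statements))  # 10
```

The chunks can then be prepared and executed concurrently, for example with the DataStax Java driver's `executeAsync`. A common alternative is to skip `IN` entirely and run one single-partition query per key asynchronously, which spreads the load across coordinators instead of concentrating it on one.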

Re: Heterogenous cluster and vnodes

2014-08-30 Thread Ben Bromhead
> Hey, > > I have a few VM host (bare metal) machines with varying amounts of free > hard drive space on them. For simplicity let’s say I have three machines like > so: > * Machine 1: > - Harddrive 1: 150 GB available. > * Machine 2: > - Harddrive 1: 150 GB available. > - Harddrive 2:

Scala driver

2014-08-30 Thread Gary Zhao
Hi, could you recommend a Scala driver and share your experiences using it? I'm thinking of using the Java driver in Scala directly. Thanks