Isn’t it a bit overkill to use Storm and Spark in the architecture? You say load
it “into” Spark. Is Spark separate storage?
B.
From: Alex Kamil
Sent: Friday, August 29, 2014 10:46 PM
To: user@cassandra.apache.org
Subject: Re: Machine Learning With Cassandra
Adaryl,
most ML algorithms are bas
Spark is not storage; rather, it is a streaming framework meant to be run
on a big data, distributed architecture (a very high-level intro/definition).
It provides a batched version of in-memory map/reduce-like jobs. It is not
completely streaming like Storm but rather batches collection of tuples an
Yes I remember this conversation. That was when I was just first stepping into
this stuff. My current understanding is:
Storm = Stream and micro batch
Spark = Batch and micro batch
Micro batching is what gets you to exactly once processing semantics. I’m clear
on that. What I’m not clear on is
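
For anyone following along, here is a toy sketch of why micro batching helps with exactly-once semantics. It is not Storm's or Spark's actual API; `micro_batches` and `CountState` are hypothetical names made up for illustration. The assumption is that every batch carries an increasing id and the state store remembers the last id it committed, so a batch replayed after a failure can be recognized and skipped.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield (batch_id, items) pairs, grouping the stream into fixed-size batches."""
    it = iter(stream)
    batch_id = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch_id, batch
        batch_id += 1


class CountState:
    """Hypothetical state store that remembers the last batch id it applied."""

    def __init__(self):
        self.total = 0
        self.last_batch_id = -1

    def apply(self, batch_id, batch):
        # A replayed batch carries an id that was already committed: skip it.
        if batch_id <= self.last_batch_id:
            return False
        self.total += len(batch)
        self.last_batch_id = batch_id
        return True


state = CountState()
batches = list(micro_batches(range(10), 4))
for batch_id, batch in batches:
    state.apply(batch_id, batch)

# Simulate a replay after a failure: the second batch is delivered again.
replayed = state.apply(*batches[1])
```

The replayed batch is ignored, so the count reflects each tuple once even though one batch was delivered twice. Tracking one id per batch rather than one per tuple is what keeps the bookkeeping cheap.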
If you want distributed machine learning, you can use either Mahout (runs on
Hadoop) or Spark (MLlib). If you choose the Hadoop route, DataStax provides a
connector (CFS) to interact with data stored in Cassandra. Otherwise you can
try to use the Cassandra InputFormat (not as simple, but plenty
Ahh thanks. Yeah my searches for “machine learning with Cassandra” were not
turning up much useful stuff.
Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
From: James Horey
Sent: Saturday, August 30, 2014 3:34 PM
There are other machine learning frameworks that scale better than Hadoop +
Mahout:
http://hunch.net/~vw/
If the kind of machine learning you're doing is really large and speed
matters, take a look at Vowpal Wabbit.
On Sat, Aug 30, 2014 at 4:58 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefi...
Hi all,
I'm working on transferring our Thrift DAOs over to CQL. It's going
well, except for 2 cases that both use multiget. The use case is very
simple. It is a narrow row, by design, with only a few columns. When I
perform a multiget, I need to get up to 1k rows at a time. I do not want
t
> Hey,
>
> I have a few VM host (bare metal) machines with varying amounts of free
> hard drive space on them. For simplicity let’s say I have three machines like
> so:
> * Machine 1:
> - Harddrive 1: 150 GB available.
> * Machine 2:
> - Harddrive 1: 150 GB available.
> - Harddrive 2:
Hi
Could you recommend a Scala driver and share your experiences of using it?
I'm wondering whether I should just use the Java driver in Scala directly.
Thanks