I’m planning to speak at a local meet-up and I need to know if what I have in
my head is even possible.
I want to give an example of working with data in Cassandra. I have data coming
in through Kafka and Storm, and I’m saving it to Cassandra (this is only on
paper at this point). I then want to run an ML algorithm over the data. My
problem is that, while my data is distributed, I don’t know how to do the
analysis in a distributed manner. I could certainly use R, but processing the
data on a single machine would seem to defeat the purpose of all this
scalability.
What are my options for running the analysis in a distributed way?
B.