Hi All,
I have just started getting hands-on with Spark, Spark Streaming and MLlib.
We are in the design phase and need suggestions on how to store our
training data.
Batch layer: Our core systems generate the data that we will use as batch
data; they currently store it in SQL Server. Our requirement is to pull the
data from the core databases, transform it with a Spark job and store it in
Cassandra. We would then train the model on data pulled from Cassandra and
store the prediction results back in Cassandra.
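To make the batch flow concrete, here is a rough sketch of what we have in mind. It is only an illustration under assumptions: it uses PySpark's built-in JDBC data source and the DataStax Spark Cassandra Connector (both would need their jars on the classpath), and the helper names, the `dropna()` placeholder transform, and all host, keyspace and table names are hypothetical.

```python
# Sketch only: helper names and the transform step are hypothetical.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"  # Spark Cassandra Connector


def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Build options for Spark's JDBC data source pointing at SQL Server."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def cassandra_options(keyspace, table):
    """Build the keyspace/table options the Cassandra connector expects."""
    return {"keyspace": keyspace, "table": table}


def run_batch(spark, jdbc_opts, cass_opts):
    """Pull from SQL Server, apply a placeholder transform, append to Cassandra."""
    raw = spark.read.format("jdbc").options(**jdbc_opts).load()
    cleaned = raw.dropna()  # placeholder for the real transformation logic
    (cleaned.write
        .format(CASSANDRA_FORMAT)
        .options(**cass_opts)
        .mode("append")
        .save())
    return cleaned


def load_training_data(spark, cass_opts):
    """Read the stored rows back from Cassandra as a DataFrame for training."""
    return spark.read.format(CASSANDRA_FORMAT).options(**cass_opts).load()
```

The idea would be to call `run_batch` from a scheduled Spark job and `load_training_data` from the training job, with `spark.cassandra.connection.host` set on the SparkSession. The model training and the write-back of predictions are left out because they depend on our schema.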
Real-time layer: We are also planning a real-time layer that stores live
data from devices in Cassandra for further analysis using MLlib.
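For the real-time path, one possible shape is sketched below. It assumes Structured Streaming's `foreachBatch` sink together with the Spark Cassandra Connector, and it deliberately leaves the device source open: `stream_df` stands for whatever streaming DataFrame the ingestion produces. The function names, keyspace, table and checkpoint path are hypothetical.

```python
# Sketch only: function names and all identifiers passed in are hypothetical.
def make_cassandra_sink(keyspace, table):
    """Return a foreachBatch callback that appends each micro-batch to Cassandra."""
    def write_batch(batch_df, batch_id):
        # batch_id is supplied by Spark; it is unused in this minimal sketch.
        (batch_df.write
            .format("org.apache.spark.sql.cassandra")
            .options(keyspace=keyspace, table=table)
            .mode("append")
            .save())
    return write_batch


def start_realtime_sink(stream_df, keyspace, table, checkpoint_dir):
    """Start a streaming query that writes stream_df to Cassandra batch by batch."""
    return (stream_df.writeStream
            .foreachBatch(make_cassandra_sink(keyspace, table))
            .option("checkpointLocation", checkpoint_dir)
            .start())
```

Writing each micro-batch with the ordinary batch writer keeps the streaming and batch layers on the same Cassandra write path, which is the main reason for choosing `foreachBatch` in this sketch.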
We have heard that Cassandra is not needed in this design because Spark
itself provides storage. Please advise whether Cassandra is required, and
also suggest the best way to handle:
[Attached image: image001.emz]
Aruna Veluru | Senior Lead Analyst | Bally
Technologies<http://www.ballytech.com> | (O) +1 702 532 2832 | (M) +91 99 7222
6213
