1. What data store do you want to store your data in? HDFS, HBase, Cassandra, S3, or something else?
2. Have you looked at Spark SQL (https://spark.apache.org/sql/)?
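
To give a feel for how Spark handles star-schema joins like the ones you describe: Spark SQL can plan a join between a large fact table and a small dimension table as a broadcast hash join. The small table is copied to every executor and each partition of the large table probes it locally, so the large table never has to be shuffled. Whether Spark picks this strategy depends on table size (see the `spark.sql.autoBroadcastJoinThreshold` setting). The following is only a single-machine sketch of that idea in plain Python, not Spark code; the tables, columns, and partitioning are made up for illustration.

```python
from collections import defaultdict

# Hypothetical star schema: one large fact table and one small dimension table.
# The fact table is split into "partitions" to mimic how Spark distributes it.
sales_partitions = [
    [{"store_id": 1, "amount": 120.0}, {"store_id": 2, "amount": 80.0}],
    [{"store_id": 1, "amount": 30.0}, {"store_id": 3, "amount": 55.0}],
]
stores = [
    {"store_id": 1, "region": "East"},
    {"store_id": 2, "region": "West"},
]


def broadcast_hash_join(fact_partitions, dim_rows, key):
    """Inner-join each fact partition against a small dimension table."""
    # Build phase: index the small table once. In Spark this copy would be
    # broadcast to every executor instead of shuffling the large table.
    dim_index = {row[key]: row for row in dim_rows}
    joined = []
    # Probe phase: every partition looks up its keys locally.
    for partition in fact_partitions:
        for fact in partition:
            dim = dim_index.get(fact[key])
            if dim is not None:  # inner join: facts without a match are dropped
                joined.append({**fact, **dim})
    return joined


def revenue_by_region(rows):
    """Aggregate the joined rows, like a GROUP BY region with SUM(amount)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


joined = broadcast_hash_join(sales_partitions, stores, "store_id")
print(revenue_by_region(joined))
```

With this sample data it prints {'East': 150.0, 'West': 80.0}; the sale at store 3 disappears because the join is an inner join. In Spark itself you would express the same query with Spark SQL or DataFrames and let the query planner choose the join strategy.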

One option is to process the data in Spark and then store it in the relational database of your choice.

On Sat, Oct 25, 2014 at 11:18 PM, Peter Wolf <opus...@gmail.com> wrote:
> Hello all,
>
> We are considering Spark for our organization. It is obviously a superb
> platform for processing massive amounts of data... how about retrieving it?
>
> We are currently storing our data in a relational database in a star
> schema. Retrieving our data requires doing many complicated joins across
> many tables.
>
> Can we use Spark as a relational database? Or, if not, can we put Spark
> on top of a relational database?
>
> Note that we don't care about SQL. Accessing our data via standard
> queries is nice, but we are equally happy (or more happy) to write Scala
> code.
>
> What is important to us is doing relational queries on huge amounts of
> data. Is Spark good at this?
>
> Thank you very much in advance
> Peter