1. Which data store do you want to keep your data in: HDFS, HBase,
Cassandra, S3, or something else?
2. Have you looked at SparkSQL (https://spark.apache.org/sql/)?
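As an illustration, a star-schema query like the one described in the question can be written in SQL against Spark SQL temporary views. This is a minimal sketch, assuming Spark 2.x or later with spark-sql on the classpath (SparkSession and createOrReplaceTempView postdate this 2014 thread). The tables sales, products and stores, their columns, and the toy rows are all hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: a star-schema join expressed in Spark SQL.
// Table and column names (sales, products, stores, ...) are hypothetical.
object SparkSqlStarJoin {
  def salesByCity(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical fact table: one row per sale
    Seq((1, 10, 2.0), (2, 10, 3.5), (1, 20, 1.0))
      .toDF("product_id", "store_id", "amount")
      .createOrReplaceTempView("sales")

    // Hypothetical dimension tables
    Seq((1, "widget"), (2, "gadget"))
      .toDF("product_id", "product_name")
      .createOrReplaceTempView("products")
    Seq((10, "Boston"), (20, "Denver"))
      .toDF("store_id", "city")
      .createOrReplaceTempView("stores")

    // Join the fact table to both dimensions, filter on one, aggregate by the other
    spark.sql(
      """SELECT s.city, SUM(f.amount) AS total_amount
        |FROM sales f
        |JOIN products p ON f.product_id = p.product_id
        |JOIN stores s ON f.store_id = s.store_id
        |WHERE p.product_name = 'widget'
        |GROUP BY s.city""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlStarJoin")
      .master("local[*]")
      .getOrCreate()
    try salesByCity(spark).show()
    finally spark.stop()
  }
}
```

The same query can also be written with the DataFrame methods (join, groupBy, agg) if you would rather stay in Scala than write SQL strings.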

One option is to process the data in Spark and then store it in the
relational database of your choice.
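A minimal sketch of that approach, assuming Spark 2.x or later: the joins and aggregation run in Spark using the DataFrame API, and only the result is written to a relational database over JDBC. The table names, columns and toy rows are hypothetical, as are the JDBC_URL, JDBC_USER and JDBC_PASSWORD environment variables; the write is skipped when JDBC_URL is unset, and it needs your database's JDBC driver on the classpath.

```scala
import java.util.Properties
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: join and aggregate in Spark, then store the result in an RDBMS via JDBC.
// Table names, column names and the JDBC_* environment variables are hypothetical.
object ProcessThenStore {
  def salesByCityAndProduct(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical fact and dimension tables
    val sales = Seq((1, 10, 2.0), (2, 10, 3.5), (1, 20, 1.0))
      .toDF("product_id", "store_id", "amount")
    val products = Seq((1, "widget"), (2, "gadget")).toDF("product_id", "product_name")
    val stores = Seq((10, "Boston"), (20, "Denver")).toDF("store_id", "city")

    // join(other, "col") is an equi-join on a column name shared by both sides
    sales
      .join(products, "product_id")
      .join(stores, "store_id")
      .groupBy("city", "product_name")
      .agg(sum("amount").as("total_amount"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ProcessThenStore")
      .master("local[*]")
      .getOrCreate()
    try {
      val report = salesByCityAndProduct(spark)
      report.show()

      // Write only when a JDBC URL is configured (hypothetical env vars)
      sys.env.get("JDBC_URL").foreach { url =>
        val props = new Properties()
        props.setProperty("user", sys.env.getOrElse("JDBC_USER", ""))
        props.setProperty("password", sys.env.getOrElse("JDBC_PASSWORD", ""))
        report.write.mode("overwrite").jdbc(url, "sales_by_city_product", props)
      }
    } finally spark.stop()
  }
}
```

With this split the heavy joins run in Spark, and the relational database only has to hold and serve the (typically much smaller) results.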

On Sat, Oct 25, 2014 at 11:18 PM, Peter Wolf <opus...@gmail.com> wrote:

> Hello all,
>
> We are considering Spark for our organization.  It is obviously a superb
> platform for processing massive amounts of data... how about retrieving it?
>
> We are currently storing our data in a relational database in a star
> schema.  Retrieving our data requires doing many complicated joins across
> many tables.
>
> Can we use Spark as a relational database?  Or, if not, can we put Spark
> on top of a relational database?
>
> Note that we don't care about SQL.  Accessing our data via standard
> queries is nice, but we are equally happy (or more happy) to write Scala
> code.
>
> What is important to us is doing relational queries on huge amounts of
> data.  Is Spark good at this?
>
> Thank you very much in advance
> Peter
>
