Thanks for the feedback, everyone. We've had a look at different SQL-based
solutions, and have got good performance out of them, but some of the
reports we make can't be generated with a single bit of SQL. This is just
an investigation to see if Spark is a viable alternative.
I've got another quest
I agree with the others that a dedicated NoSQL datastore can make sense. You
should look at the lambda architecture paradigm. Keep in mind that more memory
does not necessarily mean more performance. What matters is having the right
data structure for your users' queries. Additionally, if your queries are
Any specific reason to choose Spark? It sounds like you have a
Write-Once-Read-Many Times dataset, which is logically partitioned across
customers, sitting in some data store. And essentially you are looking for
a fast way to access it, and most likely you will use the same partition
key for querin
Hi Allan,
Where is the data stored right now? If it's in a relational database, and you
are using Spark with Hadoop, I feel like it would make sense to import
the data into HDFS, just because it would be faster to access the data. You
could use Sqoop to do that.
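
For illustration, here is a rough sketch of what such a Sqoop import could look
like, built as an argument list in Python. The JDBC URL, user name, table name,
split column and HDFS target directory are all made-up placeholders, not details
from this thread; the flags themselves are standard `sqoop import` options.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, target_dir, split_by, num_mappers=4):
    """Return the argv list for a `sqoop import` of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "-P",                               # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,         # HDFS directory to write into
        "--split-by", split_by,             # column used to split work across mappers
        "--num-mappers", str(num_mappers),
    ]


# Every value below is a hypothetical placeholder.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://dbhost/reports",
    username="report_user",
    table="customer_events",
    target_dir="/data/customer_events",
    split_by="customer_id",
)

# Print the command for review; subprocess.run(cmd, check=True) would execute it.
print(shlex.join(cmd))
```

Splitting on a customer column is only a guess at a sensible key, and it assumes
the values are spread fairly evenly across customers.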
In terms of having a l