Re: Spark Beginner: Correct approach for use case

2017-03-08 Thread Allan Richards
Thanks for the feedback everyone. We've had a look at different SQL-based solutions, and have got good performance out of them, but some of the reports we make can't be generated with a single bit of SQL. This is just an investigation to see if Spark is a viable alternative. I've got another quest

Re: Spark Beginner: Correct approach for use case

2017-03-05 Thread Jörn Franke
I agree with the others that a dedicated NoSQL datastore can make sense. You should look at the lambda architecture paradigm. Keep in mind that more memory does not necessarily mean more performance. What matters is having the right data structure for your users' queries. Additionally, if your queries are

Re: Spark Beginner: Correct approach for use case

2017-03-05 Thread ayan guha
Any specific reason to choose Spark? It sounds like you have a Write-Once-Read-Many Times dataset, which is logically partitioned across customers, sitting in some data store. And essentially you are looking for a fast way to access it, and most likely you will use the same partition key for querin

Re: Spark Beginner: Correct approach for use case

2017-03-05 Thread Subhash Sriram
Hi Allan, Where is the data stored right now? If it's in a relational database, and you are using Spark with Hadoop, I feel like it would make sense to import the data into HDFS, just because it would be faster to access the data. You could use Sqoop to do that. In terms of having a l