Re: Is Spark right for us?

Guillaume Bilodeau Sun, 06 Mar 2016 09:26:58 -0800

The data is currently stored in a relational database, but we are definitely
considering a migration to a document-oriented database such as MongoDB.  How
does this factor in?


On Sun, Mar 6, 2016 at 12:23 PM, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> That depends on a lot of things, but as a starting point I would ask
> whether you are planning to store your data in JSON format?
>
>
> Regards,
> Gourav Sengupta
>
> On Sun, Mar 6, 2016 at 5:17 PM, Laumegui Deaulobi <
> guillaume.bilod...@gmail.com> wrote:
>
>> Our problem space is survey analytics.  Each survey comprises a set of
>> questions, with each question having a set of possible answers.  Survey
>> fill-out tasks are sent to users, who have until a certain date to
>> complete them.  Based on these survey fill-outs, reports need to be
>> generated.  Each report deals with a subset of the survey fill-outs, and
>> comprises a set of data points (average rating for question 1, min/max
>> for question 2, etc.).
>>
>> We are dealing with rather large data sets - although reading the internet
>> we get the impression that everyone is analyzing petabytes of data...
>>
>> Users: up to 100,000
>> Surveys: up to 100,000
>> Questions per survey: up to 100
>> Possible answers per question: up to 10
>> Survey fill-outs / user: up to 10
>> Reports: up to 100,000
>> Data points per report: up to 100
>>
>> Data is currently stored in a relational database but a migration to a
>> different kind of store is possible.
>>
>> The naive algorithm for report generation can be summed up as this:
>>
>> for each report to be generated {
>>   for each report data point to be calculated {
>>     calculate data point
>>     add data point to report
>>   }
>>   publish report
>> }
>>
>> In order to deal with the upper limits of these values, we will need to
>> distribute this algorithm to a compute / data cluster as much as possible.
>>
>> I've read about frameworks such as Apache Spark but also Hadoop, GridGain,
>> Hazelcast and several others, and am still confused as to how each of
>> these can help us and how they fit together.
>>
>> Is Spark the right framework for us?
>>
>
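
For reference, the naive algorithm quoted above can be written out in runnable
form. This is only a minimal sketch in plain Python, not Spark code and not our
actual implementation. All names (Answer, DataPointSpec, ReportSpec,
generate_report, generate_all) are hypothetical, and it assumes every answer
is a numeric rating and every data point aggregates one question over the
report's subset of fill-outs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Answer:
    """One answered question in one survey fill-out (hypothetical shape)."""
    fill_out_id: int
    question_id: int
    rating: int


@dataclass(frozen=True)
class DataPointSpec:
    """One data point: an aggregate (mean, min, max, ...) over one question."""
    name: str
    question_id: int
    aggregate: Callable[[List[int]], float]


@dataclass(frozen=True)
class ReportSpec:
    """A report covers a subset of fill-outs and a set of data points."""
    report_id: int
    fill_out_ids: FrozenSet[int]
    data_points: Tuple[DataPointSpec, ...]


def calculate_data_point(
    spec: DataPointSpec, fill_out_ids: FrozenSet[int], answers: List[Answer]
) -> Optional[float]:
    # Keep only the ratings for this question within the report's fill-outs.
    ratings = [
        a.rating
        for a in answers
        if a.fill_out_id in fill_out_ids and a.question_id == spec.question_id
    ]
    # No matching answers: report the data point as missing.
    return spec.aggregate(ratings) if ratings else None


def generate_report(
    report: ReportSpec, answers: List[Answer]
) -> Dict[str, Optional[float]]:
    # Inner loop of the naive algorithm: one entry per data point.
    return {
        dp.name: calculate_data_point(dp, report.fill_out_ids, answers)
        for dp in report.data_points
    }


def generate_all(
    reports: Iterable[ReportSpec],
    answers: List[Answer],
    publish: Callable[[int, Dict[str, Optional[float]]], None],
) -> None:
    # Outer loop of the naive algorithm: build and publish each report.
    for report in reports:
        publish(report.report_id, generate_report(report, answers))
```

In this sketch no report reads another report's result, so the iterations of
the outer loop are independent of each other. That independence is what would
let the work be split across a cluster. The sketch also rescans the full
answer list for every data point, which is the part that would not hold up at
the upper limits listed above.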
