The data is currently stored in a relational database, but we are seriously considering a migration to a document-oriented database such as MongoDB. How does this factor in?

On Sun, Mar 6, 2016 at 12:23 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi,
>
> That depends on a lot of things, but as a starting point I would ask
> whether you are planning to store your data in JSON format?
>
> Regards,
> Gourav Sengupta
>
> On Sun, Mar 6, 2016 at 5:17 PM, Laumegui Deaulobi <guillaume.bilod...@gmail.com> wrote:
>
>> Our problem space is survey analytics. Each survey comprises a set of
>> questions, with each question having a set of possible answers. Survey
>> fill-out tasks are sent to users, who have until a certain date to
>> complete it. Based on these survey fill-outs, reports need to be
>> generated. Each report deals with a subset of the survey fill-outs, and
>> comprises a set of data points (average rating for question 1, min/max
>> for question 2, etc.)
>>
>> We are dealing with rather large data sets - although reading the internet
>> we get the impression that everyone is analyzing petabytes of data...
>>
>> Users: up to 100,000
>> Surveys: up to 100,000
>> Questions per survey: up to 100
>> Possible answers per question: up to 10
>> Survey fill-outs / user: up to 10
>> Reports: up to 100,000
>> Data points per report: up to 100
>>
>> Data is currently stored in a relational database but a migration to a
>> different kind of store is possible.
>>
>> The naive algorithm for report generation can be summed up as this:
>>
>> for each report to be generated {
>>   for each report data point to be calculated {
>>     calculate data point
>>     add data point to report
>>   }
>>   publish report
>> }
>>
>> In order to deal with the upper limits of these values, we will need to
>> distribute this algorithm to a compute / data cluster as much as possible.
>>
>> I've read about frameworks such as Apache Spark but also Hadoop, GridGain,
>> HazelCast and several others, and am still confused as to how each of
>> these can help us and how they fit together.
>>
>> Is Spark the right framework for us?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-right-for-us-tp26412.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
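
For reference, the naive report-generation loop quoted above could be sketched roughly as follows in plain Python. This is only an illustration under assumptions: the in-memory data layout and the names FILL_OUTS, REPORTS, generate_report and generate_all are hypothetical and do not come from the original poster's system. The point it tries to show is that each report is computed independently of the others, which is what makes the outer loop a candidate for distribution.

```python
from statistics import mean

# Hypothetical in-memory data: each fill-out maps question id -> numeric answer.
FILL_OUTS = {
    "f1": {"q1": 4, "q2": 2},
    "f2": {"q1": 5, "q2": 7},
    "f3": {"q1": 3, "q2": 1},
}

# Hypothetical report definitions: the subset of fill-outs a report covers and
# its data points as (name, question id, aggregate function) triples.
REPORTS = {
    "r1": {
        "fill_outs": ["f1", "f2"],
        "data_points": [("avg_q1", "q1", mean), ("max_q2", "q2", max)],
    },
    "r2": {
        "fill_outs": ["f2", "f3"],
        "data_points": [("min_q2", "q2", min)],
    },
}


def generate_report(report_id):
    """Compute every data point of one report; no state is shared between reports."""
    spec = REPORTS[report_id]
    rows = [FILL_OUTS[f] for f in spec["fill_outs"]]
    report = {}
    # Inner loop: "for each report data point to be calculated".
    for name, question, aggregate in spec["data_points"]:
        answers = [row[question] for row in rows if question in row]
        report[name] = aggregate(answers)
    return report


def generate_all(report_ids):
    # Outer loop: "for each report to be generated". Run sequentially here,
    # but each iteration is independent of the others.
    return {rid: generate_report(rid) for rid in report_ids}


results = generate_all(["r1", "r2"])
print(results)
```

Because generate_report touches only its own report's fill-outs, the outer loop is the natural unit to hand to a cluster. In Spark that might look something like parallelizing the report ids and mapping generate_report over them, assuming the underlying data is reachable from the workers; whether that is worthwhile depends on how expensive each data point is to compute.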