The closest use case to what I will describe, I believe, is the real-time ad serving that Yahoo is doing.
I am looking into using Spark as a sub-second-latency decision-engine service that a user-facing application calls, perhaps via the Livy REST server or spark-jobserver. Instead of terabytes of data to sift through, it would only need to handle a few GB or less. My question is: where on the spectrum from "an interesting use of Spark, and there are examples of projects that do this type of thing" to "that's so far outside what Spark was designed for that it's probably not the direction to go" does this idea fall?

Here is a silly example to illustrate what I mean. Sorry if it's long-winded, but I want to be clear about the type of iterative algorithm involved. Rich song data objects (30 fields per song) are exposed as an RDD, 100,000 to 1 million songs. We at Songiato think we can make the best 10 selections for a user's playlist based on the user's real-time context: mood, play history, heart rate (so the result cannot be pre-computed), how many cats he/she owns, etc. The call to the Spark decision engine includes those context variables, and Songiato's secret algorithms are a series of mapping steps that compute a score for each song, followed by a single fold on top score to choose the first song. That choice updates the play history, and thus the context, so nine more iterations run to complete the 10-song selection.

Thank you ahead of time for any help. I've been pulling my hair out trying to decide whether Spark is the right tool for the job!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/A-non-canonical-use-of-the-Spark-computation-model-tp24855.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
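To make the shape of the algorithm concrete, here is a minimal plain-Python sketch of the map/fold loop described above. All names here (`score_song`, the `mood` and `history` context fields) are invented for illustration, and the toy scoring rule is an assumption; in the real system the map and fold would run over a Spark RDD of song objects rather than a local list.

```python
# Hypothetical sketch of the iterative playlist-selection loop.
# In Spark, the list comprehension below would be an RDD map and
# the max() a fold/reduce over the scored RDD.

def score_song(song, context):
    # Toy scoring: reward a mood match, heavily penalize songs
    # already in the play history so they are never re-picked.
    score = 1.0 if song["mood"] == context["mood"] else 0.0
    if song["id"] in context["history"]:
        score -= 10.0
    return score

def pick_playlist(songs, context, n=10):
    playlist = []
    for _ in range(n):
        # "map" step: score every song under the CURRENT context
        scored = [(score_song(s, context), s) for s in songs]
        # "fold" step: keep only the top-scoring song
        _, best = max(scored, key=lambda t: t[0])
        playlist.append(best["id"])
        # picking a song updates history, and thus the context,
        # so the next iteration re-scores everything
        context["history"].add(best["id"])
    return playlist

songs = [{"id": i, "mood": "calm" if i % 2 else "upbeat"} for i in range(100)]
context = {"mood": "calm", "history": set()}
print(pick_playlist(songs, context))
```

The key property this sketch captures is that the ten selections cannot be computed in one pass: each fold result feeds back into the context before the next map, which is exactly the kind of iterative, in-memory workload the question is asking about.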