The closest use case to what I will describe, I believe, is the real-time ad
serving that Yahoo is doing.

I am looking into using Spark as a sub-second-latency decision engine
service that a user-facing application calls, perhaps via the Livy REST server
or spark-jobserver. Instead of terabytes of data to sift through, it only
needs to be a few GB or less. My question is: where does this idea fall on the
spectrum from "an interesting use of Spark, and there are examples of projects
that do this type of thing" to "that's so far outside what Spark was designed
for that it's probably not the direction to go"?

Here is a silly example to illustrate what I mean. Sorry if it's long-winded,
but I just want to be clear about the type of iterative algorithm that Spark
seems to be well suited for:

Rich song data objects (30 fields per song) are exposed as an RDD of 100,000
to 1 million songs. We at Songiato think we can make the best 10 selections to
put on a user's playlist based on the user's real-time context, e.g. mood, play
history, heart rate (so the result cannot be pre-computed), how many cats
he/she owns, etc. A minimal sketch of the data model follows.
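
To make that concrete, here is a rough Scala sketch. The field names, the
Context class, and the storage path are all made up for illustration; the real
song objects have ~30 fields:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.rdd.RDD

  // Hypothetical, trimmed-down versions of the song and context objects.
  case class Song(id: Long, title: String, tempo: Double, energy: Double)  // ~30 fields in reality
  case class Context(mood: Double, heartRate: Double, catCount: Int, history: Seq[Long])

  val conf = new SparkConf().setAppName("SongiatoSketch")
  val sc   = new SparkContext(conf)

  // The full catalog is small (a few GB), so it would be loaded once, cached,
  // and reused across requests to keep latency down.
  val songs: RDD[Song] = sc.objectFile[Song]("hdfs:///songiato/songs").cache()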

The call to the Spark decision engine includes those context variables.
Songiato's secret algorithm is a series of mapping steps that compute a score
for each song, followed by a single fold on the top score to choose the first
song. That choice updates the play history, and thus the context, so 9 more
iterations run to complete the 10-song selection.
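
In code, I imagine the per-request loop looking roughly like the sketch below.
The score function is just a placeholder for Songiato's real scoring logic:

  // Placeholder scoring function: any pure function of (song, context) would do here.
  def score(song: Song, ctx: Context): Double =
    song.energy * ctx.mood - math.abs(song.tempo - ctx.heartRate) / 10.0

  def pickPlaylist(songs: RDD[Song], initial: Context, n: Int = 10): Seq[Song] = {
    var ctx   = initial
    var picks = Vector.empty[Song]
    for (_ <- 1 to n) {
      // Map: score every song under the current context; reduce: keep the top-scoring one.
      val best = songs
        .filter(s => !ctx.history.contains(s.id))   // skip songs already picked or played
        .map(s => (s, score(s, ctx)))
        .reduce((a, b) => if (a._2 >= b._2) a else b)
        ._1
      picks = picks :+ best
      // The pick updates play history, and therefore the context, for the next iteration.
      ctx = ctx.copy(history = ctx.history :+ best.id)
    }
    picks
  }

Each iteration is one map plus one reduce over a cached few-GB RDD, so the real
question is whether 10 such rounds can reliably come back in under a second.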

Thank you ahead of time for any help. I've been pulling my hair out trying to
decide whether Spark is the right tool for the job!