Thanks for your support. This is my idea of the project; I'm a newbie, so please forgive any misunderstandings:
Spark Streaming will collect requests, for example: create a table, append records to a table, erase a table (these are just examples). With Spark Streaming I can filter the messages by key (the kind of request) and send them (via foreachRDD) to a specific function that takes care of each kind of request.

This is fine when the requests are "self-contained", in other words one-step requests such as create table or drop table. But it gets more complicated if I need a connection that:

1) must outlive the scope of the function
2) must be shared across slaves

For example, a connection to the database. What do you think is the best approach for this scenario?

2014-03-26 10:30 GMT+01:00 Tathagata Das <[email protected]>:

> When you say "launch long-running tasks" does it mean long running Spark
> jobs/tasks, or long-running tasks in another system?
>
> If the rate of requests from Kafka is not low (in terms of records per
> second), you could collect the records in the driver, and maintain the
> "shared bag" in the driver. A separate thread in the driver could pick
> stuff from the bag and launch "tasks". This is a slightly unorthodox use of
> Spark Streaming, but should work.
>
> If the rate of request from Kafka is high, then I am not sure how you can
> sustain that many long running tasks (assuming 1 task corresponding to each
> request from Kafka).
>
> TD
>
>
> On Wed, Mar 26, 2014 at 1:19 AM, Bryan Bryan <[email protected]> wrote:
>
>> Hi there,
>>
>> I have read about the two fundamental shared features in spark
>> (broadcasting variables and accumulators), but this is what i need.
>>
>> I'm using spark streaming in order to get requests from Kafka, these
>> requests may launch long-running tasks, and i need to control them:
>>
>> 1) Keep them in a shared bag, like a Hashmap, to retrieve them by ID, for
>> example.
>> 2) Retrieve an instance of this object/task whatever on-demand
>> (on-request, in fact)
>>
>>
>> Any idea about that? How can i share objects between slaves? May i use
>> something out of spark (maybe hazelcast')
>>
>>
>> Regards.
>>
>
>
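
P.S. To make the database-connection case from the top of this message more concrete, here is a rough sketch in plain Python (no Spark involved) of what I have in mind. Every name in it (FakeConnection, get_connection, HANDLERS, process_batch) is hypothetical and made up for illustration. process_batch only stands in for the function I would pass to foreachRDD. The connection is created lazily once per process and reused across batches, which would cover point 1 (outliving the function) inside a single process. It does not cover point 2: each worker process would still end up with its own connection, which is exactly the part I am unsure about.

```python
# Sketch only: plain Python standing in for the per-batch function.
# All names below are hypothetical; nothing here is a Spark or database API.

class FakeConnection:
    """Stand-in for a real database connection (assumption for the example)."""

    def __init__(self):
        self.tables = {}  # table name -> list of records


_connection = None  # one per process, created on first use


def get_connection():
    """Return the process-wide connection, creating it only once."""
    global _connection
    if _connection is None:
        _connection = FakeConnection()
    return _connection


def create_table(conn, table, _payload):
    conn.tables[table] = []


def append_records(conn, table, payload):
    conn.tables[table].extend(payload)


def drop_table(conn, table, _payload):
    conn.tables.pop(table, None)


# Dispatch by key (kind of request) to the function that handles it.
HANDLERS = {
    "create": create_table,
    "append": append_records,
    "drop": drop_table,
}


def process_batch(batch):
    """Roughly what the function given to foreachRDD would do for one batch."""
    conn = get_connection()  # reused across calls, not reopened per batch
    for kind, table, payload in batch:
        HANDLERS[kind](conn, table, payload)


batch1 = [("create", "users", None), ("append", "users", ["alice", "bob"])]
batch2 = [("append", "users", ["carol"]), ("create", "tmp", None), ("drop", "tmp", None)]

process_batch(batch1)
process_batch(batch2)
print(get_connection().tables)  # {'users': ['alice', 'bob', 'carol']}
```

The second batch appends to a table created by the first one, so the state held by the connection survives between calls. Whether something like this is reasonable on the workers, or whether the state should live in the driver or in an external store, is what I would like advice on.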
