-----Original Message-----
From: Sean Owen <so...@cloudera.com>

>On Tue, Nov 4, 2014 at 8:02 PM, spr <s...@yarcdata.com> wrote:
>> To state this another way, it seems like there's no way to straddle the
>> streaming world and the non-streaming world; to get input from both a
>> (vanilla, Linux) file and a stream. Is that true?
>>
>> If so, it seems I need to turn my (vanilla file) data into a second
>> stream.
>
>Hm, why do you say that? nothing prevents that at all. You can do
>anything you like in your local code, or in functions you send to
>remote workers. (Of course, if those functions depend on a local file,
>it has to exist locally on the workers.) You do have to think about
>the distributed model here, but what executes locally/remotely isn't
>mysterious. It is things in calls to Spark API methods that will be
>executed remotely.
The distinction I was calling out was temporal, not local/distributed, though that is another important dimension. It sounds like I can do anything I want in the code before ssc.start(), but that code runs once, at the beginning of the program. What I'm searching for is some way to have code that runs repeatedly and potentially updates a variable that the streaming code will see. broadcast() almost does that, but apparently the underlying variable should be immutable. I'm not aware of any (Spark) way to have code run repeatedly other than as part of the Spark Streaming API, and that doesn't look at vanilla files.

The distributed angle you raise makes my "vanilla file" approach not quite credible, in that the vanilla file would have to be distributed to all the nodes for the updates to be seen. So maybe the simplest approach is to have a vanilla Linux process monitoring the vanilla file (on a client node) and sending any changes to it into a (distinct) stream. If so, the remote code would need to monitor both that stream and the main data stream. Does that make sense?
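For what it's worth, here is a minimal sketch of that two-stream idea, assuming an external process (e.g. `tail -f lookup.txt | nc -lk 9998`) pushes changes to the file onto a socket. The hostnames, ports, line format, and join logic are all hypothetical placeholders, not a definitive recipe; the point is just that the side stream's latest values can be carried as per-key state and consulted against each batch of the main stream.

    // Sketch only: union of a main data stream with a "file change" side stream.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TwoStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("TwoStreamSketch")
        val ssc  = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("/tmp/two-stream-sketch")   // required by updateStateByKey

        // Main data stream (placeholder source and host/port).
        val data    = ssc.socketTextStream("datahost", 9999)
        // Side stream carrying updates to the formerly "vanilla" file.
        val updates = ssc.socketTextStream("confighost", 9998)

        // Keep the most recent value seen for each key on the side stream
        // (assumes "key,value" lines).
        val state = updates
          .map { line => val Array(k, v) = line.split(","); (k, v) }
          .updateStateByKey[String] { (newVals, old) =>
            newVals.lastOption.orElse(old)
          }

        // Each batch of main data can then be joined against the current state.
        val keyedData = data.map(line => (line.split(",")(0), line))
        keyedData.leftOuterJoin(state).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Note that two socket streams mean two receivers, so the job needs enough cores (e.g. local[3] or more when testing locally) for the processing to make progress.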