Re: correct use of DStream foreachRDD

2015-08-28 Thread Carol McDonald
Thanks, this looks better // parse the lines of data into sensor objects val sensorDStream = ssc.textFileStream("/stream"). map(Sensor.parseSensor) sensorDStream.foreachRDD { rdd => // filter sensor data for low psi val alertRDD = rdd.filter(sensor => sensor.psi <

Re: correct use of DStream foreachRDD

2015-08-28 Thread Sean Owen
Yes, for example "val sensorRDD = rdd.map(Sensor.parseSensor)" is a line of code executed on the driver; it's part the function you supplied to foreachRDD. However that line defines an operation on an RDD, and the map function you supplied (parseSensor) will ultimately be carried out on the cluster

RE: correct use of DStream foreachRDD

2015-08-28 Thread Ewan Leith
sensor object rdd.saveAsHadoopDataset(jobConfig) } Will perform the bulk of the work in the distributed processing, before the results are returned to the driver for writing to HBase. Thanks, Ewan From: Carol McDonald [mailto:cmcdon...@maprtech.com] Sent: 28 August 2015 15:30 To: user Subject: correct u

correct use of DStream foreachRDD

2015-08-28 Thread Carol McDonald
I would like to make sure that I am using the DStream foreachRDD operation correctly. I would like to read from a DStream transform the input and write to HBase. The code below works , but I became confused when I read "Note that the function *func* is executed in the driver process" ? val