Just wanted to make sure.

Thanks.
Daniel

On Mon, Oct 12, 2015 at 1:07 PM, Adrian Tanase <atan...@adobe.com> wrote:

> Not really, unless you’re doing something wrong (e.g. calling collect or
> similar).
>
> Inside foreachRDD you’re typically registering a temp table by
> converting an RDD to a DataFrame. All the subsequent queries are executed
> in parallel on the workers.
>
> I haven’t built production apps with this pattern, but I have successfully
> built a prototype where I execute dynamic SQL on top of a 15-minute window
> (obtained with .window on the DStream), and it works as expected.
>
> Check this out for a code example:
> https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala
>
> -adrian
>
> From: Daniel Haviv
> Date: Monday, October 12, 2015 at 12:52 PM
> To: user
> Subject: SQLContext within foreachRDD
>
> Hi,
> As things that run inside foreachRDD run at the driver, does that mean
> that if we use SQLContext inside foreachRDD the data is sent back to the
> driver and only then the query is executed, or is it executed at the
> executors?
>
> Thank you.
> Daniel
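
For readers of this thread, the pattern Adrian describes (register a temp table inside foreachRDD, then query it) might look roughly like the sketch below. It is not taken from the linked example. It assumes a Spark 1.5-era API (SQLContext.getOrCreate, registerTempTable), and the Event case class, the socket source on localhost:9999, the table name and the query are all hypothetical, chosen only for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// Hypothetical record type, used only for this illustration.
case class Event(endpoint: String, bytes: Long)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingSqlSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed input: lines of the form "endpoint,bytes" on a local socket.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .filter(_.length == 2)
      .map(fields => Event(fields(0), fields(1).trim.toLong))

    // 15-minute window, as in the prototype mentioned in the thread.
    events.window(Minutes(15)).foreachRDD { rdd =>
      // The body of this closure runs on the driver, but the calls below
      // only describe the computation; the RDD's data stays on the executors.
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      rdd.toDF().registerTempTable("events")

      val totals = sqlContext.sql(
        "SELECT endpoint, SUM(bytes) AS total FROM events GROUP BY endpoint")

      // show() triggers a distributed job; only the rows that are displayed
      // come back to the driver. Calling collect() on a large result is the
      // "doing something wrong" case from the reply above.
      totals.show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The scan and aggregation run as ordinary Spark tasks on the executors; the driver only plans the query and receives whatever the final action returns.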