Just wanted to make sure.

Thanks.
Daniel

On Mon, Oct 12, 2015 at 1:07 PM, Adrian Tanase <atan...@adobe.com> wrote:

> Not really, unless you’re doing something wrong (e.g. calling collect or
> similar).
>
> In the foreachRDD loop you’re typically registering a temp table by
> converting an RDD to a DataFrame. All the subsequent queries are executed in
> parallel on the workers.
>
> I haven’t built production apps with this pattern, but I have successfully
> built a prototype where I execute dynamic SQL on top of a 15-minute window
> (obtained with .window on the DStream), and it works as expected.
>
> Check this out for a code example:
> https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala
>
> -adrian
>
> From: Daniel Haviv
> Date: Monday, October 12, 2015 at 12:52 PM
> To: user
> Subject: SQLContext within foreachRDD
>
> Hi,
> Since code inside foreachRDD runs on the driver, does that mean that if
> we use SQLContext inside foreachRDD the data is sent back to the driver
> and only then the query is executed, or is it executed on the
> executors?
>
>
> Thank you.
> Daniel
>
>
>
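
For anyone finding this thread later, here is a minimal sketch of the pattern Adrian describes, written against the Spark 1.5-era API (SQLContext, registerTempTable). It is not taken from the linked example. The Event case class, the socket source on localhost:9999, the "userId,bytes" line format, the table name and the query are all assumptions made up for illustration. The point it tries to show is that the foreachRDD body runs on the driver but only builds the query, while the scan and aggregation run on the executors.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

import scala.util.Try

// Hypothetical record type, defined at top level so toDF() can derive a schema.
case class Event(userId: String, bytes: Long)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingSqlSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed input: text lines of the form "userId,bytes" on a local socket.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .filter(_.length == 2)
      // Malformed byte counts are dropped instead of failing the batch.
      .flatMap(f => Try(Event(f(0), f(1).trim.toLong)).toOption)

    // 15-minute window, re-evaluated on every 10-second batch.
    events.window(Minutes(15), Seconds(10)).foreachRDD { rdd =>
      // This closure runs on the driver once per batch. It only defines the
      // DataFrame and the query; no records are pulled to the driver here.
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      rdd.toDF().registerTempTable("events")

      val totals = sqlContext.sql(
        "SELECT userId, SUM(bytes) AS total FROM events GROUP BY userId")

      // show() is the action: the scan and aggregation run on the executors,
      // and only the first rows of the result come back to the driver.
      // Calling collect() here instead would ship the full result to the driver.
      totals.show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

On Spark 2.x and later the same idea is usually written with SparkSession and createOrReplaceTempView, since registerTempTable is deprecated there.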
