My specific and immediate need is this: We have a native function wrapped in 
JNI. To increase performance we'd like to avoid calling it record by record. 
mapPartitions() give us the ability to invoke this in bulk. We're looking for a 
similar approach in SQL.


________________________________
From: Ryan <ryan.hd....@gmail.com>
Sent: Sunday, June 25, 2017 7:18:32 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: What is the equivalent of mapPartitions in SpqrkSQL?

Why would you like to do so? I think there's no need for us to explicitly ask 
for a forEachPartition in spark sql because tungsten is smart enough to figure 
out whether a sql operation could be applied on each partition or there has to 
be a shuffle.

On Sun, Jun 25, 2017 at 11:32 PM, jeff saremi 
<jeffsar...@hotmail.com<mailto:jeffsar...@hotmail.com>> wrote:

You can do a map() using a select and functions/UDFs. But how do you process a 
partition using SQL?


Reply via email to