My specific and immediate need is this: We have a native function wrapped in JNI. To increase performance we'd like to avoid calling it record by record. mapPartitions() give us the ability to invoke this in bulk. We're looking for a similar approach in SQL.
________________________________ From: Ryan <ryan.hd....@gmail.com> Sent: Sunday, June 25, 2017 7:18:32 PM To: jeff saremi Cc: user@spark.apache.org Subject: Re: What is the equivalent of mapPartitions in SpqrkSQL? Why would you like to do so? I think there's no need for us to explicitly ask for a forEachPartition in spark sql because tungsten is smart enough to figure out whether a sql operation could be applied on each partition or there has to be a shuffle. On Sun, Jun 25, 2017 at 11:32 PM, jeff saremi <jeffsar...@hotmail.com<mailto:jeffsar...@hotmail.com>> wrote: You can do a map() using a select and functions/UDFs. But how do you process a partition using SQL?