I have seen similar behavior in my standalone cluster. I tried increasing the number of partitions, and at some point it seemed that all the executors (worker nodes) started making parallel connections to the remote data store. Still, it would be nice if someone could point us to references on how to properly partition data read from a remote data store through Spark SQL. Thanks a lot!
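My rough understanding of where those parallel connections come from, assuming the read goes through Spark's JDBC source with a numeric `partitionColumn`: the range between `lowerBound` and `upperBound` is cut into `numPartitions` strides, each stride becomes its own WHERE clause, and each clause runs as a separate query in its own task (hence one connection per running task). The sketch below only imitates that splitting; the function name `jdbc_partition_predicates` is hypothetical, the stride arithmetic is simplified, and it is not Spark's actual code.

```python
from typing import List, Optional


def jdbc_partition_predicates(
    column: str, lower_bound: int, upper_bound: int, num_partitions: int
) -> List[Optional[str]]:
    """Hypothetical sketch: one WHERE clause per partition (None = no filter).

    Simplified imitation of how a JDBC read strides a numeric column.
    The bounds only decide the stride; they do not filter any rows out.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    # Never create more partitions than values in the range,
    # otherwise the stride would be zero.
    num_partitions = min(num_partitions, max(upper_bound - lower_bound, 1))
    if num_partitions == 1:
        return [None]  # a single query scans the whole table

    stride = (upper_bound - lower_bound) // num_partitions
    predicates: List[Optional[str]] = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition is open below and also picks up NULLs; the last
        # is open above, so rows outside [lower_bound, upper_bound) still land somewhere.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper = f"{column} < {current}" if i < num_partitions - 1 else None
        if lower is None:
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


if __name__ == "__main__":
    for clause in jdbc_partition_predicates("id", 0, 100, 4):
        print(clause)
    # id < 25 OR id IS NULL
    # id >= 25 AND id < 50
    # id >= 50 AND id < 75
    # id >= 75
```

In PySpark these knobs correspond to `spark.read.jdbc(url, table, column=..., lowerBound=..., upperBound=..., numPartitions=..., properties=...)`, and the `predicates=` argument accepts a hand-written list of clauses instead. If I read it correctly, calling `.repartition()` after the load only shuffles rows that were already fetched, so it would not change how many connections hit the remote store.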
zhou

> On Jul 14, 2016, at 9:18 AM, Jakub Stransky <stransky...@gmail.com> wrote: