Have a look at the Spark tuning guide: https://spark.apache.org/docs/latest/tuning.html

Thanks
Best Regards

On Mon, Jun 15, 2015 at 11:27 AM, Proust GZ Feng <pf...@cn.ibm.com> wrote:

> Hi, Spark Experts
>
> I have been playing with Spark for several weeks. After some testing, a
> reduce operation on a DataFrame takes 40s on a cluster with 5 datanode
> executors. The back-end data is about 6,000 rows. Is this normal? Such
> performance looks very poor, because in Java a loop over 6,000 rows takes
> just several seconds.
>
> I'm wondering whether there is any document I should read to make the job
> much faster?
>
> Thanks in advance
> Proust