Each node can have any number of partitions. For best performance, Spark
tries to schedule each task on a node that already holds that partition's
data (in the list of tasks in the UI, see the Locality Level column).
As a rule of thumb, you probably want 2-3 times as many partitions as you
have cores in the cluster.
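
If it helps, here is a minimal sketch of applying that rule of thumb (the
app name, local master, and input path are made up, and this assumes a
Spark 1.x-style SparkContext):

    import org.apache.spark.{SparkConf, SparkContext}

    // local[*] is just for a quick local test; drop it under spark-submit.
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-check").setMaster("local[*]"))

    // defaultParallelism is roughly the total cores available to the app,
    // so 2-3x that is a reasonable starting partition count.
    val targetPartitions = sc.defaultParallelism * 3

    // "hdfs:///data/input" is a hypothetical path.
    val rdd = sc.textFile("hdfs:///data/input").repartition(targetPartitions)
    println(s"partitions = ${rdd.partitions.length}")

You can then watch the Locality Level column in the UI while the job runs.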
Thanks very much for Yong's help.
Sorry, one more issue: do different partitions have to be on different
nodes? That is, would each node hold only one partition in cluster mode?
On Wednesday, December 9, 2015 6:41 AM, "Young, Matthew T" wrote:
Shuffling large amounts of data over the network is expensive, yes. The cost is
lower if you are just using a single node, where no networking is needed to do
the repartition (using Spark as a multithreading engine).
In general you need to do performance testing to see whether a repartition is
worth the cost for your workload.
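
For what it's worth, one rough way to test that is to time a cheap action
against both repartition (which always does a full shuffle) and coalesce
(which merges partitions without a shuffle when decreasing). A sketch with
made-up data sizes and partition counts:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("repartition-cost").setMaster("local[*]"))

    // Made-up data: 10 million integers across 200 partitions.
    val rdd = sc.parallelize(1 to 10000000, 200)

    // Small helper to time an action by name.
    def time[A](label: String)(body: => A): A = {
      val t0 = System.nanoTime()
      val result = body
      println(s"$label took ${(System.nanoTime() - t0) / 1e6} ms")
      result
    }

    // repartition shuffles everything, even when shrinking to 50 partitions;
    // coalesce just merges existing partitions, so it is usually cheaper.
    time("repartition to 50")(rdd.repartition(50).count())
    time("coalesce to 50")(rdd.coalesce(50).count())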