Re: repartition in Spark

2020-11-09 Thread Mich Talebzadeh
As a generic answer: in a distributed environment like Spark, making sure that data is distributed evenly among all nodes (assuming every node is the same or similar) can help performance. Repartition thus controls the data distribution among all nodes. However, it is not that straightforward. Your

repartition in Spark

2020-11-09 Thread ashok34...@yahoo.com.INVALID
Hi, Just need some advice. - When we have multiple Spark nodes running code, under what conditions does a repartition make sense? - Can we repartition and cache the result --> df = spark.sql("select from ...").repartition(4).cache - If we choose a repartition (4), will that repartition ap
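
To make the "even distribution" point in this thread concrete, here is a minimal plain-Python sketch of what a repartition to 4 partitions does to skewed data. It is a simulation only, not Spark's API: the function name `repartition_round_robin` and the sample data are hypothetical, and Spark's own `repartition(n)` performs a full shuffle whose exact row placement differs from this simple dealing order.

```python
from typing import List


def repartition_round_robin(partitions: List[List[int]], num_partitions: int) -> List[List[int]]:
    """Deal every row out to num_partitions buckets in turn (simulation, not Spark)."""
    out: List[List[int]] = [[] for _ in range(num_partitions)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % num_partitions].append(row)
            i += 1
    return out


# Hypothetical skewed input: one partition holds almost all of the 100 rows.
skewed = [list(range(0, 97)), [97], [98], [99]]
balanced = repartition_round_robin(skewed, 4)

sizes_before = [len(p) for p in skewed]
sizes_after = [len(p) for p in balanced]
print(sizes_before, sizes_after)  # prints [97, 1, 1, 1] [25, 25, 25, 25]
```

Before the redistribution, one worker would process 97 rows while the others process one each; afterwards each partition holds 25 rows. The cost in a real cluster is that every row may move across the network, which is one reason the answer above says it is not that straightforward.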