Re: repartition in df vs partitionBy in df

2019-04-24 Thread moqi
Hello, here is another link that I hope will help you: https://stackoverflow.com/questions/33831561/pyspark-repartition-vs-partitionby In particular, when you are facing possible data skew or have some partitioning parameters that need to be obtained at runtime, you can refer to this lin

Re: repartition in df vs partitionBy in df

2019-04-24 Thread moqi
Hello, here is another link that discusses the difference between the two methods: https://stackoverflow.com/questions/33831561/pyspark-repartition-vs-partitionby In particular, when you are facing possible data skew or have some partitioning parameters that need to be obtained at runtime, you ca

Re: repartition in df vs partitionBy in df

2019-04-24 Thread rajat kumar
Hello, thanks for the quick reply. Got it: partitionBy creates something like Hive partitions. But when do we actually use repartition, and how do we decide whether to repartition at all? In development we only get sample data. Also, what number should I pass to repartition? Thanks. On
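To make the distinction in this message concrete, here is a toy model in plain Python. It is not Spark code: the helpers `toy_repartition` and `toy_partition_by` and the sample rows are hypothetical, and Python's built-in `hash` stands in for whatever hash function Spark uses. It only sketches the idea that `repartition(n, col)` spreads rows across a fixed number of in-memory partitions, while `partitionBy(col)` on the writer produces one Hive-style `col=value` directory per distinct value.

```python
from collections import defaultdict


def toy_repartition(rows, num_partitions, key):
    """Toy stand-in for df.repartition(n, col): hash each row's key into one of n buckets."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Assumption: Python's hash() replaces Spark's own hash partitioner.
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def toy_partition_by(rows, key):
    """Toy stand-in for df.write.partitionBy(col): one 'col=value' directory per distinct value."""
    layout = defaultdict(list)
    for row in rows:
        layout[f"{key}={row[key]}"].append(row)
    return dict(layout)


# Hypothetical sample data, for illustration only.
rows = [
    {"country": "IN", "amount": 10},
    {"country": "US", "amount": 20},
    {"country": "IN", "amount": 30},
    {"country": "DE", "amount": 40},
]

# Fixed number of partitions, chosen by the caller; rows with equal keys share a partition.
partitions = toy_repartition(rows, 2, "country")

# Number of output "directories" equals the number of distinct key values.
layout = toy_partition_by(rows, "country")
```

In this sketch the caller picks the partition count for `toy_repartition`, whereas the number of groups from `toy_partition_by` is determined entirely by the data, which is why the two are used for different purposes.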

Re: repartition in df vs partitionBy in df

2019-04-24 Thread moqi
Hello, I think this link may help you: https://stackoverflow.com/questions/40416357/spark-sql-difference-between-df-repartition-and-dataframewriter-partitionby/40417992 -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: repartition in df vs partitionBy in df

2019-04-24 Thread rajat kumar
Hi all, can anyone explain? Thanks, Rajat
On Sun, 21 Apr 2019, 00:18 kumar.rajat20del wrote:
> Hi Spark Users,
>
> repartition and partitionBy seem to be very similar in DataFrames.
> In which scenario do we use each one?
>
> As per my understanding, repartition is a very expensive operation as it needs
> a full shuffle, then wh

repartition in df vs partitionBy in df

2019-04-20 Thread kumar.rajat20del
Hi Spark Users, repartition and partitionBy seem to be very similar in DataFrames. In which scenario do we use each one? As per my understanding, repartition is a very expensive operation because it needs a full shuffle, so when do we use repartition? Thanks, Rajat -- Sent from: http://apache-spark-user-list.1001560.n