Applying a limit after orderBy of big dataframe hangs spark

Saif.A.Ellafi Fri, 05 Aug 2016 11:54:40 -0700

Hi all,

I am working with a 1.5 billion row dataframe on a small cluster and trying to 
apply an orderBy operation on one of its Long-typed columns.


If I limit the sorted output to some number of rows, say 5 million, then trying 
to count, persist or store the dataframe makes Spark lose executors and hang.
Without the limit after the orderBy, everything works normally, i.e. writing 
all 1.5 billion rows back out works fine.
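
For concreteness, the pattern described above looks roughly like the sketch below. It is only an illustration against the Spark 1.6 DataFrame API: the synthetic data from sqlContext.range, the column name "id" and the application name are assumptions standing in for the real job, not the actual code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object OrderByLimitSketch {
  def main(args: Array[String]): Unit = {
    // Master and resources are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("orderBy-limit-sketch"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical stand-in for the real data: 1.5 billion rows
    // with a single Long column named "id".
    val df = sqlContext.range(0L, 1500000000L)

    // Sorting the full dataframe on the Long column.
    val sorted = df.orderBy("id")

    // Limiting the sorted result to 5 million rows.
    val limited = sorted.limit(5000000)

    // Any action on the limited dataframe (count, persist, write)
    // is the step reported above as losing executors and hanging.
    println(limited.count())

    sc.stop()
  }
}
```

Running an action on sorted instead of limited corresponds to the case that works.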

Any thoughts? I am using Spark 1.6.0 with Scala 2.11.

Saif
