Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Andy Davidson
Cc: "user @spark" Subject: Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS > I agree every time an OS file is created, it requires a context switch plus a file descriptor. It is probably more time consuming to open and close th

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Mich Talebzadeh
te: Tuesday, April 5, 2016 at 3:49 PM > To: Andrew Davidson > Cc: "user @spark" > Subject: Re: Saving Spark streaming RDD with saveAsTextFiles ends up > creating empty files on HDFS > > Thanks Andy. > > Do we know if this is a known bug or simply a feature

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Andy Davidson
g RDD with saveAsTextFiles ends up creating empty files on HDFS > Thanks Andy. > > Do we know if this is a known bug or simply a feature that on the face of it Spark cannot save RDD output to a text file? > > Dr Mich Talebzadeh > > LinkedI

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Mich Talebzadeh
Thanks Andy. Do we know whether this is a known bug, or simply a feature, that on the face of it Spark cannot save RDD output to a text file? Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Andy Davidson
Hi Mich, Yup, I was surprised to find empty files. It's easy to work around. Note I should probably use coalesce() and not repartition(). In general I found I almost always need to repartition. I was getting thousands of empty partitions. It was really slowing my system down. private static void s
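
For readers following the thread, here is a minimal plain-Java sketch of why fewer partitions means fewer empty output files. It is not Spark code and not the method from the message above (which is cut off in this listing). Spark's `saveAsTextFile` writes one part file per partition, so an empty partition becomes an empty file. The class `CoalesceSketch` and its helpers `coalesce` and `countEmpty` are hypothetical names for illustration; the `coalesce` here only loosely models a no-shuffle merge of adjacent partitions.

```java
import java.util.ArrayList;
import java.util.List;

public class CoalesceSketch {

    // Merge adjacent partitions into at most `target` partitions.
    // This is a plain-Java model of a no-shuffle coalesce, not the Spark API.
    static <T> List<List<T>> coalesce(List<List<T>> parts, int target) {
        if (target < 1) {
            throw new IllegalArgumentException("target must be >= 1");
        }
        int n = parts.size();
        int groups = Math.min(target, n);
        List<List<T>> out = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            // Contiguous slice [from, to) of the original partitions.
            int from = (int) ((long) g * n / groups);
            int to = (int) ((long) (g + 1) * n / groups);
            List<T> merged = new ArrayList<>();
            for (int i = from; i < to; i++) {
                merged.addAll(parts.get(i));
            }
            out.add(merged);
        }
        return out;
    }

    // Each empty partition would end up as one empty output file.
    static <T> long countEmpty(List<List<T>> parts) {
        return parts.stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        // 1000 partitions, only every 250th one holds a record.
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            List<String> p = new ArrayList<>();
            if (i % 250 == 0) {
                p.add("record-" + i);
            }
            parts.add(p);
        }
        System.out.println("empty before: " + countEmpty(parts));
        System.out.println("empty after:  " + countEmpty(coalesce(parts, 4)));
    }
}
```

With the sample data in `main`, 996 of the 1000 partitions are empty before merging and none are empty after merging into 4. In a real job the equivalent step would be calling `coalesce()` on the RDD before saving, as suggested in the message; the right partition count depends on the data volume per batch.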