Currently, I use rdd.isEmpty()
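For example, a minimal sketch of that isEmpty() guard, assuming the
usual SparkContext setup (the input/output paths here are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("skip-empty-output")
val sc = new SparkContext(conf)

val rdd = sc.textFile("hdfs:///tmp/input").filter(_.nonEmpty)

// isEmpty() only scans partitions until it finds a first element,
// so it is cheaper than count() == 0 as a pre-write check.
if (!rdd.isEmpty()) {
  rdd.saveAsTextFile("hdfs:///tmp/output")
}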
Thanks,
Patanachai
On 08/06/2015 12:02 PM, gpatcham wrote:
Is there a way to filter out empty partitions before I write to HDFS other
than using repartition and coalesce?
Hi,
I actually ran into the same problem, although our endpoint is not
ElasticSearch. When the Spark job dies, we lose some data because the
Kinesis checkpoint is already past the last point that Spark has processed.
Currently, our workaround is to use Spark's checkpoint mechanism together
with a write ahead log (WAL).
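In case it helps, here is a rough sketch of that setup, assuming the
Spark 1.4/1.5-era KinesisUtils API (the app name, stream name, endpoint,
region, and checkpoint directory are all placeholders, and the exact
createStream signature varies across Spark versions):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

def createContext(): StreamingContext = {
  val conf = new SparkConf()
    .setAppName("kinesis-wal-example")
    // Write received blocks to the WAL in the checkpoint directory
    // before processing, so records that Kinesis has already
    // checkpointed past can still be replayed after a driver failure.
    .set("spark.streaming.receiver.writeAheadLog.enable", "true")

  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("hdfs:///checkpoints/kinesis-app")

  val stream = KinesisUtils.createStream(
    ssc, "kinesis-wal-example", "my-stream",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.TRIM_HORIZON, Seconds(10),
    StorageLevel.MEMORY_AND_DISK_SER)  // replication unneeded with WAL

  stream.map(bytes => new String(bytes, "UTF-8")).print()
  ssc
}

// Recover from the checkpoint after a restart, or build a fresh context.
val ssc = StreamingContext.getOrCreate(
  "hdfs:///checkpoints/kinesis-app", createContext _)
ssc.start()
ssc.awaitTermination()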