Re: Saving Data only if Dstream is not empty

2014-12-10 Thread manasdebashiskar
Can you use countApprox as a condition to check for a non-empty RDD? Manas Kar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Saving-Data-only-if-Dstream-is-not-empty-tp20587p20617.html
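
A minimal sketch of the countApprox idea, assuming the Spark core RDD API. The helper name looksNonEmpty and the one-second default timeout are illustrative assumptions, not something from the thread:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper (not from the thread): treat the RDD as non-empty
// when the upper bound of the approximate count is positive.
def looksNonEmpty[T](rdd: RDD[T], timeoutMs: Long = 1000L): Boolean = {
  // countApprox returns a PartialResult[BoundedDouble]; initialValue waits
  // until the job completes or the timeout elapses, whichever comes first.
  val estimate = rdd.countApprox(timeoutMs, 0.95).initialValue
  // Checking the upper bound rather than the mean errs toward "non-empty"
  // when few or no tasks finished before the timeout.
  estimate.high > 0
}
```

If an exact answer matters more than a bounded wait, rdd.take(1).nonEmpty is a simpler check that stops as soon as one element is found.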

Re: Saving Data only if Dstream is not empty

2014-12-09 Thread Gerard Maas
We have a similar case in which we don't want to save data to Cassandra if the data is empty. In our case, we filter the initial DStream to process messages that go to a given table. To do so, we're using something like this: dstream.foreachRDD{ (rdd,time) => tables.foreach{ table => val

Re: Saving Data only if Dstream is not empty

2014-12-09 Thread Sean Owen
I don't believe you can do this unless you implement the save-to-HDFS logic yourself. To keep the semantics consistent, these saveAs* methods will always output a file per partition.

On Mon, Dec 8, 2014 at 11:53 PM, Hafiz Mujadid wrote:
> Hi Experts!
>
> I want to save DStream to HDFS only if it
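
One way to approximate "save only if not empty" without rewriting the output format is to skip empty batches inside foreachRDD. This is only a sketch under assumptions: the helper name saveNonEmpty, the String element type, and the outputDir layout are all hypothetical, and it does not change the file-per-partition behaviour described above.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: outputDir is an assumed base path such as an HDFS URI.
def saveNonEmpty(dstream: DStream[String], outputDir: String): Unit = {
  dstream.foreachRDD { (rdd, time) =>
    // take(1) scans only as many partitions as needed to find one element,
    // so an empty batch is detected without a full count.
    if (rdd.take(1).nonEmpty) {
      // A non-empty batch still produces one file per partition,
      // including partitions that happen to be empty.
      rdd.saveAsTextFile(s"$outputDir/batch-${time.milliseconds}")
    }
  }
}
```

A batch that passes the check is written to its own directory keyed by batch time; empty batches produce no output directory at all.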