You can always call rdd.isEmpty() and skip the output step for batches that come back empty:
Andy
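import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.streaming.Time;
import org.apache.spark.streaming.api.java.JavaDStream;

// Writes each non-empty batch to its own directory named <outputURI>-<batch time in ms>.
private static void save(JavaDStream<String> jsonTweets, final String outputURI) {
    jsonTweets.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
        private static final long serialVersionUID = 1L;

        @Override
        public void call(JavaRDD<String> rdd, Time time) throws Exception {
            // Skip empty batches so no empty output directories are written.
            if (!rdd.isEmpty()) {
                String dirPath = outputURI + "-" + time.milliseconds();
                rdd.saveAsTextFile(dirPath);
            }
        }
    });
}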
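For context on the isEmpty() suggestion: in Spark's Scala source, `RDD.isEmpty()` is essentially `partitions.length == 0 || take(1).length == 0`. The sketch below is a plain-Java model of that logic, not Spark code; the class `IsEmptyModel`, the partitions-as-lists representation and the `partitionsScanned` counter are all hypothetical. It also assumes `take(1)` walks partitions one at a time in order, which simplifies Spark's real behaviour (Spark scans a growing number of partitions per job). What it illustrates is that an RDD whose partitions exist but are all empty is still reported as empty, but only after every partition has been looked at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical plain-Java model (not Spark code) of how RDD.isEmpty() decides:
 * empty if there are no partitions, or if take(1) finds no element.
 */
class IsEmptyModel {

    /** Number of partitions the last call had to scan (illustration only). */
    static int partitionsScanned = 0;

    /** Models take(1): look through partitions in order until one element is found. */
    static <T> List<T> takeOne(List<List<T>> partitions) {
        partitionsScanned = 0;
        List<T> result = new ArrayList<>();
        for (List<T> partition : partitions) {
            partitionsScanned++;
            if (!partition.isEmpty()) {
                result.add(partition.get(0));
                break;
            }
        }
        return result;
    }

    /** Models isEmpty(): no partitions, or take(1) returned nothing. */
    static <T> boolean isEmpty(List<List<T>> partitions) {
        partitionsScanned = 0;
        return partitions.isEmpty() || takeOne(partitions).isEmpty();
    }

    public static void main(String[] args) {
        List<List<String>> noPartitions = new ArrayList<>();
        List<List<String>> emptyPartitions = Arrays.asList(
                new ArrayList<String>(), new ArrayList<String>(), new ArrayList<String>());
        List<List<String>> someData = Arrays.asList(
                new ArrayList<String>(), Arrays.asList("{\"id\":1}"));

        System.out.println(isEmpty(noPartitions) + " scanned=" + partitionsScanned);
        System.out.println(isEmpty(emptyPartitions) + " scanned=" + partitionsScanned);
        System.out.println(isEmpty(someData) + " scanned=" + partitionsScanned);
    }
}
```

Running `main` prints `true scanned=0`, `true scanned=3` and `false scanned=2`: the answer is right in every case, but the cost of the check grows with the number of empty partitions, which is one reason a source that never creates the empty RDD in the first place can be attractive.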
From: Sebastian Piu <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, February 11, 2016 at 1:19 PM
To: "Shixiong (Ryan) Zhu" <[email protected]>
Cc: Sebastian Piu <[email protected]>, "user @spark" <[email protected]>
Subject: Re: Skip empty batches - spark streaming
>
> Yes, and as far as I recall it also has (empty) partitions, which screws up the
> isEmpty call if the RDD has been transformed further down the line. I will have
> a look tomorrow at the office and see if I can contribute.
>
> On 11 Feb 2016 9:14 p.m., "Shixiong(Ryan) Zhu" <[email protected]>
> wrote:
>> Yeah, DirectKafkaInputDStream always returns an RDD, even if it's empty. Feel
>> free to send a PR to improve it.
>>
>> On Thu, Feb 11, 2016 at 1:09 PM, Sebastian Piu <[email protected]>
>> wrote:
>>>
>>> I'm using the Kafka direct stream API, but I can have a look at extending it
>>> to have this behaviour.
>>>
>>> Thanks!
>>>
>>> On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu" <[email protected]>
>>> wrote:
>>>> Are you using a custom input dstream? If so, you can make the `compute`
>>>> method return None to skip a batch.
>>>>
>>>> On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu <[email protected]>
>>>> wrote:
>>>>>
>>>>> I was wondering if there is any way to skip batches with zero events
>>>>> when streaming. By skip I mean preventing the empty RDD from being created
>>>>> at all.
>>>>
>>
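Ryan's suggestion works because an input DStream's `compute(validTime)` returns an `Option[RDD[T]]`, and when it returns `None` no RDD, and therefore no output job, is generated for that batch. Below is a plain-Java model of how a direct-stream-like source could use that. It assumes, as a hypothetical extension rather than what `DirectKafkaInputDStream` currently does, that a batch is empty when every per-partition offset range has `from == until`. `SkipEmptyBatchesModel`, `Range`, `compute` and `runBatches` are illustrative names, and `Optional.empty()` stands in for Scala's `None`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical plain-Java model (not Spark's API) of a direct-stream-style source
 * that skips empty batches by returning Optional.empty() from compute(),
 * the way a Scala InputDStream can return None.
 */
class SkipEmptyBatchesModel {

    /** Hypothetical per-partition offset range for one batch: [from, until). */
    static final class Range {
        final int partition;
        final long from;
        final long until;

        Range(int partition, long from, long until) {
            this.partition = partition;
            this.from = from;
            this.until = until;
        }

        long count() {
            return until - from;
        }
    }

    /** Returns the batch's ranges, or empty when no partition has new records. */
    static Optional<List<Range>> compute(List<Range> ranges) {
        long total = 0;
        for (Range r : ranges) {
            total += r.count();
        }
        return total == 0 ? Optional.empty() : Optional.of(ranges);
    }

    /** Models the scheduler: a job is only created for batches that produced data. */
    static int runBatches(List<List<Range>> batches) {
        int jobs = 0;
        for (List<Range> batch : batches) {
            if (compute(batch).isPresent()) {
                jobs++; // this is where output such as saveAsTextFile would run
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<List<Range>> batches = Arrays.asList(
                Arrays.asList(new Range(0, 10, 10), new Range(1, 7, 7)),  // no new records
                Arrays.asList(new Range(0, 10, 15), new Range(1, 7, 7)),  // 5 new records
                Arrays.asList(new Range(0, 15, 15), new Range(1, 7, 7))); // no new records
        System.out.println("jobs run: " + runBatches(batches));
    }
}
```

Running `main` prints `jobs run: 1`, since only the middle batch has new records. Deciding emptiness from the offset ranges means the check costs nothing on the cluster, unlike calling isEmpty() on a transformed RDD.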