Hi Natu,

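I believe you are correct: with that timing, each file would end up in its own 
RDD. An RDD is still generated for every batch interval, though, so batches 
with no new files just get an empty RDD.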
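
For illustration, here is a toy sketch in plain Python (no Spark involved) of 
how a file every 5 minutes lines up with a 2-minute batch interval. It assumes 
each file is picked up by the first batch that ends at or after the file 
appears. That is a simplification of what textFileStream really does, and the 
function and parameter names are made up for this example.

```python
# Toy model only: this does not call Spark. It assumes a file belongs to the
# first batch whose end time is at or after the file's arrival time.

def files_per_batch(file_interval_s, batch_interval_s, total_s):
    """Map each batch end time (seconds) to the arrival times of its files."""
    # One entry per batch interval, whether or not any file arrives in it.
    batches = {t: [] for t in range(batch_interval_s, total_s + 1, batch_interval_s)}
    for arrival in range(file_interval_s, total_s + 1, file_interval_s):
        # Round the arrival time up to the next batch boundary (integer ceiling).
        batch_end = -(-arrival // batch_interval_s) * batch_interval_s
        if batch_end in batches:
            batches[batch_end].append(arrival)
    return batches


# Files every 5 minutes, batches every 2 minutes, observed for 20 minutes.
batches = files_per_batch(file_interval_s=300, batch_interval_s=120, total_s=1200)
for end, files in batches.items():
    print(f"batch ending at {end:>4}s -> {len(files)} file(s)")
```

Under those assumptions no batch ever holds more than one file, and most 
batches hold none, which is the "one file per RDD, plus empty RDDs" picture 
described above.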

Cheers,

David

From: Natu Lauchande [mailto:nlaucha...@gmail.com]
Sent: Tuesday, April 12, 2016 1:48 PM
To: David Newberger
Cc: user@spark.apache.org
Subject: Re: DStream how many RDD's are created by batch

Hi David,
Thanks for your answer.
I have a follow-up question:
I am using textFileStream and listening on an S3 bucket for new files to 
process. Files are created every 5 minutes and my batch interval is 2 minutes.

Does that mean each file will correspond to one RDD?

Thanks,
Natu

On Tue, Apr 12, 2016 at 7:46 PM, David Newberger 
<david.newber...@wandcorp.com> wrote:
Hi,

Time is usually the criterion, if I’m understanding your question correctly. An 
RDD is created for each batch interval. If your interval is 500 ms, then an RDD 
is created every 500 ms. If it’s 2 seconds, then an RDD is created every 2 seconds.

Cheers,

David

From: Natu Lauchande [mailto:nlaucha...@gmail.com]
Sent: Tuesday, April 12, 2016 7:09 AM
To: user@spark.apache.org
Subject: DStream how many RDD's are created by batch

Hi,
What is the criterion for the number of RDDs created for each micro-batch 
iteration?

Thanks,
Natu
