Hi Natu,

I believe you are correct: one RDD would be created for each file.
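Here is a rough way to picture it with your numbers. This is a plain-Python timing simulation, not Spark code. It assumes a simplified model in which the batch ending at time T picks up every file that arrived in [T - interval, T). Real Spark tracks file modification times and does extra bookkeeping. The function `assign_files_to_batches` and the arrival times are hypothetical. Each entry in the result stands for one batch, and so one RDD, even when that batch is empty.

```python
# Simplified model (assumption): the batch ending at time T picks up files whose
# arrival time falls in [T - interval, T). Spark's real file stream uses file
# modification times plus extra bookkeeping; this only shows the timing arithmetic.

def assign_files_to_batches(file_times_s, batch_interval_s, duration_s):
    """Return (batch_end_s, files_in_batch) for every batch in the run.

    Every batch corresponds to one RDD in the DStream, even when it is empty.
    """
    batches = []
    num_batches = duration_s // batch_interval_s
    for k in range(1, num_batches + 1):
        end = k * batch_interval_s
        start = end - batch_interval_s
        files = [t for t in file_times_s if start <= t < end]
        batches.append((end, files))
    return batches


# Hypothetical arrivals: one new file every 5 minutes (300 s), 2-minute (120 s)
# batches, over a 20-minute (1200 s) run.
file_arrivals = [300, 600, 900]
batches = assign_files_to_batches(file_arrivals, batch_interval_s=120, duration_s=1200)

for end, files in batches:
    status = f"files arriving at {files}" if files else "empty"
    print(f"batch ending at {end:>4}s: {status}")

print("total batches (RDDs):", len(batches))
print("non-empty batches:", sum(1 for _, f in batches if f))
```

With these numbers the run produces 10 batches, and so 10 RDDs. Each of the three files lands in exactly one batch (ending at 360 s, 720 s and 960 s), and the other seven batches are empty. The files are 5 minutes apart and the interval is 2 minutes, so two files never fall into the same batch. As I understand Spark's file stream, if several new files did arrive within one interval, that batch would still yield a single RDD covering all of them.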
Cheers,
David

From: Natu Lauchande
Sent: Tuesday, April 12, 2016 1:48 PM
To: David Newberger
Cc: [email protected]
Subject: Re: DStream how many RDD's are created by batch

Hi David,

Thanks for your answer. I have a follow-up question: I am using textFileStream and listening on an S3 bucket for new files to process. Files are created every 5 minutes and my batch interval is 2 minutes. Does that mean that each file will correspond to one RDD?

Thanks,
Natu

On Tue, Apr 12, 2016 at 7:46 PM, David Newberger <[email protected]> wrote:

Hi,

If I'm understanding your question, time is usually the criterion. An RDD is created for each batch interval: if your interval is 500 ms, an RDD is created every 500 ms; if it's 2 seconds, an RDD is created every 2 seconds.

Cheers,
David

From: Natu Lauchande
Sent: Tuesday, April 12, 2016 7:09 AM
To: [email protected]
Subject: DStream how many RDD's are created by batch

Hi,

What are the criteria for the number of RDDs created in each micro-batch iteration?

Thanks,
Natu
