Hi Natu,

I believe you are correct: one RDD would be created for each file.

Cheers,
David

From: Natu Lauchande [mailto:nlaucha...@gmail.com]
Sent: Tuesday, April 12, 2016 1:48 PM
To: David Newberger
Cc: user@spark.apache.org
Subject: Re: DStream how many RDD's are created by batch

Hi David,

Thanks for your answer. I have a follow-up question: I am using textFileStream and listening on an S3 bucket for new files to process. Files are created every 5 minutes and my batch interval is 2 minutes. Does that mean each file will end up in its own RDD?

Thanks,
Natu

On Tue, Apr 12, 2016 at 7:46 PM, David Newberger <david.newber...@wandcorp.com> wrote:

Hi,

Time is usually the criterion, if I’m understanding your question. An RDD is created for each batch interval. If your interval is 500ms then an RDD is created every 500ms. If it’s 2 seconds then an RDD is created every 2 seconds.

Cheers,
David

From: Natu Lauchande [mailto:nlaucha...@gmail.com]
Sent: Tuesday, April 12, 2016 7:09 AM
To: user@spark.apache.org
Subject: DStream how many RDD's are created by batch

Hi,

What are the criteria for the number of RDDs created in each micro-batch iteration?

Thanks,
Natu
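
To make the timing in this thread concrete, here is a rough sketch in plain Python (no Spark involved) of how files arriving every 5 minutes fall into 2-minute batches. It assumes that a DStream yields one RDD per batch interval and that a file stream puts whichever files were newly detected during an interval into that interval's RDD, so an interval with no new files gives an empty RDD. The function name, the arrival times, and the simple window rule are illustrative assumptions only; real detection on S3 depends on file modification times and listing behaviour.

```python
# Hypothetical simulation, not Spark code: map file arrival times to batches.

def assign_files_to_batches(file_arrivals, batch_interval, num_batches):
    """Return a list with one entry per batch, holding the arrival times
    (in seconds) of the files that fall into that batch's window.

    Batch k is assumed to cover [k * batch_interval, (k + 1) * batch_interval).
    """
    batches = [[] for _ in range(num_batches)]
    for arrival in file_arrivals:
        index = arrival // batch_interval
        if index < num_batches:
            batches[index].append(arrival)
    return batches


BATCH_INTERVAL = 120             # 2-minute batch interval, in seconds
file_arrivals = [300, 600, 900]  # one new file every 5 minutes

batches = assign_files_to_batches(file_arrivals, BATCH_INTERVAL, num_batches=8)

for k, files in enumerate(batches):
    start, end = k * BATCH_INTERVAL, (k + 1) * BATCH_INTERVAL
    print(f"batch {k} [{start}s, {end}s): {len(files)} file(s)")
```

Under these assumptions every batch still produces an RDD, but only some of them contain data, and because the batch interval is shorter than the gap between files, no batch ever holds more than one file.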