RE: DStream how many RDD's are created by batch

2016-04-12 Thread David Newberger
Hi Natu, I believe you are correct: one RDD would be created for each file. Cheers, David From: Natu Lauchande [mailto:nlaucha...@gmail.com] Sent: Tuesday, April 12, 2016 1:48 PM To: David Newberger Cc: user@spark.apache.org Subject: Re: DStream how many RDD's are created by batch Hi

Re: DStream how many RDD's are created by batch

2016-04-12 Thread Natu Lauchande
Hi David, Thanks for your answer. I have a follow-up question: I am using textFileStream and listening on an S3 bucket for new files to process. Files are created every 5 minutes and my batch interval is 2 minutes. Does that mean each file will end up in its own RDD? Thanks, Natu On Tue, Apr

RE: DStream how many RDD's are created by batch

2016-04-12 Thread David Newberger
Hi, Time is usually the criterion, if I’m understanding your question correctly. An RDD is created for each batch interval. If your interval is 500 ms, an RDD is created every 500 ms; if it’s 2 seconds, an RDD is created every 2 seconds. Cheers, David From: Natu Lauchande [mailto:nlaucha...@g
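
The "one RDD per batch interval" rule can be checked against the numbers from the follow-up question (a file every 5 minutes, a 2-minute batch interval) with a small back-of-the-envelope simulation. This is plain Python, not Spark code. The function name `files_per_batch`, the 20-minute observation window, and the arrival schedule are hypothetical, and it assumes a file is counted in the batch whose time window contains its arrival time. A real file stream may behave differently in detail, for example in how it detects new files.

```python
# Back-of-the-envelope simulation only: no Spark involved.
# Assumption: each batch interval yields exactly one RDD, and a file is
# counted in the batch whose time window contains its arrival time.

def files_per_batch(batch_interval_s, file_interval_s, total_s):
    """Return a list holding the number of new files seen in each batch."""
    num_batches = total_s // batch_interval_s
    counts = [0] * num_batches
    # Hypothetical schedule: files arrive at file_interval_s, 2 * file_interval_s, ...
    arrival = file_interval_s
    while arrival < total_s:
        index = arrival // batch_interval_s  # batch window containing the arrival
        if index < num_batches:
            counts[index] += 1
        arrival += file_interval_s
    return counts

# Numbers from the thread: 2-minute batches, a new file every 5 minutes,
# observed over a (hypothetical) 20-minute window.
counts = files_per_batch(batch_interval_s=120, file_interval_s=300, total_s=1200)
print("batches (one RDD each):", len(counts))
print("files per batch:", counts)
print("batches with no new file:", counts.count(0))
```

Under these assumptions the 20-minute window produces 10 batches. Three of them contain exactly one file and seven contain none, so each file lands in its own RDD while the batches in between would be empty.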