Hi Natu,
I believe you are correct: one RDD would be created for each file.
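To make the timing concrete, here is a small pure-Python model (not Spark code) of your setup: files arriving every 5 minutes and a 2-minute batch interval. It assumes, as a simplification of how the file stream detects new files, that a file appearing at time t is picked up by the first batch boundary at or after t. The helper names batch_for_file and files_per_batch are hypothetical, made up for illustration.

```python
import math


def batch_for_file(arrival_min, batch_interval_min):
    """Batch time (minutes) that picks up a file arriving at arrival_min.

    Simplified assumption: the first batch boundary at or after arrival wins.
    """
    return math.ceil(arrival_min / batch_interval_min) * batch_interval_min


def files_per_batch(file_every_min, batch_interval_min, horizon_min):
    """Map each batch time up to horizon_min to the file arrival times it picks up."""
    # One entry per batch boundary; each entry stands in for that batch's RDD.
    batches = {}
    t = batch_interval_min
    while t <= horizon_min:
        batches[t] = []
        t += batch_interval_min

    # Drop each file into the batch that would first see it.
    arrival = file_every_min
    while arrival <= horizon_min:
        b = batch_for_file(arrival, batch_interval_min)
        if b in batches:
            batches[b].append(arrival)
        arrival += file_every_min
    return batches


if __name__ == "__main__":
    # Files every 5 minutes, 2-minute batches, first 20 minutes of the stream.
    for batch_time, files in files_per_batch(5, 2, 20).items():
        print(f"batch at {batch_time:>2} min -> files arriving at {files}")
```

Because files arrive less often than batches fire, no batch in this model ever sees more than one file: each non-empty batch corresponds to a single file, and the batches in between see no new data.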
Cheers,
David
From: Natu Lauchande [mailto:nlaucha...@gmail.com]
Sent: Tuesday, April 12, 2016 1:48 PM
To: David Newberger
Cc: user@spark.apache.org
Subject: Re: DStream how many RDD's are created by batch
Hi David,
Thanks for your answer.
I have a follow-up question:
I am using textFileStream and listening on an S3 bucket for new files to
process. Files are created every 5 minutes, and my batch interval is 2
minutes.
Does that mean each file will correspond to one RDD?
Thanks,
Natu
On Tue, Apr
Hi,
If I'm understanding your question correctly, time is usually the criterion. An
RDD is created for each batch interval. If your interval is 500 ms, then an RDD
is created every 500 ms; if it's 2 seconds, an RDD is created every 2 seconds.
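As a rough illustration of "one RDD per batch interval", this pure-Python sketch (not Spark code) lists the times at which a new batch, and therefore a new RDD, would be produced for a given interval. It assumes, as a simplification, that the first batch closes one interval after the stream starts and one more closes every interval after that. The function name batch_times_ms is hypothetical; in a real job the interval is the batch duration you pass when creating the StreamingContext.

```python
from typing import List


def batch_times_ms(interval_ms: int, window_ms: int) -> List[int]:
    """Times (ms after stream start) at which a batch, and hence an RDD, is produced.

    Simplified model: first batch one interval after start, then one per interval.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return list(range(interval_ms, window_ms + 1, interval_ms))


if __name__ == "__main__":
    # Over a 2-second window: a 500 ms interval gives four RDDs, a 2 s interval gives one.
    print(batch_times_ms(500, 2000))   # [500, 1000, 1500, 2000]
    print(batch_times_ms(2000, 2000))  # [2000]
```

The number of RDDs depends only on the interval and how long the stream runs, not on how much data arrives in each batch.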
Cheers,
David
From: Natu Lauchande [mailto:nlaucha...@g