While it is doable in Spark, S3 also supports notifications: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande <nlaucha...@gmail.com> wrote:

> Hi Benjamin,
>
> I have done it. The critical configuration items are the ones below:
>
> ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
>   "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId",
>   AccessKeyId)
> ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey",
>   AWSSecretAccessKey)
>
> val inputS3Stream = ssc.textFileStream("s3://example_bucket/folder")
>
> This code will probe for new S3 files in every batch interval.
>
> Thanks,
> Natu
>
> On Fri, Apr 8, 2016 at 9:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Has anyone monitored an S3 bucket or directory using Spark Streaming and
>> pulled any new files to process? If so, can you provide basic Scala coding
>> help on this?
>>
>> Thanks,
>> Ben
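Putting Natu's configuration items together, a minimal end-to-end sketch might look like the following. The app name, 30-second batch interval, environment-variable credential lookup, and bucket path are all placeholders, not from the thread; only the `fs.s3n.*` keys and `textFileStream` call come from Natu's reply:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3StreamMonitor {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and batch interval.
    val conf = new SparkConf().setAppName("S3StreamMonitor")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Wire up the s3n connector; credentials here are read from the
    // environment rather than hard-coded (an assumption, not from the thread).
    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.impl",
      "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // textFileStream only picks up files that appear in the directory
    // after the stream starts; pre-existing files are not processed.
    // Placeholder bucket/folder path below.
    val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")

    inputS3Stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) println(s"Saw ${rdd.count()} new lines this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note the `s3n://` scheme in the sketch, chosen to match the `fs.s3n.*` keys being set; on newer Hadoop versions the `s3a` connector (`fs.s3a.*` keys, `s3a://` URIs) is generally preferred over `s3n`.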