Someone please correct me if I am wrong, as I am still rather green to Spark, but it appears that through the S3 notification mechanism described below you can publish events to SQS and then use SQS as a streaming source into Spark. The project at https://github.com/imapi/spark-sqs-receiver appears to provide a library for doing this.
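I haven't used that library myself, so rather than guess at its API, here is a minimal sketch (my own, untested) of what such a receiver could look like using the plain Spark receiver API and the AWS Java SDK; the class name and queue URL are placeholders:

import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import scala.collection.JavaConverters._

// Long-polls an SQS queue and hands each message body (the S3 event JSON)
// to Spark as one record.
class SqsReceiver(queueUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    new Thread("sqs-receiver") {
      override def run(): Unit = poll()
    }.start()
  }

  override def onStop(): Unit = {}  // poll() exits once isStopped() is true

  private def poll(): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    while (!isStopped()) {
      val request = new ReceiveMessageRequest(queueUrl).withWaitTimeSeconds(20)
      for (msg <- sqs.receiveMessage(request).getMessages.asScala) {
        store(msg.getBody)                                 // deliver to Spark first,
        sqs.deleteMessage(queueUrl, msg.getReceiptHandle)  // then ack the message
      }
    }
  }
}

You would then create the DStream with ssc.receiverStream(new SqsReceiver(queueUrl)) and parse the S3 event JSON in each record to find the new object keys.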
Hope this helps.

Sent from my iPhone

> On Apr 9, 2016, at 9:55 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> Nezih,
>
> This looks like a good alternative to having the Spark Streaming job check
> for new files on its own. Do you know if there is a way to have the Spark
> Streaming job get notified with the new file information and act upon it?
> That would reduce the overhead and cost of polling S3. Plus, I can use the
> same notifications to kick off Lambda to process new data files and make
> them ready for Spark Streaming to consume. I would just need to configure
> notifications on all incoming folders for Lambda and on all outgoing
> folders for Spark Streaming. This sounds like a better setup than what we
> have now.
>
> Thanks,
> Ben
>
>> On Apr 9, 2016, at 12:25 AM, Nezih Yigitbasi <nyigitb...@netflix.com> wrote:
>>
>> While it is doable in Spark, S3 also supports notifications:
>> http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
>>
>>> On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande <nlaucha...@gmail.com> wrote:
>>> Hi Benjamin,
>>>
>>> I have done it. The critical configuration items are the ones below:
>>>
>>> ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
>>>   "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>> ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId",
>>>   AccessKeyId)
>>> ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey",
>>>   AWSSecretAccessKey)
>>>
>>> val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")
>>>
>>> This code will probe for new S3 files created in your bucket once every
>>> batch interval.
>>>
>>> Thanks,
>>> Natu
>>>
>>>> On Fri, Apr 8, 2016 at 9:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>> Has anyone monitored an S3 bucket or directory using Spark Streaming
>>>> and pulled in any new files to process? If so, can you provide basic
>>>> Scala coding help on this?
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
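For reference, here is what Natu's polling approach could look like as a complete minimal program. This is my own sketch, not code from the thread; the app name, bucket path, batch interval, and the environment variables used for credentials are all placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3FolderMonitor {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("S3FolderMonitor")
    val ssc = new StreamingContext(conf, Seconds(60))  // probe S3 once a minute

    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // textFileStream only picks up files that appear under the prefix
    // after the stream starts; pre-existing files are ignored.
    val lines = ssc.textFileStream("s3n://example_bucket/folder")
    lines.foreachRDD { rdd =>
      if (!rdd.isEmpty()) println(s"new batch: ${rdd.count()} lines")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

One caveat worth noting: this approach lists the S3 prefix on every batch, which is exactly the polling overhead and cost that the notification setup discussed above avoids.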