All,
I have more of a general Scala JSON question.
I have set up a notification on the S3 source bucket that triggers a Lambda
function to unzip the new file placed there. It then saves the unzipped CSV
file into another destination bucket, where a notification is sent to an SQS
queue. The conte
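(For reference, a rough Scala sketch of the kind of unzip-and-copy Lambda described here, written against the AWS Java SDK v1 and aws-lambda-java-events. It assumes the dropped object is gzip-compressed, and "csv-dest-bucket" is a made-up destination name, so treat it as an illustration rather than the actual function.)

import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
import com.amazonaws.services.lambda.runtime.events.S3Event
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.ObjectMetadata
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.GZIPInputStream
import scala.collection.JavaConverters._

class UnzipToCsvHandler extends RequestHandler[S3Event, String] {
  private val s3 = AmazonS3ClientBuilder.defaultClient()

  override def handleRequest(event: S3Event, context: Context): String = {
    for (record <- event.getRecords.asScala) {
      val srcBucket = record.getS3.getBucket.getName
      val srcKey    = record.getS3.getObject.getKey

      // Decompress the dropped object fully into memory (fine for small CSVs).
      val in  = new GZIPInputStream(s3.getObject(srcBucket, srcKey).getObjectContent)
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](8192)
      try Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(out.write(buf, 0, _))
      finally in.close()
      val bytes = out.toByteArray

      // Save the plain CSV to the destination bucket; that bucket's own
      // ObjectCreated notification then feeds SQS/SNS downstream.
      val meta = new ObjectMetadata()
      meta.setContentLength(bytes.length.toLong)
      meta.setContentType("text/csv")
      s3.putObject("csv-dest-bucket", srcKey.stripSuffix(".gz"),
        new ByteArrayInputStream(bytes), meta)
    }
    "ok"
  }
}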
Ah, I spoke too soon.
I thought the SQS part was going to be a Spark package. It looks like it has to be
compiled into a jar for use. Am I right? Can someone help with this? I tried to
compile it using SBT, but I’m stuck with a SonatypeKeys not found error.
If there’s an easier alternative, please
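(In case it helps: a "SonatypeKeys not found" error usually means the build references keys from the sbt-sonatype plugin without the plugin being declared. One guess at a fix, with an illustrative version number, is to add it to project/plugins.sbt, or simply remove the Sonatype publishing settings from build.sbt if you only need the jar:)

// project/plugins.sbt -- the version shown is only an example, check the plugin's README
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "1.1")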
This was easy!
I just created a notification on a source S3 bucket to kick off a Lambda
function that would decompress the dropped file and save it to another S3
bucket. In turn, this S3 bucket has a notification to send an SNS message to
me via email. I can just as easily set up SQS to be the
Why not use AWS Lambda?
Regards,
Gourav
On Fri, Apr 8, 2016 at 8:14 PM, Benjamin Kim wrote:
> Has anyone monitored an S3 bucket or directory using Spark Streaming and
> pulled any new files to process? If so, can you provide basic Scala coding
> help on this?
>
> Thanks,
> Ben
>
>
> --
Natu, Benjamin,
With this mechanism you can configure notifications for *buckets* (if you
only care about some key prefixes, you can take a look at object key name
filtering; see the docs) for various event types, and then these events
can be published to SNS, SQS or Lambdas. I think using SQS as a
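(A minimal Scala sketch of what that wiring can look like with the AWS Java SDK v1: publish ObjectCreated events for an assumed "incoming/" key prefix on a placeholder bucket to a placeholder SQS queue.)

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{BucketNotificationConfiguration, Filter, FilterRule, QueueConfiguration, S3Event, S3KeyFilter}
import java.util.EnumSet

object ConfigureBucketNotification {
  def main(args: Array[String]): Unit = {
    val s3 = AmazonS3ClientBuilder.defaultClient()

    // Object key name filtering: only keys under "incoming/" generate events.
    val keyFilter = new Filter().withS3KeyFilter(
      new S3KeyFilter().withFilterRules(
        new FilterRule().withName("prefix").withValue("incoming/")))

    // Send every ObjectCreated event to the SQS queue (the ARN is a placeholder).
    val queueConfig = new QueueConfiguration(
      "arn:aws:sqs:us-east-1:123456789012:new-csv-files",
      EnumSet.of(S3Event.ObjectCreated)).withFilter(keyFilter)

    val notification = new BucketNotificationConfiguration()
    notification.addConfiguration("csv-dropped", queueConfig)
    s3.setBucketNotificationConfiguration("source-bucket", notification)
  }
}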
Do you know if textFileStream can see if new files are created underneath a
whole bucket?
Only at the level of the folder that you specify. They don't do
subfolders. So your approach would be detecting everything under path
s3://bucket/path/2016040902_data.csv
Also, will Spark Streaming not p
This is awesome! I have someplace to start from.
Thanks,
Ben
> On Apr 9, 2016, at 9:45 AM, programminggee...@gmail.com wrote:
>
> Someone please correct me if I am wrong as I am still rather green to spark,
> however it appears that through the S3 notification mechanism described
> below, you
Someone please correct me if I am wrong as I am still rather green to Spark,
however it appears that through the S3 notification mechanism described below,
you can publish events to SQS and use SQS as a streaming source into Spark. The
project at https://github.com/imapi/spark-sqs-receiver appea
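(I haven't checked that project's API, so as an illustration only, here is a hand-rolled version of the same idea: a custom Spark Streaming Receiver that long-polls SQS for the S3 notification messages. The queue URL below is a placeholder.)

import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import scala.collection.JavaConverters._

class SqsReceiver(queueUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Poll on a background thread so onStart() returns immediately.
    new Thread("sqs-receiver") {
      override def run(): Unit = poll()
    }.start()
  }

  override def onStop(): Unit = ()   // the isStopped() check ends the polling loop

  private def poll(): Unit = {
    // Create the client on the executor, not in the constructor, so the
    // receiver itself stays serializable.
    val sqs = AmazonSQSClientBuilder.defaultClient()
    while (!isStopped()) {
      val request = new ReceiveMessageRequest(queueUrl)
        .withMaxNumberOfMessages(10)
        .withWaitTimeSeconds(20)       // long polling
      val messages = sqs.receiveMessage(request).getMessages.asScala
      for (msg <- messages) {
        store(msg.getBody)             // hand the S3 event JSON to Spark
        sqs.deleteMessage(queueUrl, msg.getReceiptHandle)
      }
    }
  }
}

// Usage inside a StreamingContext:
//   val events = ssc.receiverStream(
//     new SqsReceiver("https://sqs.us-east-1.amazonaws.com/123456789012/new-csv-files"))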
Nezih,
This looks like a good alternative to having the Spark Streaming job check for
new files on its own. Do you know if there is a way to have the Spark Streaming
job be notified of the new file information and act on it? This can reduce
the overhead and cost of polling S3. Plus, I can
Natu,
Do you know if textFileStream can see if new files are created underneath a
whole bucket? For example, if the bucket name is incoming and new files
underneath it are 2016/04/09/00/00/01/data.csv and
2016/04/09/00/00/02/data.csv, will these files be picked up? Also, will Spark
Streaming n
Can you elaborate a bit more on your approach using S3 notifications? I'm just
curious; I'm dealing with a similar issue right now that might benefit from
this.
On 09 Apr 2016 9:25 AM, "Nezih Yigitbasi" wrote:
> While it is doable in Spark, S3 also supports notifications:
> http://docs.aws.amazon.com/Am
While it is doable in Spark, S3 also supports notifications:
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande wrote:
> Hi Benjamin,
>
> I have done it. The critical configuration items are the ones below:
>
> ssc.sparkCo
Hi Benjamin,
I have done it. The critical configuration items are the ones below:
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
  "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId",
  AccessKeyId)
ssc.spar
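(Pulling Natu's fragments together, a minimal end-to-end sketch might look like this. The bucket/prefix and the environment-variable credential lookups are placeholders, and the fs.s3n.awsSecretAccessKey line is my guess at the setting truncated above.)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3CsvStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("S3CsvStream")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // The s3n settings from Natu's snippet; credentials come from the
    // environment here purely for illustration.
    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // textFileStream only sees new files that appear directly under this
    // prefix; it does not recurse into deeper "subfolders".
    val lines = ssc.textFileStream("s3n://my-bucket/incoming/")
    lines.foreachRDD { rdd =>
      // Process each batch of newly arrived CSV lines here.
      rdd.take(10).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}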