Setting AWS endpoints per transform

2020-09-24 Thread tclemons
We're currently working on getting our Beam app working with localstack (and potentially other AWS regions).  We're using SqsIO and S3 as part of our pipeline (with other AWS components likely to come into the mix).  While I could cast the PipelineOptions to AwsOptions and then call AwsOptions.

SqsIO exception when moving to AWS2 SDK

2020-09-30 Thread tclemons
I've been attempting to migrate to the AWS2 SDK (version 2.24.0) on Apache Spark 2.4.7. However, when switching over to the new API and running it I keep getting the following exceptions: 2020-09-30 20:38:32,428 ERROR streaming.StreamingContext: Error starting the context, marking it as stopped

Re: Support streaming side-inputs in the Spark runner

2020-10-02 Thread tclemons
For clarification, is it just streaming side inputs that present an issue for SparkRunner or are there other areas that need work?  We've started work on a Beam-based project that includes both streaming and batch oriented work and a Spark cluster was our choice due to the perception that it cou

Re: SqsIO exception when moving to AWS2 SDK

2020-10-02 Thread tclemons
The app itself is developed in Clojure, but here's the gist of how it's getting configured:

    AwsCredentialsProvider credProvider = EnvironmentVariableCredentialsProvider.create();
    pipeline.apply(
        SqsIO.read()
            .withQueueUrl(url)
            .withSqsClientProvider(credProvide
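The excerpt is cut off mid-call, but a complete version of this wiring might look like the sketch below (assuming the aws2 SqsIO API of that era, where withSqsClientProvider accepts a credentials provider and a region string; the queue URL and region are placeholders):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.aws2.sqs.SqsIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider;

public class SqsReadExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars.
    AwsCredentialsProvider credProvider = EnvironmentVariableCredentialsProvider.create();
    pipeline.apply(
        SqsIO.read()
            .withQueueUrl("https://sqs.us-west-2.amazonaws.com/123456789012/my-queue")
            .withSqsClientProvider(credProvider, "us-west-2"));
    pipeline.run();
  }
}
```

Note that EnvironmentVariableCredentialsProvider is not Serializable, which is relevant to the NPE discussed later in this thread.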

Re: SqsIO exception when moving to AWS2 SDK

2020-10-06 Thread tclemons
To test this, I tried a workaround: an implementation of AwsCredentialsProvider that also implements Serializable.  The resolveCredentials method of this class calls the static create function of DefaultCredentialsProvider and forwards the task to it.  There are no fields in the class
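A hypothetical reconstruction of that workaround class (the class name is an assumption; the AWS SDK v2 types are real). With no fields, default Java serialization has nothing to capture, and credentials are resolved lazily on each call after deserialization:

```java
import java.io.Serializable;

import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;

// Serializable shim: delegates to DefaultCredentialsProvider at resolve time,
// so nothing non-serializable is held as state.
public class SerializableCredentialsProvider
    implements AwsCredentialsProvider, Serializable {
  private static final long serialVersionUID = 1L;

  @Override
  public AwsCredentials resolveCredentials() {
    return DefaultCredentialsProvider.create().resolveCredentials();
  }
}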

Re: SqsIO exception when moving to AWS2 SDK

2020-10-06 Thread tclemons
Yep, same stacktrace.  NPE originating from line 41 of SqsUnboundedSource.java. I'll see about getting a remote debugger attached to the process. Oct 6, 2020, 14:06 by aromanenko@gmail.com: > Hmm, do you have the same stack trace in this case? > > Can you debug it in runtime and make sure th

S3 Multi-Part Uploads

2020-10-21 Thread tclemons
We've been attempting to get writes to an S3 store working with the AWS2 SDK, but it appears that the S3 file system was removed in the process of upgrading from AWS1.  We have not yet found a concrete example of how to work with S3, so we have a few questions: * Is HadoopFileSystem the preferred
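For context, with the original AWS (v1) module, S3 writes go through Beam's filesystem layer rather than a dedicated transform; multipart uploads are handled internally by the S3 filesystem. A sketch, assuming the beam-sdks-java-io-amazon-web-services (v1) dependency is on the classpath and the bucket name is a placeholder:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.aws.options.S3Options;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class S3WriteExample {
  public static void main(String[] args) {
    // S3Options registers the s3:// filesystem scheme for this pipeline.
    S3Options opts = PipelineOptionsFactory.fromArgs(args).as(S3Options.class);
    Pipeline p = Pipeline.create(opts);
    p.apply(Create.of("line one", "line two"))
        .apply(TextIO.write().to("s3://my-bucket/output/part").withSuffix(".txt"));
    p.run();
  }
}
```

Whether an equivalent filesystem existed in the aws2 module at the time of this thread is exactly the question being asked; in later Beam releases the aws2 module gained its own S3 filesystem.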

Re: SqsIO exception when moving to AWS2 SDK

2020-12-16 Thread tclemons
Thanks for your work on this!  We've since gone back to the original AWS SDK, but I'll give AWS2 another try once 2.27.0 is out. Dec 15, 2020, 04:17 by aromanenko@gmail.com: > Too fast “Send” button click =) > > You can find snapshot artifacts here: > https://repository.apache.org/content/re

retaining filename during file processing

2021-05-19 Thread tclemons
I'm writing an app that processes an unbounded stream of filenames and then catalogs them.  What I'd like to do is parse the files using AvroIO, but have each record entry paired with the original filename as a key. In the past I've used the combo FileIO.matchAll() -> FileIO.readMatches() -> Avr
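One common way to keep the filename attached to each record is to stop after FileIO.readMatches() and read the Avro container files directly in a DoFn, since ReadableFile carries the file's metadata. A sketch under that assumption (the `filenames` PCollection and its coder setup are placeholders; a GenericRecord coder such as AvroCoder with a known schema would still need to be set on the output):

```java
import java.io.IOException;
import java.nio.channels.Channels;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Assumes `filenames` is a PCollection<String> of file patterns from the stream.
PCollection<KV<String, GenericRecord>> keyedRecords =
    filenames
        .apply(FileIO.matchAll())
        .apply(FileIO.readMatches())
        .apply(ParDo.of(
            new DoFn<FileIO.ReadableFile, KV<String, GenericRecord>>() {
              @ProcessElement
              public void process(@Element FileIO.ReadableFile file,
                  OutputReceiver<KV<String, GenericRecord>> out) throws IOException {
                // The filename travels with every record as the key.
                String name = file.getMetadata().resourceId().toString();
                try (DataFileStream<GenericRecord> reader =
                    new DataFileStream<>(
                        Channels.newInputStream(file.open()),
                        new GenericDatumReader<>())) {
                  while (reader.hasNext()) {
                    out.output(KV.of(name, reader.next()));
                  }
                }
              }
            }));
```

This trades AvroIO's built-in splitting of large files for direct access to the filename; for many small files in a stream that is usually an acceptable trade.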