Re: S3 Bucket Source

2019-04-15 Thread Addison Higham
Hi Steven, Usually, what you want to do is something like this: instead of a `SourceFunction`, use a `RichParallelSourceFunction`, and as an argument to that function you might have a list of prefixes you want to consume in parallel. The `RichParallelSourceFunction` has a method called `getRunt…
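A minimal sketch (in Scala) of the approach described in that message: each parallel subtask uses its runtime context to find its subtask index and claims a subset of the configured prefixes. The class name and `readPrefix` helper are placeholders for illustration, not Flink API; a checkpointed source would also emit under the checkpoint lock.

```scala
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext

class PrefixSource(prefixes: Seq[String]) extends RichParallelSourceFunction[String] {
  @volatile private var running = true

  override def run(ctx: SourceContext[String]): Unit = {
    val subtask     = getRuntimeContext.getIndexOfThisSubtask
    val parallelism = getRuntimeContext.getNumberOfParallelSubtasks
    // Round-robin assignment: this subtask only consumes prefixes i where
    // i % parallelism == subtask index.
    val mine = prefixes.zipWithIndex.collect { case (p, i) if i % parallelism == subtask => p }
    while (running) {
      mine.foreach { prefix =>
        // readPrefix is a placeholder for whatever actually lists/reads objects
        // under the prefix (e.g. an S3 client call).
        readPrefix(prefix).foreach(ctx.collect)
      }
      Thread.sleep(1000) // poll interval; tune as needed
    }
  }

  override def cancel(): Unit = running = false

  // Placeholder implementation so the sketch compiles.
  private def readPrefix(prefix: String): Seq[String] = Seq.empty
}
```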

Re: DynamoDB as Sink

2019-03-22 Thread Addison Higham
…ult] = ??? // capture any results in the BatchWriteItemResult that didn't get processed def getUnprocessedItems(result: List[BatchWriteItemResult]): ListBuffer[T] = ??? } On Fri, Mar 22, 2019 at 10:59 AM Addison Higham wrote: > Hi there, > > We have implemented a dynamo sink, have h…

Re: DynamoDB as Sink

2019-03-22 Thread Addison Higham
Hi there, We have implemented a dynamo sink and have had no real issues, but obviously it is at-least-once, so we work around that by structuring our transformations so that they produce idempotent writes. What we do is pretty similar to what you suggest: we collect the records in a buffer…
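A minimal sketch of the buffering approach described in this thread, using the AWS SDK v1 BatchWriteItem API. The class name, batch size, and flush points are assumptions; a production sink would also implement Flink's CheckpointedFunction to flush on snapshots, cap retries with backoff, and (as noted above) rely on idempotent writes since batches may be retried.

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable.ListBuffer
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBClientBuilder}
import com.amazonaws.services.dynamodbv2.model.{BatchWriteItemRequest, WriteRequest}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

class DynamoBatchSink(table: String, batchSize: Int = 25) extends RichSinkFunction[WriteRequest] {
  @transient private var client: AmazonDynamoDB = _
  private val buffer = ListBuffer.empty[WriteRequest]

  override def open(parameters: Configuration): Unit =
    client = AmazonDynamoDBClientBuilder.defaultClient()

  override def invoke(value: WriteRequest): Unit = {
    buffer += value
    if (buffer.size >= batchSize) flush()
  }

  private def flush(): Unit = {
    var pending = buffer.toList
    buffer.clear()
    while (pending.nonEmpty) {
      val request = new BatchWriteItemRequest()
        .withRequestItems(Map(table -> pending.asJava).asJava)
      val result = client.batchWriteItem(request)
      // DynamoDB may throttle part of the batch; retry whatever came back unprocessed.
      // A real sink should bound this loop and back off between attempts.
      pending = result.getUnprocessedItems.asScala.values.flatMap(_.asScala).toList
    }
  }

  override def close(): Unit = if (buffer.nonEmpty) flush()
}
```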

Re: Using Flink in an university course

2019-03-04 Thread Addison Higham
Hi there, As far as a runtime for students goes, it seems like Docker is your best bet. However, you could instead have them package a jar using some interface (for example, see https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/packaging.html, which details the `Program` interface) and th…
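As a hedged illustration of the jar-based option: the course could define a tiny interface that students implement and package, and a small harness could load that class and run it against a shared cluster. The `StudentJob` trait and `Harness` below are hypothetical course code, not part of Flink's API.

```scala
import org.apache.flink.streaming.api.scala._

// Students implement this trait and submit a jar containing their class.
trait StudentJob extends Serializable {
  def build(env: StreamExecutionEnvironment): Unit
}

object Harness {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Load the student's implementation reflectively; args(0) is the class name,
    // and the submitted jar is assumed to already be on the classpath.
    val job = Class.forName(args(0))
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[StudentJob]
    job.build(env)
    env.execute("student-job")
  }
}
```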

Re: S3 StreamingFileSink never completes multipart uploads

2019-01-07 Thread Addison Higham
…: > Hi Addison, > > From the information that Nick provides, how can you be sure that the root > cause is the same? > > Cheers, > Kostas > > On Fri, Jan 4, 2019, 22:10 Addison Higham >> Hi Nick, >> >> This is a known issue with 1.7.0, I have an issue open…

Re: S3 StreamingFileSink never completes multipart uploads

2019-01-04 Thread Addison Higham
Hi Nick, This is a known issue with 1.7.0, I have an issue opened up here: https://issues.apache.org/jira/browse/FLINK-11187 On Wed, Jan 2, 2019 at 5:00 PM Martin, Nick wrote: > I’m running on Flink 1.7.0 trying to use the StreamingFileSink with an S3A > URI. What I’m seeing is that whenever

Re: StreamingFileSink causing AmazonS3Exception

2018-12-17 Thread Addison Higham
Oh this is timely! I hope I can save you some pain, Kostas! (cc-ing flink dev to get feedback there for what I believe to be a confirmed bug.) I was just about to open up a Flink issue for this after digging (really) deep and figuring out the issue over the weekend. The problem arises due to the…

Re: Flink Kafka to BucketingSink to S3 - S3Exception

2018-11-05 Thread Addison Higham
Hi there, This is going to be a bit of a long post, but I think there has been a lot of confusion around S3, so I am going to go over everything I know in hopes that it helps. As mentioned by Rafi, the BucketingSink does not work for file systems like S3, as the bucketing sink makes some assumptions…
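As a hedged illustration of the alternative discussed across these threads: on Flink 1.7+, StreamingFileSink (rather than BucketingSink) is the sink intended for S3, subject to the 1.7.0 multipart-upload issue tracked in FLINK-11187 mentioned above. The bucket path below is a placeholder; checkpointing must be enabled for part files to be finalized.

```scala
import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.scala._

object S3SinkExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Part files only complete on checkpoints, so checkpointing must be on.
    env.enableCheckpointing(60000)

    val sink: StreamingFileSink[String] = StreamingFileSink
      .forRowFormat(new Path("s3a://my-bucket/output"), new SimpleStringEncoder[String]("UTF-8"))
      .build()

    env.fromElements("a", "b", "c").addSink(sink)
    env.execute("s3-streaming-file-sink-example")
  }
}
```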

Re: Using a ProcessFunction as a "Source"

2018-11-01 Thread Addison Higham
…via Flink's new feature "broadcast state". See this blog post for > details.[1] > 3) Mix control messages with normal messages in the same message flow. > After the control message is parsed, the corresponding action is taken. Of > course, this kind of program is no…
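A minimal sketch of the broadcast-state pattern referenced in that reply: a control stream (e.g. "where to start reading") is broadcast to all parallel instances and joined with the main stream in a BroadcastProcessFunction. The descriptor name, element types, and example inputs are placeholders.

```scala
import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object ControlStreamExample {
  val controlDescriptor = new MapStateDescriptor[String, String](
    "control", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val data    = env.fromElements("event-1", "event-2")
    val control = env.fromElements("start-position=shard-0:123").broadcast(controlDescriptor)

    data.connect(control)
      .process(new BroadcastProcessFunction[String, String, String] {
        override def processElement(
            value: String,
            ctx: BroadcastProcessFunction[String, String, String]#ReadOnlyContext,
            out: Collector[String]): Unit = {
          // Read the latest control value (if any) and act on it.
          val start = Option(ctx.getBroadcastState(controlDescriptor).get("start"))
          out.collect(s"$value (start=${start.getOrElse("none")})")
        }

        override def processBroadcastElement(
            value: String,
            ctx: BroadcastProcessFunction[String, String, String]#Context,
            out: Collector[String]): Unit = {
          // Update broadcast state so every parallel instance sees the control message.
          ctx.getBroadcastState(controlDescriptor).put("start", value)
        }
      })
      .print()

    env.execute("broadcast-control-example")
  }
}
```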

Using a ProcessFunction as a "Source"

2018-08-24 Thread Addison Higham
Hi, I am writing a parallel source function that ideally needs to receive some messages as control information (specifically, a state message on where to start reading from a Kinesis stream). As far as I can tell, there isn't a way to make a `SourceFunction` receive input (which makes sense), so I am…