Does Flink use EMRFS?

2020-05-22 Thread Peter Groesbeck
Hi, I'm using Flink's StreamingFileSink, running in one AWS account (A), to write to another (B). I'm also leveraging a SecurityConfiguration in the CFN to assume a role in account B so that when I write there, the files are owned by account B, which in turn allows account B to delegate to other AWS account

Re: Reply: Re: Writing _SUCCESS Files (Streaming and Batch)

2020-05-12 Thread Peter Groesbeck
mark (I sent it
> downstream only for metric purposes) but want to tell impala after each
> commit which partitions changed, regardless of the value from the watermark.
>
> Best regards
> Theo
>
> --
> *From: *"Yun Gao"
> *To: *"Rober

Writing _SUCCESS Files (Streaming and Batch)

2020-05-04 Thread Peter Groesbeck
I am replacing an M/R job with a Streaming job using the StreamingFileSink and there is a requirement to generate an empty _SUCCESS file like the old Hadoop job. I have to implement a similar Batch job to read from backup files in case of outages or downtime. The Batch job question was answered he

Managing Job Deployments in Production

2019-10-17 Thread Peter Groesbeck
How are folks here managing deployments in production? We are deploying Flink jobs on EMR manually at the moment but would like to move towards some form of automation before anything goes into production. Adding additional EMR Steps to a long-running cluster to deploy or update jobs seems like th

Re: [External] Flink 1.7.1 on EMR metrics

2019-05-30 Thread Peter Groesbeck
Hi Padarn, for what it's worth, I am using DataDog metrics on EMR with Flink 1.7.1, and this is my flink-conf configuration:

- Classification: flink-conf
  ConfigurationProperties:
    metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
    metrics.reporter.dghttp.ap

Flink vs KStreams

2019-05-20 Thread Peter Groesbeck
Hi folks, I'm hoping to get some deeper clarification on which framework, Flink or KStreams, to use in a given scenario. I've read over the following blog article, which I think sets a great baseline understanding of the differences between those frameworks, but I would like to get some outside opin

Re: DateTimeBucketAssigner using Element Timestamp

2019-05-03 Thread Peter Groesbeck
u would have to check whether the bucket assignment and file naming
> is completely deterministic) or before reprocessing from backup remove the
> dirty files from the crashed job.
>
> Piotrek
>
> On 2 May 2019, at 23:10, Peter Groesbeck
> wrote:
>
> Hi all,
>
> I have

DateTimeBucketAssigner using Element Timestamp

2019-05-02 Thread Peter Groesbeck
Hi all, I have an application that reads from various Kafka topics and writes parquet files to corresponding buckets on S3 using StreamingFileSink with DateTimeBucketAssigner. The upstream application that writes to Kafka also writes records as gzipped json files to date bucketed locations on S3 a
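
For readers following this thread, here is a minimal sketch of what "bucketing by element timestamp" could look like, written in plain Java with no Flink dependency. It assumes the element's timestamp is available as epoch milliseconds and reuses the `yyyy-MM-dd--HH` pattern that `DateTimeBucketAssigner` uses by default. The class name `ElementTimeBuckets` and the method `bucketIdFor` are hypothetical, not part of Flink. In a real job, logic like this would presumably live inside a custom `BucketAssigner#getBucketId`, fed from `context.timestamp()`, which can be null when no timestamps have been assigned.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives a date-based bucket id from an element's
// own timestamp instead of the wall-clock (processing) time.
public class ElementTimeBuckets {

    // Same pattern string that DateTimeBucketAssigner uses by default.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH");

    // Formats the element timestamp (epoch milliseconds) in the given zone.
    static String bucketIdFor(long elementTimestampMillis, ZoneId zone) {
        return FORMAT.withZone(zone)
                .format(Instant.ofEpochMilli(elementTimestampMillis));
    }

    public static void main(String[] args) {
        // An arbitrary example timestamp, bucketed in UTC.
        System.out.println(bucketIdFor(1556838600000L, ZoneOffset.UTC));
    }
}
```

Because the bucket id depends only on the element's timestamp and a fixed zone, replaying the same records from backup yields the same bucket ids, which is the determinism concern raised in the reply to this thread.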