Re: Jar Uploads in High Availability (Flink 1.7.2)

2019-10-15 Thread Ravi Bhushan Ratnakar
Hi, I was also experiencing similar behavior. I adopted the following approach: - used a distributed file system (in my case AWS EFS) and set the attribute "web.upload.dir"; this way both job managers have the same location. - on the load balancer side (AWS ELB), I used "readiness p
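The shared-upload-directory part of the approach above can be sketched as a flink-conf.yaml fragment. This is a hedged sketch, not the poster's exact configuration: the EFS mount point `/mnt/efs/flink` is an assumption, and both JobManagers must have the same path mounted.

```yaml
# flink-conf.yaml (sketch; mount point is an assumption)
# Both JobManagers point web.upload.dir at the same distributed
# file system (e.g. AWS EFS), so a jar uploaded via the REST API
# on either one is visible to both.
web.upload.dir: /mnt/efs/flink/web-upload
```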

Re: Kinesis Connector and Savepoint/Checkpoint restore.

2019-10-15 Thread Ravi Bhushan Ratnakar
Hi, I am also facing the same problem. I am using Flink 1.9.0 and consuming from a Kinesis source with a retention of 1 day. I am observing that when the job is submitted with the "latest" initial stream position, the job starts well and keeps processing data from all the shards for a very long period of
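The "latest" initial stream position mentioned above is set through the consumer properties handed to `FlinkKinesisConsumer`. A minimal sketch, assuming the property keys defined by Flink's `ConsumerConfigConstants` (`aws.region`, `flink.stream.initpos`) — written with plain string literals so it compiles without Flink on the classpath:

```java
import java.util.Properties;

public class KinesisInitPos {
    // Keys matching Flink's ConsumerConfigConstants (assumption: literal values
    // of AWS_REGION and STREAM_INITIAL_POSITION in the 1.9.x connector).
    static final String AWS_REGION = "aws.region";
    static final String STREAM_INITIAL_POSITION = "flink.stream.initpos";

    static Properties consumerConfig(String region, String initialPosition) {
        Properties props = new Properties();
        props.setProperty(AWS_REGION, region);
        // "LATEST", "TRIM_HORIZON", or "AT_TIMESTAMP"
        props.setProperty(STREAM_INITIAL_POSITION, initialPosition);
        return props;
    }

    public static void main(String[] args) {
        // These properties would be passed to new FlinkKinesisConsumer<>(...).
        Properties props = consumerConfig("us-east-1", "LATEST");
        System.out.println(props.getProperty(STREAM_INITIAL_POSITION));
    }
}
```

Note that the initial position only applies on a fresh start; on restore from a savepoint/checkpoint, the saved shard offsets take precedence, which is the behavior being debugged in this thread.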

Re: Kinesis Connector and Savepoint/Checkpoint restore.

2019-10-15 Thread Yun Tang
Hi Steven, If you restored the savepoint/checkpoint successfully, I think this might be due to the shard not having been discovered in the previous run, so it would be consumed from the beginning. Please refer to the implementation here: [1] [1] https://github.com/apache/flink/blob/2c411686d23f456cdc502a
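Yun Tang's explanation can be illustrated with a small stand-alone sketch (plain Java, not Flink's actual restore code): a shard present in the restored state resumes from its saved sequence number, while a shard absent from the state is treated as newly discovered and read from the earliest record. The sentinel string below is illustrative, not Flink's literal constant.

```java
import java.util.HashMap;
import java.util.Map;

public class ShardStartPosition {
    // Illustrative sentinel; Flink uses a SentinelSequenceNumber internally.
    static final String SENTINEL_EARLIEST = "EARLIEST_SEQUENCE_NUM";

    // Mirrors the idea in the consumer's restore path: unknown shards are
    // "newly discovered" and start from the beginning of their retention.
    static String startingSequenceNumber(Map<String, String> restoredState, String shardId) {
        return restoredState.getOrDefault(shardId, SENTINEL_EARLIEST);
    }

    public static void main(String[] args) {
        Map<String, String> restored = new HashMap<>();
        restored.put("shardId-000", "49590338271490256608559692538361571");
        // Known shard resumes from its saved position:
        System.out.println(startingSequenceNumber(restored, "shardId-000"));
        // Shard not in state falls back to earliest:
        System.out.println(startingSequenceNumber(restored, "shardId-001"));
    }
}
```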

Verifying correctness of StreamingFileSink (Kafka -> S3)

2019-10-15 Thread amran dean
I am evaluating StreamingFileSink (Kafka 0.10.11) as a production-ready alternative to a current Kafka -> S3 solution. Is there any way to verify the integrity of data written to S3? I'm confused about how the file names (e.g. part-1-17) map to Kafka partitions, and further unsure how to ensure that no K
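On the file-name question: StreamingFileSink's default part file name is `part-<subtaskIndex>-<counter>`, where the first number is the index of the parallel sink subtask and the second is a rolling counter — neither maps directly to a Kafka partition (the Kafka-partition-to-subtask assignment happens upstream in the source). A small sketch of decoding that default pattern:

```java
public class PartFileName {
    // StreamingFileSink default: "part-<subtaskIndex>-<counter>".
    // subtaskIndex = parallel sink instance, counter = rolling part counter.
    // Neither is a Kafka partition id.
    static int[] parse(String name) {
        String[] pieces = name.split("-");
        if (pieces.length != 3 || !pieces[0].equals("part")) {
            throw new IllegalArgumentException("not a default part file name: " + name);
        }
        return new int[] { Integer.parseInt(pieces[1]), Integer.parseInt(pieces[2]) };
    }

    public static void main(String[] args) {
        int[] p = parse("part-1-17");
        System.out.println("subtask=" + p[0] + " counter=" + p[1]);
    }
}
```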

Jar Uploads in High Availability (Flink 1.7.2)

2019-10-15 Thread Martin, Nick J [US] (IS)
I'm seeing that when I upload a jar through the REST API, only the JobManager that received the upload request is aware of the newly uploaded jar. That worked fine for me in older versions, where all clients were redirected to connect to the leader, but now that each JobManager acce

Kinesis Connector and Savepoint/Checkpoint restore.

2019-10-15 Thread Steven Nelson
Hello, we currently use Flink 1.9.0 with Kinesis to process data. We have extended data retention on the Kinesis stream, which gives us 7 days of data. We have found that when a savepoint/checkpoint is restored, it appears to restart the Kinesis consumer from the start of the stream. Th

Mirror Maker 2.0 cluster and starting from latest offset and other queries

2019-10-15 Thread Vishal Santoshi
2 queries: 1. I am trying to configure MM2 to start replicating from the head (the latest of the topic). Should auto.offset.reset = latest in mm2.properties be enough? Unfortunately MM2 will start from the EARLIEST. 2. I do not have "Authorizer is configured on the broker" and see this exce
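One commonly suggested form for the first question is to prefix the override so it reaches the consumer that MM2 creates, rather than setting a bare `auto.offset.reset`. This is a hedged sketch: the cluster aliases `src` and `dst` are assumptions, and some MM2 versions have had known issues honoring the override regardless.

```properties
# mm2.properties (sketch; aliases "src" and "dst" are assumptions)
clusters = src, dst
src->dst.enabled = true
# Prefix the property so it is applied to the consumer MM2 creates
# for the source cluster; a bare auto.offset.reset may not be picked up.
src.consumer.auto.offset.reset = latest
```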

JDBC Table Sink doesn't seem to sink to database.

2019-10-15 Thread John Smith
Hi, using 1.8.0 I have the following job: https://pastebin.com/ibZUE8Qx The job does the following steps: 1- Consume from Kafka and return a JsonObject 2- Map the JsonObject to MyPojo 3- Convert the stream to a table 4- Insert the table into the JDBC sink table 5- Print the table. The job seems to wo
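A frequent cause of "nothing appears in the database" with the 1.8 JDBC append sink is its batching: rows are buffered and only flushed to the database once the configured batch size is reached (a large default, commonly cited as 5000), so a low-volume job may never flush. The sketch below is plain Java illustrating that buffering behavior, not Flink's actual sink code:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedSink {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();

    BatchedSink(int batchSize) { this.batchSize = batchSize; }

    // Rows only reach the "database" when the buffer fills up -- mirroring
    // why a JDBC append sink with a large batch size can look like it
    // "doesn't sink" on a low-volume stream.
    void invoke(String row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flushed.addAll(buffer);  // stands in for executeBatch() + commit
            buffer.clear();
        }
    }

    int flushedCount() { return flushed.size(); }

    public static void main(String[] args) {
        BatchedSink sink = new BatchedSink(5000);
        for (int i = 0; i < 100; i++) sink.invoke("row-" + i);
        System.out.println(sink.flushedCount()); // 0 -- nothing written yet
    }
}
```

If this is the cause, lowering the sink's batch size (the builder exposes a batch-size setter) is a quick way to verify.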

Re: Is it possible to get Flink job name in an operator?

2019-10-15 Thread Aleksandar Mastilovic
I got mine in the AbstractStreamOperator.open() method through this.getContainingTask().getEnvironment().getJobID(); > On Oct 14, 2019, at 11:53 PM, 马阳阳 wrote: > > As the title says. Is it possible now? Or can we do something to achieve this? > I tried to put the job name into the ExecutionConfig.G

Re: add() method of AggregateFunction not called even though new watermark is emitted

2019-10-15 Thread Vijay Balakrishnan
Hi Theo, You were right. For some reason (I still haven't figured out why), the FilterFunction was causing issues. I commented it out and it started getting into the add() method of the aggregate function. /*kinesisStream = kinesisStream.filter((FilterFunction>) inputMap -> { Object groupByValu
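The commented-out filter above pulls a group-by value out of a map; if that key is absent, an unguarded lambda can throw or silently drop every record. A null-safe sketch of such a predicate — the key name `groupBy` is a hypothetical stand-in for the field used in the thread, and `Predicate` stands in for Flink's `FilterFunction`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class NullSafeFilter {
    // Equivalent in spirit to a FilterFunction<Map<String, Object>>;
    // "groupBy" is a hypothetical key name.
    static final Predicate<Map<String, Object>> KEEP =
        inputMap -> {
            Object groupByValue = inputMap.get("groupBy");
            // Guard against missing keys: calling toString()/equals() on a
            // null value here either throws or filters out every record.
            return groupByValue != null && !groupByValue.toString().isEmpty();
        };

    public static void main(String[] args) {
        Map<String, Object> withKey = new HashMap<>();
        withKey.put("groupBy", "serviceA");
        System.out.println(KEEP.test(withKey));         // true
        System.out.println(KEEP.test(new HashMap<>())); // false
    }
}
```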

Re: add() method of AggregateFunction not called even though new watermark is emitted

2019-10-15 Thread Vijay Balakrishnan
Hi Theo, It gets to the FilterFunction during the creation of the ExecutionGraph initially, but not at runtime when records are streaming in. So it is not getting that far; it seems to be stuck in the final SingleOutputStreamOperator> filteredKinesisStream = kinesisStream.filter code. Doesn't

Re: Discard message on deserialization errors.

2019-10-15 Thread John Smith
Ah ok, thanks! On Sat, 12 Oct 2019 at 11:13, Zhu Zhu wrote: > I mean the Kafka source provided in Flink can correctly ignore null > deserialized values. > > isEndOfStream allows you to control when to end the input stream. > If it is used for running infinite stream jobs, you can simply return >
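Zhu Zhu's points translate into a deserializer that returns null for bad records (the Flink Kafka source then skips them) and an `isEndOfStream` that always returns false for an infinite job. A self-contained sketch of that contract — in a real job this logic would live in an implementation of Flink's `DeserializationSchema<T>`, and the "starts with `{`" check is a stand-in for a real JSON parse:

```java
import java.nio.charset.StandardCharsets;

public class DiscardingDeserializer {
    // Returning null tells the Kafka source to drop the record and move on.
    static String deserialize(byte[] message) {
        try {
            String s = new String(message, StandardCharsets.UTF_8);
            if (!s.startsWith("{")) {  // stand-in for a real parse failure
                return null;           // discard instead of failing the job
            }
            return s;
        } catch (Exception e) {
            return null;
        }
    }

    // For an infinite stream job this should always be false.
    static boolean isEndOfStream(String nextElement) {
        return false;
    }

    public static void main(String[] args) {
        System.out.println(deserialize("{\"ok\":1}".getBytes(StandardCharsets.UTF_8)));
        System.out.println(deserialize("garbage".getBytes(StandardCharsets.UTF_8)));
    }
}
```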

RE: FLINK-13497 / "Could not create file for checking if truncate works" / HDFS

2019-10-15 Thread Adrian Vasiliu
Hi, FYI we've switched to a different Hadoop server, and the issue vanished... It does look as if the cause was on the Hadoop side. Thanks again, Congxian. Adrian - Original message - From: "Adrian Vasiliu" To: qcx978132...@gmail.com Cc: user@flink.apache.org Subject: [EXTERNAL] RE: FLINK-13497 / "C

Re: Is it possible to get Flink job name in an operator?

2019-10-15 Thread Zhu Zhu
I think ExecutionConfig.GlobalJobParameters is the way to do this if you want to retrieve it at runtime. Or you can just pass the name to each operator you implement so it is serialized together with the udf. Thanks, Zhu Zhu 马阳阳 wrote on Tue, Oct 15, 2019 at 3:11 PM: > As the title says. Is it possible now? Or
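Zhu Zhu's second suggestion — pass the name into the operator so it ships with the serialized UDF — can be sketched with plain Java. In a real job this class would implement Flink's `MapFunction`; the class and job name below are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NamedMapper implements Serializable {
    private final String jobName;  // shipped to the cluster with the udf

    NamedMapper(String jobName) { this.jobName = jobName; }

    String map(String value) {
        return jobName + ": " + value;
    }

    public static void main(String[] args) throws Exception {
        NamedMapper mapper = new NamedMapper("my-flink-job");
        // Round-trip through Java serialization, the same way Flink
        // distributes user functions to the task managers.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(mapper);
        oos.close();
        NamedMapper restored = (NamedMapper) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(restored.map("event"));
    }
}
```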

Re: add() method of AggregateFunction not called even though new watermark is emitted

2019-10-15 Thread Theo Diefenthal
Hi Vijay, Maybe a stupid question, but according to your comments, the code works fine up until a "flatMap" operation. It seems that this flatMap is directly followed by a filter function in the method createAggregatedMonitoringGroupingWindowStream1. Is it maybe filtering out all events? Or i

Fwd: Is it possible to get Flink job name in an operator?

2019-10-15 Thread 马阳阳
As the title says. Is it possible now? Or can we do something to achieve this? I tried to put the job name into the ExecutionConfig.GlobalJobParameters. But it is not possible to get the job name before Environment.execute() is called. Best regards, mayangyang