
2020-04-02 Thread Giriraj Chauhan

Re: state schema evolution for case classes

2020-04-02 Thread Tzu-Li (Gordon) Tai
Hi, I see the problem that you are bumping into as the following: - In your previous job, you seem to be falling back to Kryo for the state serialization. - In your new job, you are trying to change that to use a custom serializer. You can confirm this by looking at the stack trace of the "new st

Re: Questions regarding Key Managed state

2020-04-02 Thread Timo Walther
Hi Kristoff, case 1: first of all Flink groups keys internally into so-called "key groups" for reducing the management overhead. The maximum parallelism decides about the number of key groups. When performing a rescale, the key groups are basically distributed using some consistent hashing al
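
To make the key-group idea above concrete, here is a minimal sketch of how keys could map to key groups and key groups to parallel subtasks. It is an illustration under assumptions, not Flink code: the function names are hypothetical, CRC32 stands in for whatever hash Flink applies to the key, and the range-based assignment is my understanding of the usual scheme.

```python
import zlib


def key_group_for(key: str, max_parallelism: int) -> int:
    """Map a key to a key group; the number of groups equals the max parallelism.

    Assumption: CRC32 is only a deterministic stand-in for the real hash.
    """
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index_for(key_group: int, max_parallelism: int, parallelism: int) -> int:
    """Assign a key group to a subtask so that each subtask owns a contiguous range."""
    return key_group * parallelism // max_parallelism


def key_groups_per_subtask(max_parallelism: int, parallelism: int) -> dict:
    """List the key groups owned by every subtask for a given parallelism."""
    assignment = {index: [] for index in range(parallelism)}
    for key_group in range(max_parallelism):
        owner = operator_index_for(key_group, max_parallelism, parallelism)
        assignment[owner].append(key_group)
    return assignment


before = key_groups_per_subtask(max_parallelism=128, parallelism=3)
after = key_groups_per_subtask(max_parallelism=128, parallelism=4)
```

In this sketch a key always stays in the same key group; a rescale from 3 to 4 subtasks only changes which subtask owns each group, so state can be moved group by group rather than key by key.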

Re: Perform processing only when watermark updates, buffer data otherwise

2020-04-02 Thread Timo Walther
Hi Manas, first of all, after assigning watermarks at the source level, usually Flink operators make sure to handle the watermarks. In case of a `union()`, the subsequent operator will increment its internal event-time clock and emit a new watermark only if all input streams (and their paral
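
The behaviour described above, where an operator with several inputs advances its event-time clock only once all inputs have advanced, can be sketched as taking the minimum over the per-input watermarks. This is a simplified model with a hypothetical class name, not Flink's implementation.

```python
class UnionWatermarkTracker:
    """Track the latest watermark per input and expose the combined watermark."""

    def __init__(self, num_inputs: int):
        # No input has reported a watermark yet.
        self.input_watermarks = [float("-inf")] * num_inputs
        self.current = float("-inf")

    def on_watermark(self, input_index: int, timestamp: float):
        """Record a watermark from one input.

        Returns the new combined watermark if it advanced, otherwise None.
        """
        # Watermarks never move backwards on a single input.
        self.input_watermarks[input_index] = max(
            self.input_watermarks[input_index], timestamp
        )
        candidate = min(self.input_watermarks)
        if candidate > self.current:
            self.current = candidate
            return candidate
        return None


tracker = UnionWatermarkTracker(num_inputs=2)
first = tracker.on_watermark(0, 10)   # input 1 has not reported yet
second = tracker.on_watermark(1, 5)   # both inputs reported; slowest one wins
third = tracker.on_watermark(1, 20)   # input 0 is now the slowest
```

Under this model the combined watermark is held back by the slowest input, which is why no new watermark is emitted until every input has reported one.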

Re: Making job fail on Checkpoint Expired?

2020-04-02 Thread Timo Walther
Hi Robin, this is a very good observation and maybe even unintended behavior. Maybe Arvid in CC is more familiar with the checkpointing? Regards, Timo On 02.04.20 15:37, Robin Cassan wrote: Hi all, I am wondering if there is a way to make a flink job fail (not cancel it) when one or sever

Re: state schema evolution for case classes

2020-04-02 Thread Apoorv Upadhyay
Hi Gordon, thanks for your response. I have done a POC on state migration using Avro, and it seems to work well. I am using a custom Avro serializer (with an Avro schema and (TypeSerializer, TypeSerializerSnapshot)) and, based on that, have written my own custom serializer for the Scala case class that I

Making job fail on Checkpoint Expired?

2020-04-02 Thread Robin Cassan
Hi all, I am wondering if there is a way to make a Flink job fail (not cancel it) when one or several checkpoints have failed due to being expired (taking longer than the timeout)? I am using Flink 1.9.2 and have set `setTolerableCheckpointFailureNumber(1)`, which doesn't do the trick. Looking

Re: Perform processing only when watermark updates, buffer data otherwise

2020-04-02 Thread Manas Kale
Also - What happens to watermarks after a union operation? Do I have to assignTimestampsAndWatermarks() again? I guess I will have to since multiple streams are being combined and Flink needs to know how to resolve individual watermarks. - What is the difference between union() and

Re: Issue with single job yarn flink cluster HA

2020-04-02 Thread Andrey Zagrebin
Hi Dinesh, Thanks for sharing the logs. There were a couple of HA fixes since 1.7, e.g. [1] and [2]. I would suggest trying Flink 1.10. If the problem persists, could you also find the logs of the failed Job Manager before the failover? Best, Andrey [1] https://jira.apache.org/jira/browse/FLINK-14

Re: Savepoint Location from Flink REST API

2020-04-02 Thread Ufuk Celebi
Sorry for the copy & paste error in my earlier message. 🙄 I agree with Robert. On 2. Apr 2020, at 11:06, Robert Metzger wrote: Good catch! Yes, you can add this to FLINK-16696. On Wed, Apr 1, 2020 at 10:59 PM Aaron Langford wrote: > All, it looks like the actual return structure from the API

Flink long Running Job on AWS EMR/Amazon Kinesis Data Analytics

2020-04-02 Thread KristoffSC
Hi, I'm interested in using a managed Flink service. Does anyone have experience with hosting long-running (generally 24/7) Flink jobs on AWS EMR? I'm interested in whether it is stable enough to host a long-running, state-size-intensive job. With EMR I have all the config, HA ZooKeeper part handled by

Re: Savepoint Location from Flink REST API

2020-04-02 Thread Robert Metzger
Good catch! Yes, you can add this to FLINK-16696. On Wed, Apr 1, 2020 at 10:59 PM Aaron Langford wrote: > All, it looks like the actual return structure from the API is: > > 1. Success > >> { >> "status": { >> "id": "completed" >> }, >> *"operation"*: { >> "location": "string" >>
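
As a small illustration of the success structure quoted above, the following sketch pulls the savepoint location out of a response body. Only the `status.id` and `operation.location` fields shown in the quoted message are used; the function name and the sample path are hypothetical, and the failure case is left out because it is not visible in the preview.

```python
import json


def savepoint_location(body: str):
    """Return the savepoint location from a completed response, else None."""
    payload = json.loads(body)
    # Only a completed operation carries a usable location.
    if payload.get("status", {}).get("id") != "completed":
        return None
    return payload.get("operation", {}).get("location")


# Hypothetical sample body shaped like the quoted success case.
sample = '{"status": {"id": "completed"}, "operation": {"location": "file:///tmp/savepoint-example"}}'
location = savepoint_location(sample)
```

Reading the fields with `.get` keeps the helper from raising when a response is still in progress or lacks the `operation` object.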

Questions regarding Key Managed state

2020-04-02 Thread KristoffSC
Hi, I have a few questions regarding Flink's state. Let's say we have: Case 1. stream.keyBy(...).process(myProcessFunction).parallelism(3). MyProcessFunction uses a managed state (MapState, ListState, etc.). I'm using state checkpoints. Flink will redistribute events across 3 instances of myProcessF

Re: Correct way to e2e test a Flink application?

2020-04-02 Thread Robert Metzger
Hey Laurent, Flink developed an internal framework for executing end to end tests from Java. Here's an example of such a test: https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-metrics-availability-test/src/test/java/org/pache/flink/metrics/tests/MetricsAvailabilityITCase.jav

Re: Reinterpreting a pre-partitioned data stream as keyed stream

2020-04-02 Thread KristoffSC
Hi, sorry for the long wait. Answering your questions: 1 - yes 2 - thx 3 - right, understood 4 - well, in general I want to understand how this works, to be able to modify my job in the future, for example by extracting CPU-heavy operators into separate tasks. Actually in my job some of my operators are ch

Re: Testing RichAsyncFunction with TestHarness

2020-04-02 Thread KristoffSC
Thanks, I would suggest adding my "tutorial" about using the TestHarness for async operators to the documentation. Or maybe build something based on this use case that could be helpful for others in the future :) Thanks, Krzysztof -- Sent from: http://apache-flink-user-mailing-list-archive.2336050