Re: Unable to start Flink HA cluster with Zookeeper

2018-08-22 Thread Dawid Wysakowicz
Hi, It will use the HA settings as long as you specify high-availability: zookeeper. The jobmanager.rpc.address is used by the jobmanager as a binding address. You can verify it by starting two jobmanagers and then killing the leader. Best, Dawid On Tue, 21 Aug 2018 at 17:46, mozer wrote: > Yeah,
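
For reference, a minimal flink-conf.yaml sketch of the ZooKeeper HA setup discussed in this thread; the quorum hosts, storage directory and cluster id are placeholders, not values from the thread:

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    high-availability.storageDir: hdfs:///flink/ha/
    high-availability.cluster-id: /my-flink-cluster
    # binding address of this particular jobmanager; when starting standby
    # jobmanagers by hand, each machine gets its own value
    jobmanager.rpc.address: jobmanager-1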

[ANNOUNCE] Weekly community update #34

2018-08-22 Thread Till Rohrmann
Dear community, this is the weekly community update thread #34. Please post any news and updates you want to share with the community to this thread. # Flink 1.5.3 has been released The community released Flink 1.5.3 [1]. It contains more than 20 fixes and improvements. The community recommends

Re: Some questions about the StreamingFileSink

2018-08-22 Thread Kostas Kloudas
Hi Benoit, Thanks for using the StreamingFileSink. My answers/explanations are inlined. In most of your observations, you are correct. > On Aug 21, 2018, at 11:45 PM, Benoit MERIAUX wrote: > > Hi, > > I have some questions about the new StreamingFileSink in 1.6. > > My usecase is pretty simpl
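
For readers following along, a minimal row-format StreamingFileSink sketch against the 1.6 Java API; the source and output path are placeholders, and checkpointing is enabled because in-progress part files are only finalized when a checkpoint completes:

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    public class StreamingFileSinkSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000); // part files move to "finished" on checkpoint completion

            env.socketTextStream("localhost", 9999) // placeholder source
               .addSink(StreamingFileSink
                   .forRowFormat(new Path("hdfs:///tmp/output"), new SimpleStringEncoder<String>("UTF-8"))
                   .build());

            env.execute("StreamingFileSink sketch");
        }
    }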

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Juho Autio
What I meant to ask was, does it do any harm to keep calling cancel-with-savepoint until the job exits? If the job is already cancelling with savepoint, I would assume that another cancel-with-savepoint call is just ignored. On Tue, Aug 21, 2018 at 1:18 PM Till Rohrmann wrote: > Just a small add

Re: Some questions about the StreamingFileSink

2018-08-22 Thread Gyula Fóra
Hi Kostas, Sorry for jumping in on this discussion :) What you suggest for finite sources and waiting for checkpoints is pretty ugly in many cases. Especially if you would otherwise read from a finite source (a file for instance) and want to end the job asap. Would it make sense to not discard a

State TTL in Flink 1.6.0

2018-08-22 Thread Juho Autio
First, I couldn't find anything about State TTL in the Flink docs, is there anything like that? I can manage based on Javadocs & source code, but just wondering. Then to the main question, why doesn't the TTL support event time, and is there any sensible use case for the TTL if the streaming character

Re: State TTL in Flink 1.6.0

2018-08-22 Thread Chesnay Schepler
Just a quick note for the docs: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#state-time-to-live-ttl On 22.08.2018 10:53, Juho Autio wrote: First, I couldn't find anything about State TTL in Flink docs, is there anything like that? I can manage based on
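
Alongside the docs link, a minimal sketch of the 1.6 state TTL API; the state name and the 7-day TTL are arbitrary values:

    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.time.Time;

    // inside a rich function's open() method, for example
    StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.days(7)) // TTL is measured in processing time in 1.6
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .build();

    ValueStateDescriptor<String> descriptor = new ValueStateDescriptor<>("my-state", String.class);
    descriptor.enableTimeToLive(ttlConfig);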

Batch expired in FlinkKafkaProducer09

2018-08-22 Thread 祁明良
Hi All, When using FlinkKafkaProducer09 (Flink version 1.4.2), I’m facing a Kafka batch expired error when checkpoint starts. The error log is attached below. Here is what I have investigated: 1. The error only and always occurs when checkpoint starts. 2. The error seems not related to flushOnC
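
Not a confirmed fix for this thread, but for orientation: in the 0.9 producer, queued batches expire once they sit longer than request.timeout.ms, so one common knob is to raise that timeout so the checkpoint-triggered flush has more time. A hedged sketch, with placeholder broker, topic and timeout value:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "broker1:9092");
    // a larger timeout gives queued batches more time before they are expired (value is an assumption)
    props.setProperty("request.timeout.ms", "120000");

    FlinkKafkaProducer09<String> producer =
        new FlinkKafkaProducer09<>("my-topic", new SimpleStringSchema(), props);
    producer.setFlushOnCheckpoint(true);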

Re: State TTL in Flink 1.6.0

2018-08-22 Thread Aljoscha Krettek
Hi Juho, The main motivation for the initial implementation of TTL was compliance with the new GDPR rules, i.e. data must stop being accessible and must be dropped according to time in the real world, i.e. processing time. The behaviour you describe, with data being dropped if you keep a savepoint for to

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Till Rohrmann
Calling cancel-with-savepoint multiple times will trigger multiple savepoints. The first issued savepoint will complete first and then cancel the job. Thus, the later savepoints might complete or not depending on the correct timing. Since savepoints can flush results to external systems, I would rec

kafka consumer can not auto commit

2018-08-22 Thread 远远
I find that the Kafka consumer cannot auto-commit, which I noticed when testing the Kudu async client with Flink async I/O today. - I do not enable checkpointing, and I use processing time. - The consumer strategy that I set in the connector is: setStartFromEarliest() The consumer config printed in the console is as follows: auto.commit.interv
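
For context, a sketch of the setup being described, assuming the 0.10 connector (the connector version is not stated in the thread). With checkpointing disabled, offset committing falls back to the Kafka client's enable.auto.commit / auto.commit.interval.ms properties, while setStartFromEarliest() only controls the start position:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "broker1:9092"); // placeholder
    props.setProperty("group.id", "my-group");              // placeholder
    props.setProperty("enable.auto.commit", "true");        // only used when checkpointing is off
    props.setProperty("auto.commit.interval.ms", "5000");

    FlinkKafkaConsumer010<String> consumer =
        new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);
    consumer.setStartFromEarliest(); // start position only; it does not affect committing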

Re: Some questions about the StreamingFileSink

2018-08-22 Thread Benoit MERIAUX
Thanks for the detailed answer. The actual behavior is correct and is due to legacy code which does not distinguish between success and failure when closing the sink. So the workaround is to use a short bucket interval to commit the last received data and wait for the next checkpoint (how do I do i

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Juho Autio
I see, thanks. Looks like it's better for us to switch to triggering savepoint & cancel separately. On Wed, Aug 22, 2018 at 1:26 PM Till Rohrmann wrote: > Calling cancel-with-savepoint multiple times will trigger multiple > savepoints. The first issued savepoint will complete first and then canc
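
For readers wondering what "triggering savepoint & cancel separately" looks like from the CLI, a sketch with placeholder job id and target directory:

    # one-shot cancel-with-savepoint (what was being retried above)
    bin/flink cancel -s hdfs:///flink/savepoints <jobId>

    # two-step alternative: trigger the savepoint first, then cancel
    bin/flink savepoint <jobId> hdfs:///flink/savepoints
    bin/flink cancel <jobId>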

Re: Semantic when table joins table from window

2018-08-22 Thread Hequn Cheng
Hi Henry, You can increase the retention time to make sure all the data you want won't expire. As for incremental, I think we can sink results into a KV storage, say HBase. The HBase table then contains the complete and latest data set you want, so that you don't need to flush again. Would it satisfy your
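
As a side note, "increase the retention time" presumably refers to the Table API's idle state retention; a minimal Java sketch with arbitrary times:

    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.table.api.StreamQueryConfig;

    // tableEnv is an existing StreamTableEnvironment
    StreamQueryConfig qConfig = tableEnv.queryConfig();
    // keep idle state for at least 12 hours, clean it up after at most 24 hours (values are arbitrary)
    qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));
    // pass qConfig when emitting the table, e.g. to toRetractStream(...)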

Re: Job Manager killed by Kubernetes during recovery

2018-08-22 Thread Bruno Aranda
Actually, I have found the issue. It was a simple thing, really, once you know it of course. It was caused by the livenessProbe kicking in too early. For a Flink cluster with several jobs, the default 30 seconds I was using (after using the Flink helm chart in the examples) was not enough to let t
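
For anyone hitting the same symptom, the fix described amounts to giving the probe more slack; a hypothetical jobmanager livenessProbe snippet (the port, delay and period are assumptions, not the thread's actual values):

    livenessProbe:
      tcpSocket:
        port: 6123              # default jobmanager RPC port
      initialDelaySeconds: 120  # raised from ~30s so job recovery can finish first
      periodSeconds: 60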

Re: Job Manager killed by Kubernetes during recovery

2018-08-22 Thread Till Rohrmann
Great to hear that you've resolved the problem and thanks for sharing the solution. This will help others who might run into a similar problem. Cheers, Till On Wed, Aug 22, 2018, 16:14 Bruno Aranda wrote: > Actually, I have found the issue. It was a simple thing, really, once you > know it of c

Re: Unable to start Flink HA cluster with Zookeeper

2018-08-22 Thread mozer
Thanks for the info. I have managed to launch an HA cluster by adding the rpc.address for all job managers. But it did not work with start-cluster.sh; I had to add them one by one. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Kryo Serialization Issue

2018-08-22 Thread Darshan Singh
Hi, I am using a map function on a data stream which has one column, i.e. a JSON string. The map function simply uses the Jackson mapper to convert the String to an ObjectNode and also assigns a key based on one of the values in the ObjectNode. The code seems to work fine for 2-3 minutes as expected and then suddenl
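
To make the setup concrete, roughly what the described pipeline looks like as a sketch; the "id" key field is hypothetical and this is not the poster's actual code:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // jsonStream is an existing DataStream<String> of JSON records
    DataStream<ObjectNode> parsed = jsonStream.map(new MapFunction<String, ObjectNode>() {
        private transient ObjectMapper mapper;

        @Override
        public ObjectNode map(String value) throws Exception {
            if (mapper == null) {
                mapper = new ObjectMapper(); // created lazily so the function stays serializable
            }
            return (ObjectNode) mapper.readTree(value);
        }
    });

    // key by one of the JSON fields
    parsed.keyBy(node -> node.get("id").asText());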

Re: Implement Joins with Lookup Data

2018-08-22 Thread Harshvardhan Agrawal
Hi Hequn, We considered that but unfortunately we have a lot of reference data and we would need an enormous amount of memory to hold it. As a proof of concept I had added a Guava cache and that did improve performance, but then it can't hold all of our reference data. We have a lot of use cases
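
For illustration, the kind of Guava-cache proof of concept being referred to might look like this sketch; keys and reference values are modelled as plain Strings here, and the external lookup is a placeholder:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class CachedLookupMap extends RichMapFunction<String, String> {

        private transient LoadingCache<String, String> cache;

        @Override
        public void open(Configuration parameters) {
            cache = CacheBuilder.newBuilder()
                .maximumSize(100_000)                  // bounded: cannot hold the full reference set
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return lookupExternally(key);  // external store is hit on a cache miss
                    }
                });
        }

        @Override
        public String map(String key) throws Exception {
            return cache.get(key);
        }

        private String lookupExternally(String key) {
            // placeholder for the real reference-data lookup (DB, REST service, ...)
            return "reference-data-for-" + key;
        }
    }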

Re: kafka consumer can not auto commit

2018-08-22 Thread vino yang
Hi zhao, Can you explain which version of the Kafka connector you are using? Thanks, vino. 远远 wrote on Wed, 22 Aug 2018 at 18:37: > I find kafka consumer can not auto commit, when I test kudu async client > with flink async io today. > - i do not enable checkpoint, and with procress time. > - the consumer stra

Count sliding window does not work as expected

2018-08-22 Thread Soheil Pourbafrani
Hi, I need a sliding windowing strategy that fills the window with a count of 400 and, for every 100 incoming elements, processes the last 400 elements. For example, suppose we have a data stream of count 16. For a count window of 400 and a slide of 100, I expect it to output 1597 stream: 16 - 400 =
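
For reference, the count-based sliding window described here maps to countWindow(size, slide) on a keyed stream; a minimal sketch where events is an assumed DataStream<Long> and everything is keyed to a single key:

    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<Long> sums = events
        .keyBy(value -> 0)
        .countWindow(400, 100)     // evaluate the last 400 elements, triggered every 100 elements
        .reduce((a, b) -> a + b);
    // Note: the trigger fires every 100 elements, so early windows hold fewer than 400 elements.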