Hi all,
Recently, my Flink job ran into a problem that caused the job to fail and
restart.
The log looks like this screen snapshot, or this:
```
2018-05-11 13:21:04,582 WARN  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 610
```
Till,
thanks for the clarification. yes, that situation is not desirable either.
In our case, restarting the jobmanager could also recover the job from akka
association lock-out. it was actually the issue (high GC pause) on the
jobmanager side that caused the akka failure.
do we have sth like "jobmanager.e
Hi Till,
Thanks for getting back. I am sure that will fix the issue but I feel like that
would potentially mask an issue. I have been going back and forth with Fabian
on a use case where for some of our highly transient datasets, it might make
sense to just use memory based state (except of cou
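For reference, a minimal sketch of what a purely memory-based state backend looks like with the Java DataStream API; the 5 MB state size below is only illustrative:
```java
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep working state on the TaskManager heap and checkpoint it to the
        // JobManager's memory; only suitable for small, transient state.
        env.setStateBackend(new MemoryStateBackend(5 * 1024 * 1024)); // illustrative 5 MB cap
        // ... define sources, transformations and sinks here ...
        // env.execute("memory-state job");
    }
}
```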
Hi Pedro,
currently, Flink does not allow you to explicitly control the scheduling
strategy at such a fine-grained level. The idea behind this is to achieve
location transparency and to make the scheduling easier.
However, there are some tricks you could play depending on the actual job.
For exam
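The example is cut off in the archive; purely as an illustration, one knob that is sometimes used to influence placement (not necessarily the trick the reply goes on to describe) is putting operators into their own slot sharing group:
```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // Operators in different slot sharing groups cannot share a slot, which
        // indirectly influences how work is spread over the available TaskManagers.
        lines
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.toUpperCase();
                }
            })
            .slotSharingGroup("heavy")
            .print();

        env.execute("slot sharing sketch");
    }
}
```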
Hi Ken,
sorry for the long response time.
You're right that the log output is too verbose and we should not log the
full stack traces in the happy case. I've responded on the PR thread [1]
and suggested to only log the exception message for information purposes. I
hope that this will solve the
Hi Steven,
the reason why we did not turn on this feature by default was that in case
of a true JM failure, all of the TMs will think that they got quarantined
which triggers their shut down. Depending on how many container restarts
you have left on Yarn, for example, this can lead to a situation
Hi Ashish,
have you tried using Flink's RocksDBStateBackend? If your job accumulates
state exceeding the available main memory, then you have to use a state
backend which can spill to disk. The RocksDBStateBackend offers you exactly
this functionality.
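For illustration, a minimal sketch of plugging the RocksDB backend into a job (assumes the flink-statebackend-rocksdb dependency; the checkpoint URI and the incremental flag are placeholders):
```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // State lives in RocksDB on the TaskManager's local disk and can therefore
        // grow beyond main memory; checkpoints go to the given file system URI.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true)); // true = incremental checkpoints
        // ... define the job as usual ...
        // env.execute("rocksdb-backed job");
    }
}
```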
Cheers,
Till
On Mon, Apr 30, 2018 at 3:54 P
Hi Derek,
given that you've started the different Flink cluster components all with
the same HA enabled configuration, the TMs should be able to connect to jm1
after you've killed jm0. The jobmanager.rpc.address should not be used when
HA mode is enabled.
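As a sketch of what "the same HA enabled configuration" involves, here are the relevant keys spelled out via Flink's Configuration API; in practice they are set in flink-conf.yaml on every node, and the quorum and storage path below are placeholders:
```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.HighAvailabilityOptions;

public class HaConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // high-availability: zookeeper
        conf.setString(HighAvailabilityOptions.HA_MODE, "zookeeper");
        // high-availability.zookeeper.quorum (placeholder hosts)
        conf.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM, "zk1:2181,zk2:2181,zk3:2181");
        // high-availability.storageDir (placeholder path)
        conf.setString(HighAvailabilityOptions.HA_STORAGE_PATH, "hdfs:///flink/ha");
        System.out.println(conf);
    }
}
```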
In order to get to the bottom of the desc
Hi Dongwon,
Fabian is right with his analysis of the problem. Currently, the new
ResourceManager implementation will start a new TM per requested slot
independent of the number of configured slots. This behavior can cause
the cluster to request too many resources when a job is started. However,
FLINK-9349 was logged.
FYI
On Sat, May 12, 2018 at 7:52 AM, Ted Yu wrote:
> I took a look at
> ./flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java
>
> It seems the List subscribedPartitionStates was being modifi
When downloading the Flink source in order to run it locally, there is a
zookeeper script and a start-zookeeper-quorum script.
Is there any difference between the default ZooKeeper installation, let's say
on Ubuntu, and the ZooKeeper that comes with Flink?
thanks,
MIki
If you want to write in batches from a streaming source you will always need
some state, i.e. a state database (here a NoSQL database such as a key-value store
makes sense). Then you can grab the data at certain points in time and convert
it to Avro. You need to make sure that the state is logicall
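The reply above recommends an external key-value store; purely as an illustration of the "buffer, then emit at certain points in time" idea inside the job itself, a simplified windowed variant (socket source and printing sink are stand-ins) could look like this:
```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class WindowedBatchSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.socketTextStream("localhost", 9999);

        // Buffer the stream into 5-minute processing-time buckets; every fired
        // window hands one complete batch downstream (non-parallel for simplicity).
        events
            .timeWindowAll(Time.minutes(5))
            .apply(new AllWindowFunction<String, List<String>, TimeWindow>() {
                @Override
                public void apply(TimeWindow window, Iterable<String> values, Collector<List<String>> out) {
                    List<String> batch = new ArrayList<>();
                    for (String value : values) {
                        batch.add(value);
                    }
                    out.collect(batch);
                }
            })
            .print(); // placeholder: a real sink would convert each batch to Avro and write it out

        env.execute("windowed batching sketch");
    }
}
```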
Hi all,
I am writing some jobs intended to run using the DataStream API with
a Kafka source. However, we also have a lot of data in Avro archives (of the
same Kafka source). I would like to be able to run the processing code over
parts of the archive so I can generate some "example output".
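One possible way to feed such an archive into the same DataStream code is Flink's AvroInputFormat from flink-avro; in the sketch below, MyEvent and the HDFS path are placeholders for the topic's generated Avro class and the archive location:
```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.AvroInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroArchiveSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // MyEvent stands in for the Avro-generated class of the topic's schema,
        // and the path is just an example location of the archived files.
        AvroInputFormat<MyEvent> archiveFormat =
                new AvroInputFormat<>(new Path("hdfs:///archives/my-topic/2018-05/"), MyEvent.class);

        // Read the archive as a bounded DataStream so the processing code written
        // against the Kafka source can be reused for "example output" runs.
        DataStream<MyEvent> archived = env.createInput(archiveFormat, TypeInformation.of(MyEvent.class));
        archived.print(); // replace with the real processing pipeline

        env.execute("avro archive sketch");
    }
}
```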
Thanks Piotr for the response. I have many data streams dependent on the
configuration by getting values from static variables in a class. The way
the configuration change works is to change the static variables' values in
the class. Since each task manager only has one JVM process, as long as the
me