Hi all,
Recently, my Flink job ran into a problem that caused the job to fail and
restart.
The log looks like this screen snapshot, or this:
```
2018-05-11 13:21:04,582 WARN  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 610
```
Till,
thanks for the clarification. yes, that situation is not desirable either.
In our case, restarting the jobmanager could also recover the job from akka
association lock-out. it was actually the issue (high GC pause) on the
jobmanager side that caused the akka failure.
do we have sth like "jobmanager.e
Hi Till,
Thanks for getting back. I am sure that will fix the issue but I feel like that
would potentially mask an issue. I have been going back and forth with Fabian
on a use case where for some of our highly transient datasets, it might make
sense to just use memory based state (except of cou
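For reference, a minimal sketch of what a purely memory-based state backend looks like with the Java DataStream API; the 5 MB state size below is only illustrative:
```java
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep working state on the TaskManager heap and checkpoint it to the
        // JobManager's memory; only suitable for small, transient state.
        env.setStateBackend(new MemoryStateBackend(5 * 1024 * 1024)); // illustrative 5 MB cap
        // ... define sources, transformations and sinks here ...
        // env.execute("memory-state job");
    }
}
```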
Hi Pedro,
currently, Flink does not allow you to explicitly control the scheduling
strategy at such a fine-grained level. The idea behind this is to achieve
location transparency and to make the scheduling easier.
However, there are some tricks you could play depending on the actual job.
For exam
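The example is cut off in the archive; purely as an illustration, one knob that is sometimes used to influence placement (not necessarily the trick the reply goes on to describe) is putting operators into their own slot sharing group:
```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // Operators in different slot sharing groups cannot share a slot, which
        // indirectly influences how work is spread over the available TaskManagers.
        lines
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.toUpperCase();
                }
            })
            .slotSharingGroup("heavy")
            .print();

        env.execute("slot sharing sketch");
    }
}
```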
Hi Ken,
sorry for the long response time.
You're right that the log output is too verbose and we should not log the
full stack traces in the happy case. I've responded on the PR thread [1]
and suggested to only log the exception message for information purposes. I
hope that this will solve the
Hi Steven,
the reason why we did not turn on this feature by default was that in case
of a true JM failure, all of the TMs will think that they got quarantined
which triggers their shut down. Depending on how many container restarts
you have left on Yarn, for example, this can lead to a situation
Hi Ashish,
have you tried using Flink's RocksDBStateBackend? If your job accumulates
state exceeding the available main memory, then you have to use a state
backend which can spill to disk. The RocksDBStateBackend offers you exactly
this functionality.
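For illustration, a minimal sketch of plugging the RocksDB backend into a job (assumes the flink-statebackend-rocksdb dependency; the checkpoint URI and the incremental flag are placeholders):
```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // State lives in RocksDB on the TaskManager's local disk and can therefore
        // grow beyond main memory; checkpoints go to the given file system URI.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true)); // true = incremental checkpoints
        // ... define the job as usual ...
        // env.execute("rocksdb-backed job");
    }
}
```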
Cheers,
Till
On Mon, Apr 30, 2018 at 3:54 P
Hi Derek,
given that you've started the different Flink cluster components all with
the same HA enabled configuration, the TMs should be able to connect to jm1
after you've killed jm0. The jobmanager.rpc.address should not be used when
HA mode is enabled.
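As a sketch of what "the same HA enabled configuration" involves, here are the relevant keys spelled out via Flink's Configuration API; in practice they are set in flink-conf.yaml on every node, and the quorum and storage path below are placeholders:
```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.HighAvailabilityOptions;

public class HaConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // high-availability: zookeeper
        conf.setString(HighAvailabilityOptions.HA_MODE, "zookeeper");
        // high-availability.zookeeper.quorum (placeholder hosts)
        conf.setString(HighAvailabilityOptions.HA_ZOOKEEPER_QUORUM, "zk1:2181,zk2:2181,zk3:2181");
        // high-availability.storageDir (placeholder path)
        conf.setString(HighAvailabilityOptions.HA_STORAGE_PATH, "hdfs:///flink/ha");
        System.out.println(conf);
    }
}
```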
In order to get to the bottom of the desc
Hi Dongwon,
Fabian is right with his analysis of the problem. Currently, the new
ResourceManager implementation will start a new TM per requested slot
independent of the number of configured slots. This behavior can cause
the cluster to request too many resources when a job is started. However,
FLINK-9349 was logged.
FYI
On Sat, May 12, 2018 at 7:52 AM, Ted Yu wrote:
> I took a look at
> ./flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java
>
> It seems the List subscribedPartitionStates was being modifi
When downloading the Flink source in order to run it locally, there is a
zookeeper script and a start-zookeeper-quorum script.
Is there any difference between the default ZooKeeper installation, let's say
on Ubuntu, and the ZooKeeper that comes with Flink?
thanks,
MIki
If you want to write in batches from a streaming source you will always need
some state, i.e. a state database (here a NoSQL database such as a key-value store
makes sense). Then you can grab the data at certain points in time and convert
it to Avro. You need to make sure that the state is logicall
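The reply above recommends an external key-value store; purely as an illustration of the "buffer, then emit at certain points in time" idea inside the job itself, a simplified windowed variant (socket source and printing sink are stand-ins) could look like this:
```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class WindowedBatchSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.socketTextStream("localhost", 9999);

        // Buffer the stream into 5-minute processing-time buckets; every fired
        // window hands one complete batch downstream (non-parallel for simplicity).
        events
            .timeWindowAll(Time.minutes(5))
            .apply(new AllWindowFunction<String, List<String>, TimeWindow>() {
                @Override
                public void apply(TimeWindow window, Iterable<String> values, Collector<List<String>> out) {
                    List<String> batch = new ArrayList<>();
                    for (String value : values) {
                        batch.add(value);
                    }
                    out.collect(batch);
                }
            })
            .print(); // placeholder: a real sink would convert each batch to Avro and write it out

        env.execute("windowed batching sketch");
    }
}
```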
Hi all,
I am writing some jobs intended to run using the DataStream API with
a Kafka source. However, we also have a lot of data in Avro archives (of the
same Kafka source). I would like to be able to run the processing code over
parts of the archive so I can generate some "example output".
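One possible way to feed such an archive into the same DataStream code is Flink's AvroInputFormat from flink-avro; in the sketch below, MyEvent and the HDFS path are placeholders for the topic's generated Avro class and the archive location:
```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.AvroInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroArchiveSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // MyEvent stands in for the Avro-generated class of the topic's schema,
        // and the path is just an example location of the archived files.
        AvroInputFormat<MyEvent> archiveFormat =
                new AvroInputFormat<>(new Path("hdfs:///archives/my-topic/2018-05/"), MyEvent.class);

        // Read the archive as a bounded DataStream so the processing code written
        // against the Kafka source can be reused for "example output" runs.
        DataStream<MyEvent> archived = env.createInput(archiveFormat, TypeInformation.of(MyEvent.class));
        archived.print(); // replace with the real processing pipeline

        env.execute("avro archive sketch");
    }
}
```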
Thanks Piotr for the response. I have many data streams dependent on the
configuration by getting values from static variables in a class. The way
the configuration change works is to change the static variables' values in
the class. Since each task manager only has one JVM process, as long as the
me