graceful shutdown of taskmanager

2019-02-06 Thread Bernd.Winterstein
Hi, is there a way to gracefully remove a taskmanager from a running cluster? My idea would be to trigger the affected jobs to restart via a savepoint on the remaining taskmanagers. Once the taskmanager is idle, it can be stopped without jobs falling back to an older checkpoint. Regards Bern

RE: number of files in checkpoint directory grows endlessly

2018-12-10 Thread Bernd.Winterstein
Hi Andrey, I checked our code again. We are indeed using timers for dynamic routing updates. The timer is triggered every five minutes! This must be the reason for the five-minute pattern in the remaining .sst files. Do I understand correctly that the files remain because they are too small f

RE: number of files in checkpoint directory grows endlessly

2018-12-06 Thread Bernd.Winterstein
It seems that some file deletion is disabled by default. There are some log entries in the file From: Andrey Zagrebin [mailto:and...@data-artisans.com] Sent: Thursday, 6 December 2018 12:07 To: Winterstein, Bernd Cc: Yun Tang; Kostas Kloudas; user; Stefan Richter; Till Rohrmann; Stephan Ewen B

RE: number of files in checkpoint directory grows endlessly

2018-12-04 Thread Bernd.Winterstein
All calls to createColumnFamily were replaced with createColumnFamilyWithTtl: private Tuple2> tryRegisterKvStateInformation( StateDescriptor stateDesc, TypeSerializer namespaceSerializer, @Nullable StateSnapshotTransformer snapshotTransformer)

RE: number of files in checkpoint directory grows endlessly

2018-12-04 Thread Bernd.Winterstein
Sorry for the late answer. I haven’t been in the office. The logs show no problems. The files that remain in the shared subfolder are almost all 1121 bytes, except the files from the latest checkpoint (30 files for all operators). For each historic checkpoint six files remain (parallelism is 6) c

RE: number of files in checkpoint directory grows endlessly

2018-11-29 Thread Bernd.Winterstein
Hi, we use Flink 1.6.2. As for the checkpoint directory, there is only one chk-xxx directory. Therefore I would expect that only one checkpoint remains. The value of 'state.checkpoints.num-retained' is not set explicitly. The problem is not the number of checkpoints but the number of files in the "sh

number of files in checkpoint directory grows endlessly

2018-11-29 Thread Bernd.Winterstein
I have a Flink job running with the following settings: * CheckpointingMode.EXACTLY_ONCE * RocksDB backend (Modified with TtlDB usage) * CheckpointConfig.ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION * 60 sec interval * Async snapshots * Incremental checkp

Kafka offset behaviour when restarting job from savepoint

2018-10-31 Thread Bernd.Winterstein
Hi, let's say we have a job which reads from a Flink Kafka source (FlinkKafkaConsumer011) and commits the offsets on each checkpoint. When the job is started from an older savepoint, will it take the latest offsets stored in Kafka for the consumer group or are the offsets taken from the savepoin

Consumer offsets not visible in Kafka

2018-04-19 Thread Bernd.Winterstein
Hi, we are using Kafka 0.11 consumers with Flink 1.4 and Confluent Kafka 4.0.0. Checkpointing is enabled and enableCommitOnCheckpoints is set to true. However, there are no offsets from Flink jobs visible in Kafka when checking with the kafka-consumer-groups tool. Any ideas? Regards Bernd

Manual Classloading on the job does not work

2017-12-08 Thread Bernd.Winterstein
Hi I have problems loading resources via the UserCodeClassloader as described in: https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html#manual-classloading-in-the-job I have tried in several scenarios to load properties or wsdl files via a RichMapFunct
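
The message above is cut off, so the exact failure is not known. As a rough illustration of the kind of lookup being attempted, here is a minimal sketch using only the JDK. The class name ResourceLoading, the method loadProperties, and the resource name are hypothetical. In a Flink rich function the ClassLoader argument would come from getRuntimeContext().getUserCodeClassLoader(), as the linked documentation describes; that call is not part of this sketch. One detail that often matters: ClassLoader.getResourceAsStream expects a name without a leading slash.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper, not taken from the original thread.
final class ResourceLoading {

    /**
     * Loads a .properties resource through the given class loader.
     * The resource name must be relative to the class path root and
     * must NOT start with "/" when using ClassLoader.getResourceAsStream.
     */
    static Properties loadProperties(ClassLoader loader, String resourceName) throws IOException {
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                // getResourceAsStream returns null instead of throwing when nothing is found
                throw new IOException("Resource not found via class loader: " + resourceName);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        }
    }

    public static void main(String[] args) throws IOException {
        // Outside Flink, the context class loader stands in for the user code class loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        String name = args.length > 0 ? args[0] : "app.properties"; // placeholder name
        System.out.println(loadProperties(loader, name));
    }
}
```

Passing the loader in explicitly keeps the helper independent of how the loader was obtained, which makes it easy to try the same lookup with different class loaders when debugging.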

RE: Blob server not working with 1.4.0.RC2

2017-12-05 Thread Bernd.Winterstein
Hi Nico, I think there were changes in the default port for the BLOB server. I missed the fact that the Kubernetes configuration was still exposing 6124 for the JobManager BLOB server. Thanks Bernd -Original Message- From: Nico Kruber [mailto:n...@data-artisans.com] Sent: Mont

Blob server not working with 1.4.0.RC2

2017-12-04 Thread Bernd.Winterstein
Hi, since we switched to release 1.4, the taskmanagers are unable to download blobs from the jobmanager. The taskmanager registration still works. Netstat on the jobmanager shows open ports at 6123 and 5. But a telnet connection from taskmanager to jobmanager on port 5 times out. Any ideas are
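
The telnet check described above can also be scripted. The following is a minimal sketch using only the JDK; the class name PortCheck, the method isReachable, and the default host and timeout are hypothetical. It only tests whether a TCP connection can be opened and says nothing about whether the BLOB server protocol itself works.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not taken from the original thread.
final class PortCheck {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused connections, timeouts and unresolvable hosts all end up here
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the jobmanager host and the port to test as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 3000));
    }
}
```

Running this from the taskmanager host against each port reported by netstat on the jobmanager shows quickly which ports are blocked between the two machines.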