FileInputFormat that processes files in chronological order

2019-04-26 Thread Sergei Poganshev
Given a directory with input files of the following format: /data/shard1/file1.json /data/shard1/file2.json /data/shard1/file3.json /data/shard2/file1.json /data/shard2/file2.json /data/shard2/file3.json Is there a way to make FileInputFormat with parallelism 2 split processing by "shard" (folder

What are savepoint state manipulation support plans

2019-03-27 Thread Sergei Poganshev
What are the plans to support savepoint state manipulation with batch jobs natively in core Flink? I've tried using the bravo tool [1]. It's pretty good at reading savepoints, but writing seems hacky. For example I wonder what exactly happens with the following lines: val newOpState = writer.writ

Will state TTL support event time cleanup in 1.8?

2019-03-13 Thread Sergei Poganshev
Do improvements introduced in https://issues.apache.org/jira/browse/FLINK-10471 add support for event time TTL?

Iterations and back pressure problem

2018-12-24 Thread Sergei Poganshev
We've tried using iterations feature and in case of significant load the job sometimes stalls and stops processing events due to high back pressure both in tasks that produces records for iteration and all the other inputs to this task. It looks like a back pressure loop the task can't handle all t

Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment

2018-12-12 Thread Sergei Poganshev
When I to deploy Flink 1.7 job to Kubernetes, the job itself runs, but upon visiting Flink UI I can see no metrics and there are WARN messages in jobmanager's log: [flink-metrics-14] WARN akka.remote.ReliableDeliverySupervisor flink-metrics-akka.remote.default-remote-dispatcher-3 - Association wit

flink-s3-fs-presto:1.7.0 is missing shaded com/facebook/presto/hadoop

2018-12-06 Thread Sergei Poganshev
When I try to configure checkpointing using Presto in 1.7.0 the following exception occurs: java.lang.NoClassDefFoundError: org/apache/flink/fs/s3presto/shaded/com/facebook/presto/hadoop/HadoopFileStatus at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem.directory(P

Will Zookeeper HA work when the cluster is run in standalonejob mode?

2018-11-29 Thread Sergei Poganshev
If I run the clustiner in "standalonejob" mode (by providing the job arguments to the job manager upon starting it) and configure HA using Zookeeper will the job restore correctly after the job manager restarts with the same "standalonejob" arguments? Will restart the job (due to job arguments pas

Continue batch job with streaming job

2018-10-29 Thread Sergei Poganshev
Is there a way to make a checkpoint/savepoint after the batch job has finished and then run the job in a streaming mode with state that has been initialized in batch mode? Or more generally speaking, what are the battle-tested solutions to "job initialization" problem, especially when there are te