Given a directory with input files of the following format:
/data/shard1/file1.json
/data/shard1/file2.json
/data/shard1/file3.json
/data/shard2/file1.json
/data/shard2/file2.json
/data/shard2/file3.json
Is there a way to make a FileInputFormat with parallelism 2 split the processing
by "shard" (folder

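For illustration only, here is the desired assignment as plain Java, independent of Flink's FileInputFormat API. It assumes that each parent folder is one shard, groups the files by that folder, and gives each whole folder to one of the 2 parallel readers. The class and method names (`ShardGrouping`, `groupByShard`, `assignShards`) are hypothetical. Wiring this into an actual Flink input format is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: assigns input files to parallel readers by parent folder ("shard"). */
public class ShardGrouping {

    /** Groups file paths by their parent folder, e.g. "/data/shard1" -> [its files]. */
    public static Map<String, List<String>> groupByShard(List<String> files) {
        // TreeMap keeps shards sorted, so the reader assignment is deterministic.
        Map<String, List<String>> shards = new TreeMap<>();
        for (String file : files) {
            int slash = file.lastIndexOf('/');
            String shard = slash < 0 ? "" : file.substring(0, slash);
            shards.computeIfAbsent(shard, k -> new ArrayList<>()).add(file);
        }
        return shards;
    }

    /** Hands out whole shards round-robin, so no shard is split across readers. */
    public static List<List<String>> assignShards(Map<String, List<String>> shards, int parallelism) {
        List<List<String>> readers = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            readers.add(new ArrayList<>());
        }
        int next = 0;
        for (List<String> shardFiles : shards.values()) {
            readers.get(next).addAll(shardFiles);
            next = (next + 1) % parallelism;
        }
        return readers;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
            "/data/shard1/file1.json", "/data/shard1/file2.json", "/data/shard1/file3.json",
            "/data/shard2/file1.json", "/data/shard2/file2.json", "/data/shard2/file3.json");
        List<List<String>> readers = assignShards(groupByShard(files), 2);
        for (int i = 0; i < readers.size(); i++) {
            System.out.println("reader " + i + ": " + readers.get(i));
        }
    }
}
```

With the layout above, reader 0 receives all of /data/shard1 and reader 1 all of /data/shard2. If there are more folders than readers, a reader simply receives several whole folders.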
What are the plans to support savepoint state manipulation with batch jobs
natively in core Flink?
I've tried using the bravo tool [1]. It's pretty good at reading
savepoints, but writing seems hacky. For example, I wonder what exactly
happens with the following lines:
val newOpState = writer.writ

Do the improvements introduced in
https://issues.apache.org/jira/browse/FLINK-10471 add support for event-time
TTL?

We've tried using the iterations feature, and under significant load the
job sometimes stalls and stops processing events due to high back pressure,
both in the tasks that produce records for the iteration and in all the other
inputs to this task. It looks like a back-pressure loop: the task can't handle
all t

When I deploy a Flink 1.7 job to Kubernetes, the job itself runs, but upon
visiting the Flink UI I can see no metrics, and there are WARN messages in the
jobmanager's log:
[flink-metrics-14] WARN akka.remote.ReliableDeliverySupervisor
flink-metrics-akka.remote.default-remote-dispatcher-3 - Association wit

When I try to configure checkpointing using Presto in 1.7.0, the following
exception occurs:
java.lang.NoClassDefFoundError: org/apache/flink/fs/s3presto/shaded/com/facebook/presto/hadoop/HadoopFileStatus
    at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem.directory(P

If I run the cluster in "standalonejob" mode (by providing the job
arguments to the job manager upon starting it) and configure HA using
ZooKeeper, will the job be restored correctly after the job manager restarts
with the same "standalonejob" arguments?
Will restart the job (due to job arguments pas

Is there a way to take a checkpoint/savepoint after the batch job has
finished and then run the job in streaming mode with state that has been
initialized in batch mode?
Or, more generally, what are the battle-tested solutions to the "job
initialization" problem, especially when there are te