Re: Best practice for creating/restoring savepoint in standalone k8 setup

2022-07-05 Thread jonas eyob
vepoint dir is >> useful in most scenarios, but in some cases the _metadata may not be >> completed). >> >> [1] >> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/ >> >> Best, >> Weihua >> >> >> On Tue, Jul 5, 2022 at

Best practice for creating/restoring savepoint in standalone k8 setup

2022-07-05 Thread jonas eyob
Hi! We are running a Standalone job on Kubernetes using application deployment mode, with HA enabled. We have attempted to automate how we create and restore savepoints by running a script for generating a savepoint (using k8 preStop hook) and another one for restoring from a savepoint (located i

Re: Log4j2 configuration

2022-02-15 Thread jonas eyob
o your .xml > file. > 2) > Have you made modifications to the distribution (e.g., removing other > logging jars from the lib directory)? > Are you using application mode, or session clusters? > > On 15/02/2022 16:41, jonas eyob wrote: > > Hey, > > We are depl

Log4j2 configuration

2022-02-15 Thread jonas eyob
Hey, We are deploying our Flink cluster on standalone Kubernetes with a long-running job written in Scala. We recently upgraded our Flink cluster from 1.12 to 1.14.3, after which we started seeing a few problems related to logging, which I have been struggling to fix for the past few days. Relate

Re: Cannot consum from Kinesalite using FlinkKinesisConsumer

2021-12-04 Thread jonas eyob
InMillis" which may > be a misconfiguration on my setup, but with STREAM_INITIAL_POSITION = > "TRIM_HORIZON" I was able to consume events from the stream. > > This was with 1.14.0 of the Kinesis Flink connector. > > Kind regards, > Mika > > > On 02.12.202

Cannot consum from Kinesalite using FlinkKinesisConsumer

2021-12-02 Thread jonas eyob
:1.14.0] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) ~[flink-runtime-1.14.0.jar:1.14.0] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) ~[flink-runtime-1.14.0.jar:1.14.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] -- *Kind regards* *Jonas Eyob*

Re: Checkpoints aborted - Job is not in state RUNNING but FINISHED

2021-11-26 Thread jonas eyob
state.checkpoints.num-retained: 1 # Maximum number of completed > checkpoints to retain > > # Fault tolerance > restart-strategy: fixed-delay > restart-strategy.fixed-delay.delay: 10 s > restart-strategy.fixed-delay.attempts: 3 # try n times before job is > considered failed > > From what I can see the job is still running, and the checkpointing keeps > failing. > After finding this (https://issues.apache.org/jira/browse/FLINK-2491) I > updated the default parallelism from 2 -> 1 since our current Kinesis stream > consists of 1 shard. But the problem persists. > > Any ideas? > > Jonas > > -- *Kind regards* *Jonas Eyob*

Checkpoints aborted - Job is not in state RUNNING but FINISHED

2021-11-25 Thread jonas eyob
Hi all, I have been struggling with this issue for a couple of days now. Checkpointing appears to fail because the source task (a Kinesis stream in this case) appears to be in a FINISHED state. Excerpt from the JobManager logs: 2021-11-25 12:52:00,479 INFO org.apache.flink.runtime.executiongraph.Executio

High availability - leader election not working?

2021-09-01 Thread jonas eyob
{configMapName='thoros--jobmanager-leader'}. 2021-08-31 15:00:02,784 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - All 0 checkpoints found are already downloaded. 2021-08-31 15:00:02,784 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - No checkpoint found during restore. -- *Kind regards* *Jonas Eyob*

Re: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread jonas eyob
"s3:ListBucket", >> "s3:Get*", >> "s3:Put*", >> "s3:Delete*" >> ], >> "Resource": [ >> "arn:aws:s3:::-flink-dev", >>

Re: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread jonas eyob
configuration parameter. > > Best, > Matthias > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/#configure-access-credentials > > On Thu, Aug 26, 2021 at 3:43 PM jonas eyob wrote: > >> Hey, >> >> I am setting up HA on

Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread jonas eyob
nk.runtime.dispatcher.Dispatcher.persistAndRunJob(Dispatcher.java:392) ~[flink-dist_2.12-1.12.5.jar:1.12.5] at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$waitForTerminatingJob$29(Dispatcher.java:971) ~[flink-dist_2.12-1.12.5.jar:1.12.5] at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedConsumer$3(FunctionUtils.java:93) ~[flink-dist_2.12-1.12.5.jar:1.12.5] ... 27 more -- *Kind regards* *Jonas Eyob*

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-26 Thread jonas eyob
est, Jonas On Wed, 25 Aug 2021 at 21:17, Thms Hmm wrote: > Can you check what is the output of those commands > > $ id > $ ls -la $FLINK_HOME/plugins/s3-fs-presto/ > > > jonas eyob wrote on Wed, 25 Aug 2021 at 16:17: > >> The exception is showing up both in TM

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
as needed. On Wed, 25 Aug 2021 at 11:37, David Morávek wrote: > Hi Jonas, > > Where does the exception pop up? In the job driver, TM, JM? You need to make > sure that the plugin folder is set up for all of them, because they all may > need to access s3 at some point. > > Best

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
w I would check it? On Wed, 25 Aug 2021 at 10:12, Thms Hmm wrote: > Hey Jonas, > you could also try to use the ´s3p://´ scheme to directly specify that > presto should be used. Also check if your user that executes the process is > able to read the jars. > > On Wed, 25 Aug.

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
K-23961 [2] so we provide a more descriptive warning for > this issue next time ;) > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#example-configuration > [2] https://issues.apache.org/jira/browse/FLINK-23961 > > Best, > D.

NullPointerException when using KubernetesHaServicesFactory

2021-08-24 Thread jonas eyob
Hey, I've been struggling with this problem for some days now, and it is driving me crazy. I have a standalone Kubernetes Flink (1.12.5) deployment using the application cluster mode. *The problem* I am getting a NullPointerException when specifying the FQN of the Kubernetes HA Service Factory class i.e. *or