vepoint dir is
>> useful in most scenarios, but in some cases the _metadata may not be
>> completed).
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/
>>
>> Best,
>> Weihua
>>
>>
>> On Tue, Jul 5, 2022 at
Hi!
We are running a Standalone job on Kubernetes using application deployment
mode, with HA enabled.
We have attempted to automate how we create and restore savepoints by
running a script for generating a savepoint (using a k8s preStop hook) and
another one for restoring from a savepoint (located i
o your .xml
> file.
> 2)
> Have you made modifications to the distribution (e.g., removing other
> logging jars from the lib directory)?
> Are you using application mode, or session clusters?
>
> On 15/02/2022 16:41, jonas eyob wrote:
>
> Hey,
>
> We are depl
Hey,
We are deploying our Flink Cluster on a standalone Kubernetes with the
long-running job written in Scala.
We recently upgraded our Flink cluster from 1.12 to 1.14.3, after which we
started seeing a few problems related to logging, which I have been
struggling to fix for the past few days.
Relate
InMillis" which may
> be a misconfiguration on my setup, but with STREAM_INITIAL_POSITION =
> "TRIM_HORIZON" I was able to consume events from the stream.
>
> This was with 1.14.0 of the Kinesis Flink connector.
>
> Kind regards,
> Mika
>
>
> On 02.12.202
:1.14.0]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
~[flink-runtime-1.14.0.jar:1.14.0]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
~[flink-runtime-1.14.0.jar:1.14.0]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
--
*Kind regards*
*Jonas Eyob*
state.checkpoints.num-retained: 1 # Maximum number of completed
> checkpoints to retain
>
> # Fault tolerance
> restart-strategy: fixed-delay
> restart-strategy.fixed-delay.delay: 10 s
> restart-strategy.fixed-delay.attempts: 3 # try n times before job is
> considered failed
>
> From what I can see the job is still running, and the checkpointing keeps
> failing.
> After finding this (https://issues.apache.org/jira/browse/FLINK-2491) I
> updated the default parallelism from 2 -> 1 since our current Kinesis stream
> consists of 1 shard. But the problem persists.
>
> Any ideas?
>
> Jonas
>
>
--
*Kind regards*
*Jonas Eyob*
Hi all,
I have been struggling with this issue for a couple of days now.
Checkpointing appears to fail because the Task Source (a Kinesis stream in this
case) appears to be in a FINISHED state.
Excerpt from Jobmanager logs:
2021-11-25 12:52:00,479 INFO
org.apache.flink.runtime.executiongraph.Executio
{configMapName='thoros--jobmanager-leader'}.
2021-08-31 15:00:02,784 INFO
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] -
All 0 checkpoints found are already downloaded.
2021-08-31 15:00:02,784 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - No
checkpoint found during restore.
--
*Kind regards*
*Jonas Eyob*
"s3:ListBucket",
>> "s3:Get*",
>> "s3:Put*",
>> "s3:Delete*"
>> ],
>> "Resource": [
>> "arn:aws:s3:::-flink-dev",
>>
configuration parameter.
>
> Best,
> Matthias
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/#configure-access-credentials
>
> On Thu, Aug 26, 2021 at 3:43 PM jonas eyob wrote:
>
>> Hey,
>>
>> I am setting up HA on
nk.runtime.dispatcher.Dispatcher.persistAndRunJob(Dispatcher.java:392)
~[flink-dist_2.12-1.12.5.jar:1.12.5]
at
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$waitForTerminatingJob$29(Dispatcher.java:971)
~[flink-dist_2.12-1.12.5.jar:1.12.5]
at
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedConsumer$3(FunctionUtils.java:93)
~[flink-dist_2.12-1.12.5.jar:1.12.5]
... 27 more
--
*Kind regards*
*Jonas Eyob*
est,
Jonas
On Wed, 25 Aug 2021 at 21:17, Thms Hmm wrote:
> Can you check what is the output of those commands
>
> $ id
> $ ls -la $FLINK_HOME/plugins/s3-fs-presto/
>
>
> On Wed, 25 Aug 2021 at 16:17, jonas eyob wrote:
>
>> The exception is showing up both in TM
as needed.
On Wed, 25 Aug 2021 at 11:37, David Morávek wrote:
> Hi Jonas,
>
> Where does the exception pop up? In the job driver, TM, or JM? You need to make
> sure that the plugin folder is set up for all of them, because they all may
> need to access S3 at some point.
>
> Best
w I
would check it?
On Wed, 25 Aug 2021 at 10:12, Thms Hmm wrote:
> Hey Jonas,
> you could also try using the 's3p://' scheme to directly specify that
> Presto should be used. Also check whether the user that executes the process is
> able to read the jars.
>
> On Wed, 25 Aug.
K-23961 [2] so we provide a more descriptive warning for
> this issue next time ;)
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#example-configuration
> [2] https://issues.apache.org/jira/browse/FLINK-23961
>
> Best,
> D.
Hey, I've been struggling with this problem now for some days - driving me
crazy.
I have a standalone Kubernetes Flink (1.12.5) deployment using the application
cluster mode.
*The problem*
I am getting a NullPointerException when specifying the FQN of the
Kubernetes HA Service Factory class
i.e.
*or