There are two things that you may want to try:
- Set 'state.backend.rocksdb.memory.managed' to true. That tells the
RocksDBStateBackend to try to keep its memory consumption within the
managed memory size.
- Increase JVM Overhead Memory Size (via
'taskmanager.memory.jvm-overhead.[min|max|fractio
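As a sketch, those two settings might look like this in flink-conf.yaml (the sizes are illustrative, not recommendations):

```
state.backend.rocksdb.memory.managed: true
taskmanager.memory.jvm-overhead.min: 256m
taskmanager.memory.jvm-overhead.max: 1g
taskmanager.memory.jvm-overhead.fraction: 0.15
```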
Hi Xintong,
Thanks very much for the response. Let me check out the new UI on flink
1.12.
The reason I asked this question is that our Flink cluster on k8s shows
container_working_set_bytes (used by the OOM killer) to be > 3 GB. I assume
that the used (heap, non-heap) values on the UI are correct. If
Thanks for the discussion, JING ZHANG. I like the first proposal since it
is simple and consistent with the DataStream API. It would be helpful to add
more docs about the special late case in WindowAggregate. Also, I look
forward to the more flexible emit strategies later.
Jark Wu wrote on Friday, July 2, 2021 at 10:33 AM:
> Sorry,
Hi Xiuming,
+1 on your idea.
BTW, Flink also provides a debug tool to track the latency of records
travelling through the system [1]. But you should note the following issues
if you enable latency tracking.
(1) It's a tool for debugging purposes because enabling latency metrics can
significantly impa
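As a sketch, latency tracking is typically enabled like this in flink-conf.yaml (the interval value here is illustrative):

```
metrics.latency.interval: 60000        # ms between latency markers; 0 disables tracking
metrics.latency.granularity: operator  # default granularity
```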
Sorry, I made a typo above. I mean I prefer proposal (1), which
only needs `table.exec.emit.allow-lateness` to be set to handle late events.
`table.exec.emit.late-fire.delay` can be optional, with 0s as the default.
`table.exec.state.ttl` will not affect window state anymore, so window state
is still cle
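For context, proposal (1) would look roughly like this in the SQL client (a sketch of the experimental options being discussed; the values are illustrative):

```
-- Sketch: experimental emit options under proposal (1); values illustrative.
SET 'table.exec.emit.allow-lateness' = '1 h';
SET 'table.exec.emit.late-fire.delay' = '0 s';  -- optional, 0s by default
```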
Hi Sudharsan,
The non-heap max is decided automatically by the JVM and is not controlled by
Flink. Moreover, it doesn't mean Flink will use up to that size of non-heap
memory.
These metrics are fetched directly from the JVM and do not correspond well with
Flink's memory configurations, which very often l
Another side question: shall we add a metric to cover the complete restarting
time (phase 1 + phase 2)? The current metric jm.restartingTime only covers
phase 1. Thanks!
Best
Lu
On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote:
> Thanks Till and Yang for the help! Also thanks Till for a quick fix!
>
> I did
Thanks Till and Yang for the help! Also thanks Till for a quick fix!
I did another test yesterday. In this test, I intentionally threw an exception
from the source operator:
```
if (runtimeContext.getIndexOfThisSubtask() == 1
&& errorFrenquecyInMin > 0
&& System.currentTimeMillis() - last
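For what it's worth, the truncated check above probably looks something like the sketch below. The names (`errorFrequencyInMin`, `lastErrorTimeMs`, `shouldFail`) and the once-per-interval throttling logic are my assumptions, not the actual test code:

```java
// Hedged sketch of the error-injection check in the truncated snippet above.
// errorFrequencyInMin and lastErrorTimeMs are assumed names, not the original code.
public class ErrorInjector {
    private final long errorFrequencyInMin;
    private long lastErrorTimeMs = 0L;

    public ErrorInjector(long errorFrequencyInMin) {
        this.errorFrequencyInMin = errorFrequencyInMin;
    }

    /** Returns true when subtask 1 should throw, at most once per interval. */
    public boolean shouldFail(int subtaskIndex, long nowMs) {
        if (subtaskIndex == 1
                && errorFrequencyInMin > 0
                && nowMs - lastErrorTimeMs > errorFrequencyInMin * 60_000L) {
            lastErrorTimeMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ErrorInjector injector = new ErrorInjector(5);
        System.out.println(injector.shouldFail(1, 10 * 60_000L)); // interval elapsed -> true
    }
}
```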
Hi,
On my Flink setup, I have taskmanager.memory.process.size set to 2536M. I
expect all the memory components shown on the UI to add up to this number.
However, I don't see this.
I have Flink managed memory: 811 MB
JVM heap max: 886 MB
JVM non-heap max: 744 MB
Direct memory: 204 MB
This adds
Hello everyone,
I'm testing a custom Kubernetes operator that should fulfill some specific
requirements I have for Flink. I know of this WIP project:
https://github.com/wangyang0918/flink-native-k8s-operator
I can see that it uses some classes that aren't publicly documented, and I
believe it
Hi Shilpa,
I've confirmed that "recovered" jobs are not compatible between minor
versions of Flink (e.g., between 1.12 and 1.13). I believe the issue is
that the session cluster was upgraded to 1.13 without first stopping the
jobs running on it.
If this is the case, the workaround is to stop each
Hi Stephan,
Thanks for the detailed explanation! This really helps understand all this
better. Appreciate your help!
Regards,
Sonam
From: Stephan Ewen
Sent: Wednesday, June 30, 2021 3:56:22 AM
To: Sonam Mandal ; user@flink.apache.org
Cc: matth...@ververica.com
A quick addition: I think with FLINK-23202 it should now also be possible
to improve the heartbeat mechanism in the general case. We can leverage the
unreachability exception thrown if a remote target is no longer reachable
to mark a heartbeat target as no longer reachable [1]. This can then be
co
Thanks Jing for bringing up this topic.
The emit strategy configs are annotated as Experimental and are not public in
the documentation.
However, I see this is a very useful feature which many users are looking
for.
I have posted these configs for many questions like "how to handle late
events in SQL".
T
Hi Zhu,
Does this mean our upgrades are going to fail and the jobs are not backward
compatible?
I did verify the job itself is built using 1.13.0.
Is there a workaround for this?
Thanks,
Shilpa
On Wed, Jun 30, 2021 at 11:14 PM Zhu Zhu wrote:
> Hi Shilpa,
>
> JobType was introduced in 1.13. So
Since you are deploying Flink workloads on Yarn, the Flink ResourceManager
should get the container
completion event after the heartbeat of Yarn NM->Yarn RM->Flink RM, which
is 8 seconds by default.
And the Flink ResourceManager will release the dead TaskManager container once
it receives the completion e
Thanks for the pointer. Let me try upgrading Flink.
On Thu, Jul 1, 2021 at 5:29 PM Yun Tang wrote:
> Hi Tao,
>
> I ran your program with Flink 1.12.1 and found that the problem you described
> really exists. And things go back to normal when switching to the Flink 1.12.2
> version.
>
> After digging into the
Hi Tao,
I ran your program with Flink 1.12.1 and found that the problem you described really
exists. And things go back to normal when switching to the Flink 1.12.2 version.
After digging into the root cause, I found this is caused by a bug that has since
been fixed [1]: if a legacy source task fails outside of the legacy thread,
The analysis of Gen is correct. Flink currently uses its heartbeat as the
primary means to detect dead TaskManagers. This means that Flink will take
at least `heartbeat.timeout` time before the system recovers. Even if the
cancellation happens fast (e.g. by having configured a low
akka.ask.timeout)
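For reference, the heartbeat behavior is governed by the following settings in flink-conf.yaml (the defaults shown are to the best of my knowledge):

```
heartbeat.interval: 10000   # ms between heartbeats
heartbeat.timeout: 50000    # ms after which a TaskManager is considered lost
```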
Hi Tao,
it looks like it should work, considering that you have a sleep of 1 second
before each emission. I'm going to add Roman to this thread. Maybe he
sees something obvious that I'm missing.
Could you run the job with the log level set to debug and provide the logs
once more? Additionally,
> 1. how to specify the number of TaskManagers?
> In batch mode, I tried to use (Max Parallelism / (cores per tm)), but it
> does not work. The number of TaskManagers is much larger than (Max
> Parallelism / cores per tm).
It's not the cores per TM, but the number of slots per TM. Please refer
to ta
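To illustrate with my own arithmetic (the numbers are illustrative): the TaskManager count is roughly the job's max parallelism divided by `taskmanager.numberOfTaskSlots`, rounded up:

```java
// Hedged sketch: TaskManager count is driven by slots per TM, not cores per TM.
public class TmCount {
    static int taskManagers(int maxParallelism, int slotsPerTm) {
        return (maxParallelism + slotsPerTm - 1) / slotsPerTm; // ceiling division
    }

    public static void main(String[] args) {
        // e.g. max parallelism 128 with 4 slots per TM -> 32 TaskManagers
        System.out.println(taskManagers(128, 4)); // prints 32
    }
}
```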