I did a simple test on my
laptop, launching a Docker container with a CPU limit configured. Inside the
container, I could still see all of my machine's CPUs.
Thank you~
Xintong Song
On Wed, Aug 12, 2020 at 1:19 AM Bajaj, Abhinav
wrote:
> Hi,
>
>
>
> Reaching out to folks running Fl
ption
`taskmanager.host` for your task managers, see if that is reflected in the
metrics.
Thank you~
Xintong Song
On Wed, Aug 12, 2020 at 3:06 PM Nikola Hrusov wrote:
> Hello,
>
> After upgrading the docker image for flink to 1.11.1 from 1.9 the hostname
> of the taskmanagers reported to
Hi Vishwas,
According to the log, heap space is 13+GB, which looks fine.
Several reasons might lead to the heap space OOM:
- Memory leak
- Not enough GC threads
- Concurrent GC starts too late
- ...
I would suggest taking a look at the GC logs.
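The JVM also exposes cumulative per-collector counters through the standard `java.lang.management` API, which can be read alongside the GC logs. The sketch below is only illustrative, and the class name `GcStats` is hypothetical. If you sample it periodically, you can see whether GC time keeps climbing, which may hint at a leak or at collections starting too late. It does not replace the detailed GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class GcStats {
    // Collector name -> approximate accumulated collection time in milliseconds.
    // getCollectionTime() returns -1 if the JVM does not support the measurement.
    static Map<String, Long> collectionTimesMs() {
        Map<String, Long> times = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            times.put(gc.getName(), gc.getCollectionTime());
        }
        return times;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```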
Thank you~
Xintong Song
On Fri
Congratulations Dian~!
Thank you~
Xintong Song
On Thu, Aug 27, 2020 at 7:42 PM Jark Wu wrote:
> Congratulations Dian!
>
> Best,
> Jark
>
> On Thu, 27 Aug 2020 at 19:37, Leonard Xu wrote:
>
> > Congrats, Dian! Well deserved.
> >
> > Best
> > Le
you~
Xintong Song
On Mon, Aug 31, 2020 at 1:33 PM lec ssmi wrote:
> HI:
> Generally speaking, when we submitting the flink program, the number of
> taskmanager and the memory of each tn will be specified. And the smallest
> real execution unit of flink should be operator.
] the
cluster to allocate slots evenly across task managers.
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#tasks-and-operator-chains
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/job_scheduling
, thus separating the pipeline into
several slot sharing groups will not bring any benefit. If you are just
trying out slot sharing groups or preparing to deploy the job to a
distributed cluster later, then there should be no problem.
Thank you~
Xintong Song
On Thu, Sep 10, 20
rs can write/remove the stored object. What if the previous owner
failed to release the lock (e.g., died before releasing it)? Would there be
any problem?
## HA storage > HA data clean up
If the ConfigMap is destroyed on `kubectl delete deploy `, how
is the HA data retained?
Thank you~
Xintong So
do.
>
- Which Flink Kubernetes deployment are you using? Standalone or
native Kubernetes?
- Which cluster mode are you using? Job cluster, session cluster, or the
application mode?
Thank you~
Xintong Song
On Sat, Sep 19, 2020 at 1:22 AM Claude M wrote:
> Hello,
>
> I upgrad
t trust Flink's "Non-Heap" metrics. They are
practically unhelpful and misleading. The "Non-Heap" metrics account for SOME of
the non-heap memory usage, but NOT ALL of it. The community is working on
a new set of metrics and Web UI for task manager memory tuning.
Thank you~
Xinton
dump, we can look into it later.
Thank you~
Xintong Song
On Mon, Sep 21, 2020 at 9:37 PM Claude M wrote:
> Hi Xintong,
>
> Thanks for your reply. Here is the command output w/ the java.opts:
>
> /usr/local/openjdk-8/bin/java -Xms768m -Xmx768m -XX:+UseG1GC
> -XX:+Hea
Thanks for the input, Brian.
This looks like what we are looking for. The issue is fixed in 1.10.3,
which is consistent with this problem occurring in 1.10.2.
Maybe Claude can further confirm it.
Thank you~
Xintong Song
On Tue, Sep 22, 2020 at 10:57 AM Zhou, Brian wrote:
> Hi Xintong and Cla
that fixes
your problem.
Given that it could take weeks to reproduce your problem, I would suggest
keeping track of the native memory usage with jemalloc and jeprof. This
should provide direct information about which component is using extra
memory.
Thank you~
Xintong Song
On Tue, Sep 22
] and build your custom image (starting from the 1.10.2 image and
replacing the Flink distribution with the one you built).
Thank you~
Xintong Song
[1] https://github.com/apache/flink/tree/release-1.10
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/flinkDev/building.html
On Wed, S
g released,
see if we can do something about it.
Thank you~
Xintong Song
On Thu, Sep 24, 2020 at 6:35 PM Claude M wrote:
> I have 35 task managers, 1 slot on each. I'm running a total of 7 jobs in
> the cluster. All the slots are occupied. When you say that
al states in the rpc main thread. With FLINK-19241, this can be
achieved easily by delegating the work to the io executor.
Thank you~
Xintong Song
On Mon, Oct 12, 2020 at 12:44 PM Paul Lam wrote:
> Hi,
>
> After FLINK-13184 is implemented (even with Flink 1.11), occasionally
>
FYI, I just created FLINK-19568 for tracking this issue.
Thank you~
Xintong Song
[1] https://issues.apache.org/jira/browse/FLINK-19568
On Mon, Oct 12, 2020 at 2:18 PM Xintong Song wrote:
> Hi Paul,
>
> Thanks for reporting this.
>
> Indeed, Flink's RM currently p
No worries :)
Thank you~
Xintong Song
On Mon, Oct 12, 2020 at 2:48 PM Paul Lam wrote:
> Sorry for the misspelled name, Xintong
>
> Best,
> Paul Lam
>
> On Oct 12, 2020, at 14:46, Paul Lam wrote:
>
> Hi Xingtong,
>
> Thanks a lot for the pointer!
>
> It’s good to
intended to execute the tests locally, you can try the following
actions. I'm not sure whether they will help, though.
- Try adding '-DfailIfNoTests=false' to your maven command.
- Execute the maven command with '-X' to print all the debug logs.
Thank you~
Xintong Song
On Tu
Would you be able to share the complete Maven logs and the command? And
which Maven version are you using?
Thank you~
Xintong Song
On Wed, Oct 21, 2020 at 1:37 AM Dan Hill wrote:
> Hi Xintong!
>
> No changes. I tried -X and no additional log information is logged.
> -DfailIfNoTests=fa
n
logs.
- Quick question: which PR are you working on? Did you by any chance call
`System.exit()` in your code?
Thank you~
Xintong Song
On Thu, Oct 22, 2020 at 5:59 AM Dan Hill wrote:
> Sure, here's a link
> <https://drive.google.com/file/d/13Q7h77zG-2vp7gJOke8QAzLtKLKIPuTf/view?usp=sh
3.6.3.
I'm not sure whether the Maven version is related, but maybe you can try it
out with 3.2.5. If that works, we may file an issue with the
Apache Maven community.
Thank you~
Xintong Song
On Thu, Oct 22, 2020 at 12:31 PM Dan Hill wrote:
> 1) I don't see anything use
oices definitely matter a lot for this community. Either way, it would be
good to draw users' attention to this discussion early.
Thank you~
Xintong Song
On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf wrote:
> Hi Robert,
>
> +1 to the plan you outlined. If we were to drop support in F
think it should be
fine.
Thank you~
Xintong Song
[1] https://issues.apache.org/jira/browse/FLINK-19665
On Sat, Oct 24, 2020 at 5:56 AM Dan Hill wrote:
> Changing down to maven 3.2 shows an error. It seems like I'm hitting
> flaky tests. I hit one error and then a different error
resource management improvements may not be ported to Mesos), while
keeping other components up-to-date (e.g., improvements from programming
APIs, operators, state backends, etc.)?
Thank you~
Xintong Song
On Sat, Oct 24, 2020 at 2:48 AM Lasse Nedergaard <
lassenedergaardfl...@gmail.com> wrote:
early next month. It would be greatly
appreciated if you folks, as experienced Flink on Mesos users, could help with
verifying the release candidates.
Thank you~
Xintong Song
[1]
https://issues.apache.org/jira/browse/FLINK-17402?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20%22Deployment%20%2F
n the `top` command
- Look into the `/proc/meminfo` file
- Any container memory usage metrics that are available to your Yarn cluster
Thank you~
Xintong Song
On Tue, Oct 27, 2020 at 6:21 PM Ori Popowski wrote:
> After the job is running for 10 days in production, TaskManagers start
> f
can also try increasing the `jvm-overhead`, simply to leave more native
memory in the container in case there are other significant native
memory usages.
Thank you~
Xintong Song
On Wed, Oct 28, 2020 at 5:53 PM Ori Popowski wrote:
> Hi Xintong,
>
> See here:
>
> # Top me
upgrade to 1.10.2, to include the latest bug fixes on the 1.10 release.
Thank you~
Xintong Song
On Thu, Oct 29, 2020 at 4:41 PM Ori Popowski wrote:
> Hi,
>
> PID 20331 is indeed the Flink process, specifically the TaskManager
> process.
>
> - Workload is a streaming workload
, you might want to look into this comment [1] in FLINK-18712.
- If neither of the above actions helps, we might need to leverage tools
(e.g., JVM NMT [2]) to track the native memory usages and see where exactly
the leak comes from.
Thank you~
Xintong Song
[1]
https://issues.apache.org/jira/b
Hi Schneider,
The error message suggests that your task managers are not configured with
enough network memory. You would need to increase the network memory
configuration. See this doc [1] for more details.
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release
on the decommissioning node will be killed.
Thank you~
Xintong Song
On Fri, Nov 13, 2020 at 2:57 PM Robert Metzger wrote:
> Hi,
> it seems that YARN has a feature for targeting specific hardware:
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.htm
requested
in such cases.
- kubernetes.jobmanager.cpu
- kubernetes.taskmanager.cpu
- yarn.appmaster.vcores
- yarn.containers.vcores
- mesos.resourcemanager.tasks.cpus
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/memo
sers. I will file an issue for that.
Thank you~
Xintong Song
On Mon, Dec 7, 2020 at 11:03 AM Xintong Song wrote:
> Hi Rex,
>
> We're running this in a local environment so that may be contributing to
>> what we're seeing.
>>
> Just to double check on this. By `
FYI, I've opened FLINK-20503 for this.
https://issues.apache.org/jira/browse/FLINK-20503
Thank you~
Xintong Song
On Mon, Dec 7, 2020 at 11:10 AM Xintong Song wrote:
> I forgot to mention that it is designed that task managers always have
> `Double#MAX_VALUE` cpu cores in loca
into the ZooKeeper logs checking why RM's
leadership is revoked.
Thank you~
Xintong Song
On Thu, Dec 17, 2020 at 8:42 AM Lu Niu wrote:
> Hi, Flink users
>
> Recently we migrated to flink 1.11 and see exceptions like:
> ```
> 2020-12-
I'm not aware of any significant changes to the HA components between
1.9 and 1.11.
Would you mind sharing the complete jobmanager/taskmanager logs?
Thank you~
Xintong Song
On Fri, Dec 18, 2020 at 8:53 AM Lu Niu wrote:
> Hi, Xintong
>
> Thanks for replying and your suggestion. I
The Apache Flink community is very happy to announce the release of Apache
Flink 1.11.3, which is the third bugfix release for the Apache Flink 1.11
series.
Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
a
I believe what you are looking for is the State TTL [1][2].
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/config.html#table-exec-state
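The class below is a hypothetical plain-Java model of TTL'd keyed state that makes these semantics concrete. It is NOT Flink's API. In Flink itself you build a `StateTtlConfig` and enable it on the state descriptor, as described in [1]. This sketch only mimics the default behaviour assumed from [1]:

- A value's timestamp is refreshed when the value is created or written.
- Expired values are never returned.
- Expired values are cleaned up lazily when they are read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical model of TTL'd keyed state; NOT Flink's StateTtlConfig API.
class TtlState<K, V> {
    private static final class Entry<T> {
        final T value;
        final long lastWriteMs;

        Entry(T value, long lastWriteMs) {
            this.value = value;
            this.lastWriteMs = lastWriteMs;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMs;
    private final LongSupplier clock; // injectable clock so expiry is deterministic

    TtlState(long ttlMs, LongSupplier clock) {
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    // Creating or overwriting a value refreshes its timestamp (like OnCreateAndWrite).
    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Reading does not refresh the timestamp; expired values are never returned
    // (like NeverReturnExpired) and are dropped lazily on read.
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() - e.lastWriteMs >= ttlMs) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        TtlState<String, Integer> state = new TtlState<>(1_000L, () -> now[0]);
        state.put("user-1", 42);
        now[0] = 500L;
        System.out.println(state.get("user-1")); // 42
        now[0] = 1_500L;
        System.out.println(state.get("user-1")); // null (expired)
    }
}
```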
ed as `yarn.ship-files`, `yarn.ship-archives` or
`yarn.provided.lib.dirs`? This helps us to locate the code path that this
file went through.
Thank you~
Xintong Song
On Sun, Jan 17, 2021 at 10:32 PM Mark Davis wrote:
> Hi all,
> I am upgrading my DataSet jobs from Flink 1.8 to 1.12.
> Aft
The Apache Flink community is very happy to announce the release of Apache
Flink 1.12.1, which is the first bugfix release for the Apache Flink 1.12
series.
Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
a
ime.highavailability
org.apache.flink.runtime.leaderretrieval
org.apache.zookeeper
Thank you~
Xintong Song
On Wed, Mar 4, 2020 at 5:42 AM Bajaj, Abhinav
wrote:
> Hi,
>
>
>
> We recently came across an issue where JobMaster does not register with
> ResourceManager in Fink high availability set
hose from the job
restart to the NoResourceAvailableException) to find out which is the case.
Thank you~
Xintong Song
On Thu, Mar 5, 2020 at 7:30 AM Bajaj, Abhinav
wrote:
> While I setup to reproduce the issue with debug logs, I would like to
> share more information I noticed in INFO logs.
>
the rest of the log (from where the current one ends to
the NoResourceAvailableException) to tell what happened during the
scheduling. Also, could you confirm how many TMs you use?
Thank you~
Xintong Song
On Fri, Mar 6, 2020 at 5:55 AM Bajaj, Abhinav
wrote:
> Hi Xintong,
>
skew ease?
I suspect the performance difference might be caused by warm-up
effects. E.g., the existing TMs might already have some files localized, or
some memory buffers already promoted to the JVM tenured generation, while the new
TMs have not.
Thank you~
Xintong Song
On Wed, Mar 11
rea.
Thank you~
Xintong Song
On Wed, Mar 11, 2020 at 10:37 AM Eleanore Jin
wrote:
> Hi Xintong,
>
> Thanks for the prompt reply! To answer your question:
>
>- Which Flink version are you using?
>
>v1.8.2
>
>- Is this skew observed on
Hi Vitaliy,
You can specify a yarn queue by either setting the configuration option
'yarn.application.queue' [1], or using the command line option '-qu' (or
'--queue') [2].
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-rel
size'
is missing. You can take a look at the launch command and check whether there's
anything unexpected before the dynamic memory configurations.
Thank you~
Xintong Song
On Thu, Mar 12, 2020 at 2:26 PM Yangze Guo wrote:
> Hi, Alexander
>
> I could not reproduce it in my local
e.g.,
in a Flink YARN Session.[1]
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/yarn_setup.html#flink-yarn-session
On Thu, Mar 12, 2020 at 6:20 PM Vitaliy Semochkin
wrote:
> Thank you Xintong Song,
>
> is there any way to queue pr
link Master will interact with the Kubernetes Master and actively
request pods/containers, as on Yarn/Mesos.
Thank you~
Xintong Song
On Mon, Mar 16, 2020 at 4:03 PM Pankaj Chand
wrote:
> Hi all,
>
> I want to run Flink, Spark and other processing engines on a single
> K
Forgot to mention that "running Flink natively on Kubernetes" is newly
introduced and is only available for Flink 1.10 and above.
Thank you~
Xintong Song
On Mon, Mar 16, 2020 at 5:40 PM Xintong Song wrote:
> Hi Pankaj,
>
> "Running Flink on Kubernetes" refers
Flink 1.7 through the
latest 1.10, and I'm not aware of any reported issue where the JM does not
try to connect to the RM once the address is received.
Thank you~
Xintong Song
On Tue, Mar 17, 2020 at 7:45 AM Bajaj, Abhinav
wrote:
> Hi Xintong,
>
>
>
> Apologies for delayed response
I'm not familiar with ZK either.
I've copied Yang Wang, who might be able to provide some suggestions.
Alternatively, you can try to post your question to the Apache ZooKeeper
community, see if they have any clue.
Thank you~
Xintong Song
On Wed, Mar 18, 2020 at 8:12 AM Bajaj, Abhi
Hi Forideal,
Do you mean you have 700 slots per TM or in total? How many TMs do you
have? And how many slots do you have per TM?
Also, when was the screenshot taken? Was it after the job was fully initialized?
It seems you only need 1k+ network buffers.
Thank you~
Xintong Song
On Fri, Mar 20
16 [1], which replaces masterMemoryMB with
`jobmanager.memory.process.size`. That would also involve refactoring
YarnClusterDescriptor, which is not in good shape (e.g. the method
startAppMaster has more than 400 lines) and is closely coupled with
ClusterSpecification.
Thank you~
Xintong Song
O
helpful to that end.
In addition, would you be able to check the Yarn logs? See if the container
requests are received and containers are allocated.
Thank you~
Xintong Song
On Tue, Mar 24, 2020 at 6:45 AM Vitaliy Semochkin
wrote:
> Hi,
>
> I create a job with following p
Thanks Yangze, I've tried the tool and I think it's very helpful.
Thank you~
Xintong Song
On Mon, Mar 30, 2020 at 9:40 AM Yangze Guo wrote:
> Hi, Yun,
>
> I'm sorry that it currently could not handle it. But I think it is a
> really good idea and that feature woul
for a job
cluster, but does not cover the scenarios of session clusters.
Thank you~
Xintong Song
On Mon, Mar 30, 2020 at 12:03 PM Yangze Guo wrote:
> Thanks for your feedbacks, @Xintong and @Jeff.
>
> @Jeff
> I think it would always be good to leverage exist logic in Flink, such
>
environment and workloads.
For standalone clusters, the cut-off will not take any effect. For
containerized environments, depending on Yarn/Mesos configurations your
container may or may not get killed due to exceeding the container memory.
Thank you~
Xintong Song
On Tue, Mar 31, 2020 at 5:34 PM
d, including "-d". As a result, you're
running the session cluster in attached mode, and the client will not exit
until the session is shut down.
Thank you~
Xintong Song
On Fri, Apr 10, 2020 at 1:10 PM Yangze Guo wrote:
> Do you mean to run it in detach mode? If so, you could add
ny native memory? E.g., launching
another process, calling a JNI library, or something similar?
Thank you~
Xintong Song
On Sat, Apr 11, 2020 at 3:56 AM Mitch Lloyd wrote:
> We are having an issue with a Flink Job that gradually consumes all
> available memory on a Docker host machine, crashing the machin
Normally, a Yarn RM switch should not cause any problem for the running Flink
instance. Only if the RM switch takes too long and Flink happens to request
new containers during that time might it lead to a resource allocation
timeout.
Thank you~
Xintong Song
On Wed, Apr 15, 2020 at 3:49 PM LakeShen
heap /
direct memory.
My suggestion is to try increasing the JVM overhead configuration. You can
leverage the configuration options
'taskmanager.memory.jvm-overhead.[min|max|fraction]'. See more details in
the documentation[1].
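The JVM overhead is derived as a fraction of the total process memory and then clamped into [min, max]. A minimal sketch of that derivation follows. The defaults used below (fraction 0.1, min 192 MB, max 1 GB) are assumed to be the Flink 1.10 defaults, so please verify them against the documentation[1]. The class name is hypothetical.

```java
class JvmOverhead {
    // Assumed Flink 1.10 defaults for taskmanager.memory.jvm-overhead.[fraction|min|max].
    static final double DEFAULT_FRACTION = 0.1;
    static final long DEFAULT_MIN_MB = 192;
    static final long DEFAULT_MAX_MB = 1024;

    // overhead = clamp(processSize * fraction, min, max)
    static long overheadMb(long processSizeMb, double fraction, long minMb, long maxMb) {
        long derived = (long) (processSizeMb * fraction);
        return Math.max(minMb, Math.min(derived, maxMb));
    }

    public static void main(String[] args) {
        // A 4096 MB process gets 409 MB of overhead with the defaults;
        // raising fraction or min leaves more native headroom in the container.
        System.out.println(overheadMb(4096, DEFAULT_FRACTION, DEFAULT_MIN_MB, DEFAULT_MAX_MB));
    }
}
```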
Thank you~
Xintong Song
[1]
https://ci.apache.org/pr
performance
to stabilize. Depending on your workload, this could take up to tens of
minutes.
Please also be careful with aggregations over large windows. Emitting
large windows might introduce bursts of processing load, causing the
measured throughput to fluctuate.
Thank you~
Xintong Song
On Thu, Apr 23
@Stephan,
I don't think so. If the JVM hits the direct memory limit, you should see the
error message "OutOfMemoryError: Direct buffer memory".
Thank you~
Xintong Song
On Thu, Apr 23, 2020 at 6:11 PM Stephan Ewen wrote:
> @Xintong and @Lasse could it be that the JVM hits
Hi Flavio,
I'm not aware of any way to automatically format the code. The only thing I
found that might help is enabling a Checkstyle plugin in your IDE.
https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/ide_setup.html#checkstyle-for-java
Thank you~
Xintong Song
On Thu
ative method, I think the problem is
that not enough native memory can be allocated for executing the native
method.
Thank you~
Xintong Song
On Fri, Apr 24, 2020 at 3:40 PM Stephan Ewen wrote:
> @Xintong - out of curiosity, where do you see that this tries to fork a
> process?
True. Thanks for the clarification.
Thank you~
Xintong Song
On Fri, Apr 24, 2020 at 5:21 PM Stephan Ewen wrote:
> I think native methods are not in a forked process. It is just a malloc()
> call that failed, probably an I/O buffer or so.
> This might mean that there really is
'task.off-heap.size'
being 0 only reflects that, in most cases, user code / operators do not
use off-heap memory. Users need to explicitly increase this
configuration if the job's UDFs or libraries use off-heap memory.
Thank you~
Xintong Song
On Wed, Apr 29, 2020 at 11:07 AM
tions look good to me. Is the configured path '/dumps/oom.bin' a
local path of the pod or a path of the host mounted onto the pod? The
restarted pod is a completely new pod. Everything you write to
the old pod goes away when the pod terminates, unless it is written to the
host
led by
JVM. In Flink, managed memory and jvm-overhead use native memory.
That means that if you see a JVM OOM, increasing jvm-overhead should not help.
Thank you~
Xintong Song
On Thu, Apr 30, 2020 at 11:06 AM Jiahui Jiang
wrote:
> Hey Xintong, Steven, thanks for replies!
>
> @Steven W
ner". I suspect there might be some
argument-passing problem related to the spaces and double quotation marks.
Thank you~
Xintong Song
On Thu, Apr 30, 2020 at 11:39 AM Eleanore Jin
wrote:
> Hi Xintong,
>
> Thanks for the detailed explanation!
>
> as for the 2nd question: I mou
Hi Lei,
Could you check whether the hostname 'localhost' is available on your
CentOS machine? This is usually defined in "/etc/hosts".
You can also try to modify the slaves file, replacing 'localhost' with
'127.0.0.1'. The path is: /conf/slaves
Thank you~
se a little direct memory. But that's quite opportunistic, so
it would be better to configure a non-zero task.off-heap size if you know your
tasks/operators use some direct memory.
Thank you~
Xintong Song
On Thu, Apr 30, 2020 at 12:14 PM Jiahui Jiang
wrote:
> Hey Xintong, thanks for the explanat
Linking to the jira ticket, for the record.
https://issues.apache.org/jira/browse/FLINK-17560
Thank you~
Xintong Song
On Sat, May 9, 2020 at 2:14 AM Josson Paul wrote:
> Set up
> --
> Flink verson 1.8.3
>
> Zookeeper HA cluster
>
> 1 ResourceManager/Dispa
Hi Jacky,
Could you search for "Application Master start command:" in the debug log
and post the result along with a few lines before & after it? It is not
included in the attached log excerpt.
Thank you~
Xintong Song
On Tue, May 12, 2020 at 5:33 AM Jacky D wrote:
> hi,
PREFIX}"
with "/your-file-name.jit". The token "" should be
replaced with the proper log directory path automatically by Yarn.
I noticed that the usage of ${FLINK_LOG_PREFIX} is recommended by Flink's
documentation [1]. This is IMO a bit misleading. I'll try to file
1.11.0 reaches feature freeze today. The final release date depends on the
progress of release testing / bug fixing.
Thank you~
Xintong Song
On Mon, May 18, 2020 at 6:36 PM Omar Gawi wrote:
> Thanks Till!
> Do you know what is 1.11.0 release date?
>
>
> On Mon, May 18, 2020 a
lower parallelism.
Could you share some more information about your use case?
- What kind of job are you executing? Is it a streaming or batch
processing job?
- Which Flink deployment do you use? Standalone? Yarn?
- It would be helpful if you can share the Flink logs.
Thank you~
Xintong
an argument for
the `flink run` command, to set parallelism for all operators.
- Set `parallelism.default` in your `flink-conf.yaml`, to set a default
parallelism for your jobs. This will be used for jobs whose parallelism has not
been set by either of the above methods.
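When more than one of these is set, Flink's documentation gives this precedence, from highest to lowest:

1. Operator level.
2. Execution environment level (`env.setParallelism(...)`).
3. Client level (the `flink run` argument).
4. System level (`parallelism.default`).

The hypothetical plain-Java resolver below mirrors that order. It is not a Flink API. `OptionalInt.empty()` stands for "not set".

```java
import java.util.OptionalInt;

// Hypothetical resolver illustrating the documented precedence; not a Flink API.
class ParallelismResolver {
    static int effectiveParallelism(OptionalInt operatorLevel,
                                    OptionalInt environmentLevel,
                                    OptionalInt clientLevel,
                                    int systemDefault) {
        if (operatorLevel.isPresent()) {
            return operatorLevel.getAsInt();      // set on the operator itself
        }
        if (environmentLevel.isPresent()) {
            return environmentLevel.getAsInt();   // env.setParallelism(n)
        }
        if (clientLevel.isPresent()) {
            return clientLevel.getAsInt();        // argument to `flink run`
        }
        return systemDefault;                     // parallelism.default in flink-conf.yaml
    }
}
```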
Thank you~
Xintong So
t the execution plan
only shows 5.
Thank you~
Xintong Song
On Wed, May 27, 2020 at 3:16 AM Vijay Balakrishnan
wrote:
> Hi Xintong,
> Thanks for the excellent clarification for tasks.
>
> I attached a sample screenshot above and din't reflect the slots used and
> the tasks li
etwork_fraction, network_min), network_max)`. According to the error
message, your current network memory size is `85922 buffers * 32KB/buffer =
2685MB`, smaller than your "max" (4gb). That means increasing the "max"
does not help in your case. It is the "fraction" that you
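The arithmetic can be sketched as follows. The sketch assumes the default 32 KB memory segment size (`taskmanager.memory.segment-size`), and the class and method names are hypothetical. The key point: when `total * fraction` already lands below "max", raising "max" changes nothing, whereas raising "fraction" (or "min") does.

```java
class NetworkMemory {
    // Assumed default memory segment (buffer) size in KB.
    static final long SEGMENT_SIZE_KB = 32;

    // network = min(max(total * fraction, min), max), all sizes in MB
    static long networkMb(long totalMb, double fraction, long minMb, long maxMb) {
        return Math.min(Math.max((long) (totalMb * fraction), minMb), maxMb);
    }

    // Converts a buffer count (as reported in the error message) into MB.
    static long buffersToMb(long buffers) {
        return buffers * SEGMENT_SIZE_KB / 1024;
    }

    public static void main(String[] args) {
        System.out.println(buffersToMb(85922)); // 2685, well below a 4 GB "max"
    }
}
```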
.NioEventLoop
>> .processSelectedKeys(NioEventLoop.java:508)
>> at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop
>> .run(NioEventLoop.java:470)
>> at org.apache.flink.shaded.netty4.io.netty.util.concurrent.
>> SingleThreadEventExecutor$5.run(SingleThre
ould need to look into the *log
of the task manager that is not responding* to understand what's wrong with
it.
Thank you~
Xintong Song
On Fri, Jun 5, 2020 at 6:06 AM Vijay Balakrishnan
wrote:
> Thx a ton, Xintong.
> I am using this configuration now:
> taskman
ing only one job.
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/cluster_setup.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/concepts/glossary.html#flink-application-cluster
[3]
https://ci.apache.org/projects/flink
dynamically adapt to the available
resources (e.g., adding/removing pods on Kubernetes). AFAIK, this is still
under design discussion.
Thank you~
Xintong Song
On Wed, Jun 10, 2020 at 2:44 AM Prasanna kumar <
prasannakumarram...@gmail.com> wrote:
> Hi all,
>
> Does flink support dynamic s
igurations will be read by Flink
task manager so that memory will be managed accordingly.
The Flink task manager expects all the memory configurations to be already set
(thus network min/max should have the same value) before it starts. In
your case, it seems such configurations are missin
jvmHeap = (total - Max(cutoff-min, total * cutoff-ratio)) *
(1 - networkFraction) = (102GB - Max(600MB, 102GB * 0.25)) * (1 - 0.48) =
(102GB - 25.5GB) * 0.52 ≈ 39.8GB
Have you specified a custom "-Xmx" parameter?
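The same formula can be written as a small plain-Java helper, which makes it easy to plug in your own container size and ratios. The helper names are hypothetical, all sizes are in MB, and 1GB is assumed to be 1024MB.

```java
class LegacyHeapEstimate {
    // jvmHeap = (total - max(cutoffMin, total * cutoffRatio)) * (1 - networkFraction)
    static double jvmHeapMb(double totalMb, double cutoffMinMb, double cutoffRatio,
                            double networkFraction) {
        double cutoff = Math.max(cutoffMinMb, totalMb * cutoffRatio);
        return (totalMb - cutoff) * (1 - networkFraction);
    }

    public static void main(String[] args) {
        double heapMb = jvmHeapMb(102 * 1024, 600, 0.25, 0.48);
        System.out.printf("%.1f GB%n", heapMb / 1024);
    }
}
```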
Thank you~
Xintong Song
On Fri, Jun 12, 2020 at 7:50 AM Vijay Balakrishnan
wrote:
> Hi
he configuration option but not
for the environment variable)
> The previous options which were responsible for the total memory used by
> Flink are taskmanager.heap.size or taskmanager.heap.mb. Despite their
> naming, they included not only JVM heap but also other off-heap memory
> compon
you~
Xintong Song
On Fri, Jun 12, 2020 at 4:27 PM Xintong Song wrote:
> Hi Li,
>
> FLINK_TM_HEAP corresponds to the legacy configuration option
> "taskmanager.heap.size". It is supported for backwards compatibility. I
> strongly recommend you to use "
-Xmx on Mesos.
BTW, from your screenshot the physical memory is 123GB, so 1/4 of that (about 30.75GB) is
much closer to 29GB once rounding errors and
accuracy loss are taken into account.
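You can check which heap limit the JVM actually picked (an explicit `-Xmx`, or the default derived from physical memory) by asking the running JVM through the standard `Runtime` API. Depending on the garbage collector, `maxMemory()` can report somewhat less than the nominal `-Xmx` value, which is one more source of small discrepancies. A minimal sketch follows; the class name is hypothetical.

```java
class HeapLimit {
    // Effective max heap in MB, or -1 if the JVM reports no limit (Long.MAX_VALUE).
    static long maxHeapMb() {
        long max = Runtime.getRuntime().maxMemory();
        return max == Long.MAX_VALUE ? -1 : max / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Effective max heap: " + maxHeapMb() + " MB");
    }
}
```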
Thank you~
Xintong Song
On Fri, Jun 12, 2020 at 4:33 PM Vijay Balakrishnan
wrote:
> Thx, Xintong for a great
leverage the
configuration option "taskmanager.memory.task.heap.size", and an additional
constant framework overhead will be added to this value for -Xmx.
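In that setup, both `-Xmx` and `-Xms` are set to framework heap plus task heap, as described in [1]. The sketch below illustrates the rule. It assumes a framework heap default of 128 MB (`taskmanager.memory.framework.heap.size` in Flink 1.10), and the helper names are hypothetical.

```java
class TaskManagerHeapFlags {
    // Assumed Flink 1.10 default of taskmanager.memory.framework.heap.size, in MB.
    static final long DEFAULT_FRAMEWORK_HEAP_MB = 128;

    // -Xmx and -Xms are both derived as framework heap + task heap.
    static String heapFlags(long taskHeapMb, long frameworkHeapMb) {
        long heapMb = frameworkHeapMb + taskHeapMb;
        return "-Xmx" + heapMb + "m -Xms" + heapMb + "m";
    }

    public static void main(String[] args) {
        // taskmanager.memory.task.heap.size: 2048m
        System.out.println(heapFlags(2048, DEFAULT_FRAMEWORK_HEAP_MB)); // -Xmx2176m -Xms2176m
    }
}
```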
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#jvm-parameters
O
l whether "env.java.opts" works for you.
Thank you~
Xintong Song
On Fri, Jun 12, 2020 at 5:33 PM Vijay Balakrishnan
wrote:
> Hi Xintong,
> Just to be clear. I haven't set any -Xmx -i will check our scripts again.
> Assuming no -Xmx is set, the doc above says 1/4 of
Yes, that is correct. 'taskmanager.memory.process.size' is the recommended
option.
Thank you~
Xintong Song
On Fri, Jun 12, 2020 at 10:59 PM Clay Teeter wrote:
> Ok, this is great to know. So in my case; I have a k8 pod that has a
> limit of 4Gb. I should remove the -Xmx and
single
job mode. The session mode is not supported. But I haven't checked this for
quite a while. It may have changed since.
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn
[2]
https://ci.apach
Congratulations Yu, well deserved~!
Thank you~
Xintong Song
On Wed, Jun 17, 2020 at 9:15 AM jincheng sun
wrote:
> Hi all,
>
> On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
> part of the Apache Flink Project Management Committee (PMC).
>
> Yu Li
not handled in time before the timeout check.
- Are there any metrics monitoring the network condition between the JM
and the timed-out TM? Any possible jitter?
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#heartbeat-timeout
On Thu
o set `.task.heap.size` and `managed.size`.
2. If you don't know how much heap/managed memory to configure, you
can look for the configuration options in the beginning of the TM logs
(`-Dkey=value`). Those are the values derived from your current
configuration.
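Collecting those values from a log line can be scripted. Below is a hypothetical helper that is not part of Flink. It assumes the derived options appear as whitespace-free `-Dkey=value` tokens, with or without a space after `-D`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DynamicProperties {
    // Matches "-Dkey=value" or "-D key=value"; key and value contain no whitespace.
    private static final Pattern DYNAMIC_PROPERTY = Pattern.compile("-D\\s*([^=\\s]+)=(\\S+)");

    static Map<String, String> parse(String logLine) {
        Map<String, String> props = new LinkedHashMap<>();
        Matcher m = DYNAMIC_PROPERTY.matcher(logLine);
        while (m.find()) {
            props.put(m.group(1), m.group(2));
        }
        return props;
    }
}
```

For example, on a made-up line such as `-Dtaskmanager.memory.task.heap.size=512m -Xmx1g`, `parse` returns a map with the single entry `taskmanager.memory.task.heap.size -> 512m`.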
Thank you~
Xi
n guide [1].
Thank you~
Xintong Song
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_migration.html
On Sun, Jun 28, 2020 at 10:12 PM Ori Popowski wrote:
> Thanks for the suggestions!
>
> > i recently tried 1.10 and see this error frequently. and
sk managers (say tens
of GBs) unless absolutely necessary. Alternatively, you can try to launch
multiple TMs on one physical machine, to reduce the memory size of each TM
process.
BTW, what kind of workload are you running? Is it streaming or batch?
Thank you~
Xintong Song
On Mon, Jun 29, 20