Pretty sure, even though I seemingly documented it incorrectly :)
On 28/04/2022 15:49, John Smith wrote:
You sure?
* /JDBC/: JDBC drivers leak references outside the user code
classloader. To ensure that these classes are only loaded once you
should either add the driver jars to Flink’s |lib/| folder, or add
the driver classes to the list of parent-first loaded class via
|classloader.parent-first-patterns-additional|
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>.
It says either or
On Wed, Apr 27, 2022 at 3:44 AM Chesnay Schepler <ches...@apache.org>
wrote:
You're misinterpreting the docs.
The parent/child-first classloading controls where Flink looks for
a class /first/, specifically whether we first load from /lib or
the user-jar.
It does not allow you to load something from the user-jar in the
parent classloader. That's just not how it works.
It must be in /lib.
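
To make the lookup order concrete, here is a minimal sketch of a child-first classloader with a parent-first pattern list. It is an illustration only, not Flink's actual implementation; the class name ChildFirstSketch, the helper matchesAny and the example class names are made up for this sketch. The point it shows: a parent-first pattern only changes which loader is asked first. A class that exists only in the user-jar still ends up defined by the user-code classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

/** Simplified illustration of child-first loading; NOT Flink's real classloader. */
class ChildFirstSketch extends URLClassLoader {
    private final List<String> parentFirstPatterns;

    ChildFirstSketch(URL[] userJarUrls, ClassLoader parent, List<String> parentFirstPatterns) {
        super(userJarUrls, parent);
        this.parentFirstPatterns = parentFirstPatterns;
    }

    /** True if the class name starts with one of the configured prefixes. */
    static boolean matchesAny(String className, List<String> patterns) {
        for (String prefix : patterns) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (matchesAny(name, parentFirstPatterns)) {
                    // Parent-first: ask the parent (e.g. lib/) first. If the parent
                    // does not have the class, it is still loaded from the user-jar
                    // by THIS loader, so the pattern alone moves nothing to the parent.
                    return super.loadClass(name, resolve);
                }
                try {
                    c = findClass(name); // child-first: look in the user-jar first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, resolve); // fall back to the parent
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("org.apache.ignite.");
        // Example class names; the second one is hypothetical.
        System.out.println(matchesAny("org.apache.ignite.IgniteJdbcThinDriver", patterns));
        System.out.println(matchesAny("com.example.MyJob", patterns));
    }
}
```

In this sketch the pattern check only decides the order of the two lookups, which is why the driver jar itself has to sit in /lib for the parent to be the one that loads it.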
On 27/04/2022 04:59, John Smith wrote:
Hi Chesnay as per the docs...
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/
You can either put the jars in task manager lib folder or use
|classloader.parent-first-patterns-additional|
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
I prefer the latter: that way the dependency stays with the
user-jar and not on the task manager.
On Tue, Apr 26, 2022 at 9:52 PM John Smith
<java.dev....@gmail.com> wrote:
Ok so I should put the Apache ignite and my Microsoft drivers
in the lib folders of my task managers?
And then in my job jar only include them as compile time
dependencies?
On Tue, Apr 26, 2022 at 10:42 AM Chesnay Schepler
<ches...@apache.org> wrote:
JDBC drivers are well-known for leaking classloaders
unfortunately.
You have correctly identified your alternatives.
You must put the jdbc driver into /lib instead. Setting
only the parent-first pattern shouldn't affect anything.
That is only relevant if something is in both /lib and
the user-jar, telling Flink to prioritize what is in lib.
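
As a rough way to see where a driver ended up, the following sketch prints which classloader defined java.sql.DriverManager and each registered driver. The class JdbcDriverLoaders and its method names are hypothetical helpers for this illustration. The assumption behind it: DriverManager lives in a JVM-level classloader and keeps a static registry of drivers, so a driver defined by the user-code classloader keeps that classloader reachable.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Flink or any JDBC driver. */
class JdbcDriverLoaders {

    /** Describes the classloader that defined the given class ("bootstrap" if null). */
    static String loaderOf(Class<?> type) {
        ClassLoader cl = type.getClassLoader();
        return cl == null ? "bootstrap" : cl.toString();
    }

    /** Lists every JDBC driver visible to the caller with its defining classloader. */
    static List<String> registeredDrivers() {
        List<String> lines = new ArrayList<>();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            lines.add(driver.getClass().getName() + " <- " + loaderOf(driver.getClass()));
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("DriverManager <- " + loaderOf(DriverManager.class));
        for (String line : registeredDrivers()) {
            System.out.println(line);
        }
    }
}
```

Run from inside a job, a driver that reports the user-code classloader here would be the kind of reference that survives a job restart; a driver placed in /lib should report the application classloader instead. Note that DriverManager.getDrivers() only returns drivers the calling code is allowed to see.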
On 26/04/2022 15:35, John Smith wrote:
So I put classloader.parent-first-patterns.additional:
"org.apache.ignite." in the task config and so far I
don't think I'm getting "java.lang.OutOfMemoryError:
Metaspace" any more.
Or it's too early to tell.
Though now, the task managers are shutting down due to
some other failures.
So maybe because tasks were failing and reloading often,
the task manager was running out of Metaspace. But now
maybe it's just cleanly shutting down.
On Wed, Apr 20, 2022 at 11:35 AM John Smith
<java.dev....@gmail.com> wrote:
Or can I set the config to treat
org.apache.ignite. classes as parent-first?
On Tue, Apr 19, 2022 at 10:18 PM John Smith
<java.dev....@gmail.com> wrote:
Ok, so I loaded the dump into Eclipse MAT and
followed:
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
- On the Histogram, I got over 30 entries for:
ChildFirstClassLoader
- Then I clicked on one of them, chose "Merge Shortest
Path..." and picked "Exclude all
phantom/weak/soft references"
- Which then gave me: SqlDriverManager > Apache
Ignite JdbcThin Driver
So I'm guessing this applies to anything JDBC-based. Should I
copy those drivers into the task manager lib folder and have my
jobs declare the dependencies as compile-only?
On Tue, Apr 19, 2022 at 12:18 PM Yaroslav
Tkachenko <yaros...@goldsky.io> wrote:
Also
https://shopify.engineering/optimizing-apache-flink-applications-tips
might be helpful (has a section on
profiling, as well as classloading).
On Tue, Apr 19, 2022 at 4:35 AM Chesnay
Schepler <ches...@apache.org> wrote:
We have a very rough "guide" in the wiki
(it's just the specific steps I took to
debug another leak):
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
On 19/04/2022 12:01, huweihua wrote:
Hi, John
Sorry for the late reply. You can use
MAT[1] to analyze the dump file. Check
whether there are too many loaded classes.
[1] https://www.eclipse.org/mat/
On Apr 18, 2022, at 9:55 PM, John Smith
<java.dev....@gmail.com> wrote:
Hi, can anyone help with this? I never
looked at a dump file before.
On Thu, Apr 14, 2022 at 11:59 AM John
Smith <java.dev....@gmail.com> wrote:
Hi, so I have a dump file. What do
I look for?
On Thu, Mar 31, 2022 at 3:28 PM
John Smith
<java.dev....@gmail.com> wrote:
Ok so if there's a leak, if I
manually stop the job and
restart it from the UI
multiple times, I won't see
the issue because the
classes are unloaded correctly?
On Thu, Mar 31, 2022 at 9:20
AM huweihua
<huweihua....@gmail.com> wrote:
The difference is that
manually canceling the job
stops the JobMaster, but
automatic failover keeps
the JobMaster running. From
the TaskManager's point of view, though,
it doesn't make much difference.
On Mar 31, 2022, at 4:01 AM, John
Smith
<java.dev....@gmail.com>
wrote:
Also if I manually cancel
and restart the same job
over and over is it the
same as if flink was
restarting a job due to
failure?
I.e., when I click "Cancel
Job" in the UI, is the job
completely unloaded, compared with
when the job scheduler
restarts a job for
whatever reason?
Like this I'll stop and
restart the job a few
times, or maybe I can
trick my job into failing and
have the scheduler
restart it. Ok, let me
think about this...
On Wed, Mar 30, 2022 at
10:24 AM 胡伟华
<huweihua....@gmail.com>
wrote:
So if I run the same
jobs in my dev env
will I still be able
to see the similar
dump?
I think the issue should be
reproducible by running the
same job in dev; maybe
give it a try.
If not I would have
to wait at a low
volume time to do it
on production. Also,
if I recall, the dump
is as big as the JVM
memory, right? So if I
have 10GB configured
for the JVM, the dump
will be a 10GB file?
Yes, jmap will pause
the JVM; the length of the
pause depends on the
size of the dump. You can
use "jmap -dump:live"
to dump only the
reachable objects,
which takes only a
brief pause.
On Mar 30, 2022, at
9:47 PM, John Smith
<java.dev....@gmail.com>
wrote:
I have 3 task
managers (see config
below). There is a
total of 10 jobs
with 25 slots being
used.
The jobs are 100%
ETL, i.e. they load
JSON, transform it
and push it to JDBC.
Only 1 job of the 10
pushes to an Apache
Ignite cluster.
For jmap: I know
that it will pause
the task manager. So
if I run the same
jobs in my dev env,
will I still be able
to see a similar
dump? I assume so.
If not I would have
to wait for a low
volume time to do it
on production. Also,
if I recall, the dump
is as big as the JVM
memory, right? So if I
have 10GB configured
for the JVM, the dump
will be a 10GB file?
# Operating system has 16GB total.
env.ssh.opts: -l flink -oStrictHostKeyChecking=no
cluster.evenly-spread-out-slots: true
taskmanager.memory.flink.size: 10240m
taskmanager.memory.jvm-metaspace.size: 2048m
taskmanager.numberOfTaskSlots: 16
parallelism.default: 1
high-availability: zookeeper
high-availability.storageDir: file:///mnt/flink/ha/flink_1_14/
high-availability.zookeeper.quorum: ...
high-availability.zookeeper.path.root: /flink_1_14
high-availability.cluster-id: /flink_1_14_cluster_0001
web.upload.dir: /mnt/flink/uploads/flink_1_14
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: file:///mnt/flink/checkpoints/flink_1_14
state.savepoints.dir: file:///mnt/flink/savepoints/flink_1_14
On Wed, Mar 30, 2022
at 2:16 AM 胡伟华
<huweihua....@gmail.com>
wrote:
Hi, John
Could you tell
us your
application
scenario? Is it
a Flink session
cluster with a
lot of jobs?
Maybe you can
try to dump the
memory with jmap
and use tools
such as MAT to
analyze whether
there are
abnormal classes
and classloaders.
> On Mar 30, 2022, at 6:09 AM, John Smith <java.dev....@gmail.com> wrote:
>
> Hi, running 1.14.4
>
> My task managers still fail with java.lang.OutOfMemoryError:
> Metaspace. The metaspace out-of-memory error has occurred. This
> can mean two things: either the job requires a larger size of
> JVM metaspace to load classes or there is a class loading leak.
>
> I have 2GB of metaspace configured:
> taskmanager.memory.jvm-metaspace.size: 2048m
>
> But the task manager nodes still fail.
>
> When looking at the UI metrics, the metaspace usage starts low.
> Now I see 85% usage. It seems to be a class loading leak at this
> point. How can we debug this issue?
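
One rough way to tell "needs more metaspace" apart from "class loading leak" is to compare class-loading counters across job restarts. The sketch below reads them through the standard java.lang.management beans; the class MetaspaceProbe and its methods are hypothetical names for this illustration, and it reports on the JVM it runs in, so it would have to run inside the task manager (or be adapted to a remote JMX connection) to say anything about a real cluster. The pool name "Metaspace" is what HotSpot JVMs report and is treated as an assumption here.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical probe; reports on the JVM it runs in, not on a remote task manager. */
class MetaspaceProbe {

    /** Returns used Metaspace bytes, or -1 if no pool named "Metaspace" is reported. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** One-line snapshot of metaspace usage and class loading counters. */
    static String snapshot() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format(
                "metaspaceUsed=%d loadedNow=%d totalLoaded=%d unloaded=%d",
                metaspaceUsedBytes(),
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

If loadedNow keeps climbing after every job restart while unloaded stays flat, old user-code classloaders are probably not being collected, which points to a leak rather than an undersized metaspace.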