I think what I meant was "either add it to /lib, or [if it is already in
/lib but also bundled in the jar] add it to the parent-first patterns."
On 28/04/2022 15:56, Chesnay Schepler wrote:
Pretty sure, even though I seemingly documented it incorrectly :)
On 28/04/2022 15:49, John Smith wrote:
You sure?
*
/JDBC/: JDBC drivers leak references outside the user code
classloader. To ensure that these classes are only loaded once
you should either add the driver jars to Flink’s |lib/| folder,
or add the driver classes to the list of parent-first loaded
class via |classloader.parent-first-patterns-additional|
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>.
It says "either ... or".
On Wed, Apr 27, 2022 at 3:44 AM Chesnay Schepler <ches...@apache.org>
wrote:
You're misinterpreting the docs.
The parent/child-first classloading controls where Flink looks
for a class /first/, specifically whether we first load from /lib
or the user-jar.
It does not allow you to load something from the user-jar in the
parent classloader. That's just not how it works.
It must be in /lib.
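The lookup order described above can be modelled roughly as follows. This is a simplified sketch, not Flink's actual class loader code: the class `LookupOrderSketch`, the method `resolve` and the boolean flags are hypothetical names made up for illustration. It only shows that a parent-first pattern changes which location is tried first; a class that exists only in the user-jar is still loaded from the user-jar.

```java
import java.util.List;

// Simplified model of the lookup order; NOT Flink's real class loader.
// All names here (LookupOrderSketch, resolve, inLib, inUserJar) are hypothetical.
class LookupOrderSketch {

    /**
     * Returns the location a class would be loaded from.
     *
     * @param className           fully qualified class name
     * @param parentFirstPatterns name prefixes that are looked up in /lib first
     * @param inLib               whether the class is present in /lib
     * @param inUserJar           whether the class is bundled in the user-jar
     */
    static String resolve(String className, List<String> parentFirstPatterns,
                          boolean inLib, boolean inUserJar) {
        boolean parentFirst = parentFirstPatterns.stream().anyMatch(className::startsWith);
        if (parentFirst) {
            // Parent-first: try /lib, then fall back to the user-jar.
            if (inLib) return "lib";
            if (inUserJar) return "user-jar";
        } else {
            // Child-first: try the user-jar, then fall back to /lib.
            if (inUserJar) return "user-jar";
            if (inLib) return "lib";
        }
        return "not found";
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("org.apache.ignite.");
        String driver = "org.apache.ignite.IgniteJdbcThinDriver";
        // Pattern set, but the driver is only in the user-jar: nothing changes.
        System.out.println(resolve(driver, patterns, false, true)); // user-jar
        // Pattern set and the driver is in both places: /lib wins.
        System.out.println(resolve(driver, patterns, true, true));  // lib
    }
}
```

In this model the pattern only has an effect in the second call, where the driver is present in both locations.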
On 27/04/2022 04:59, John Smith wrote:
Hi Chesnay as per the docs...
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/
You can either put the jars in task manager lib folder or use
|classloader.parent-first-patterns-additional|
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
I prefer the latter: that way the dependency stays with the
user-jar and not on the task manager.
On Tue, Apr 26, 2022 at 9:52 PM John Smith
<java.dev....@gmail.com> wrote:
Ok so I should put the Apache ignite and my Microsoft
drivers in the lib folders of my task managers?
And then in my job jar only include them as compile time
dependencies?
On Tue, Apr 26, 2022 at 10:42 AM Chesnay Schepler
<ches...@apache.org> wrote:
JDBC drivers are well-known for leaking classloaders
unfortunately.
You have correctly identified your alternatives.
You must put the jdbc driver into /lib instead. Setting
only the parent-first pattern shouldn't affect anything.
That is only relevant if something is both in /lib
and in the user-jar, telling Flink to prioritize what is in
/lib.
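To see which classloader a JDBC driver was loaded by, it can help to list the drivers that `java.sql.DriverManager` currently holds. The following is a minimal diagnostic sketch, assuming it is called from within the job's own code; the class and method names (`RegisteredDrivers`, `describe`) are hypothetical. `DriverManager` is a JDK class and keeps registered drivers in static state, so a driver registered from the user-jar can stay reachable after the job is gone.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class RegisteredDrivers {

    /** Returns one line per driver visible to the caller: driver class and its classloader. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            // A null loader means the bootstrap classloader.
            ClassLoader loader = driver.getClass().getClassLoader();
            lines.add(driver.getClass().getName() + " <- " + loader);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

In a Flink job one would expect a driver loaded from the user-jar to show a ChildFirstClassLoader here, the same class that shows up in the MAT histogram in this thread, while a driver placed in /lib should show the JVM's application classloader.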
On 26/04/2022 15:35, John Smith wrote:
So I put classloader.parent-first-patterns.additional:
"org.apache.ignite." in the task config and so far I
don't think I'm getting "java.lang.OutOfMemoryError:
Metaspace" any more.
Or it's too early to tell.
Though now, the task managers are shutting down due to
some other failures.
So maybe the task manager was running out of Metaspace
because tasks were failing and reloading often. But now
maybe it's just shutting down cleanly.
On Wed, Apr 20, 2022 at 11:35 AM John Smith
<java.dev....@gmail.com> wrote:
Or I can put in the config to treat
org.apache.ignite. classes as first class?
On Tue, Apr 19, 2022 at 10:18 PM John Smith
<java.dev....@gmail.com> wrote:
Ok, so I loaded the dump into Eclipse Mat and
followed:
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
- On the Histogram, I got over 30 entries for:
ChildFirstClassLoader
- Then I clicked on one of them "Merge Shortest
Path..." and picked "Exclude all
phantom/weak/soft references"
- Which then gave me: SqlDriverManager > Apache
Ignite JdbcThin Driver
So I'm guessing this applies to anything JDBC based.
Should I copy the drivers into the task manager lib
folder and make them compile-only dependencies in my jobs?
On Tue, Apr 19, 2022 at 12:18 PM Yaroslav
Tkachenko <yaros...@goldsky.io> wrote:
Also
https://shopify.engineering/optimizing-apache-flink-applications-tips
might be helpful (has a section on
profiling, as well as classloading).
On Tue, Apr 19, 2022 at 4:35 AM Chesnay
Schepler <ches...@apache.org> wrote:
We have a very rough "guide" in the
wiki (it's just the specific steps I
took to debug another leak):
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
On 19/04/2022 12:01, huweihua wrote:
Hi, John
Sorry for the late reply. You can use
MAT[1] to analyze the dump file. Check
whether there are too many loaded classes.
[1] https://www.eclipse.org/mat/
On April 18, 2022, at 9:55 PM, John Smith
<java.dev....@gmail.com> wrote:
Hi, can anyone help with this? I
never looked at a dump file before.
On Thu, Apr 14, 2022 at 11:59 AM John
Smith <java.dev....@gmail.com> wrote:
Hi, so I have a dump file. What
do I look for?
On Thu, Mar 31, 2022 at 3:28 PM
John Smith
<java.dev....@gmail.com> wrote:
Ok, so if there's a leak and I
manually stop the job and
restart it from the UI
multiple times, I won't see
the issue because the
classes are unloaded correctly?
On Thu, Mar 31, 2022 at 9:20
AM huweihua
<huweihua....@gmail.com> wrote:
The difference is that
manually canceling the
job stops the JobMaster,
but automatic failover
keeps the JobMaster
running. From the
TaskManager's point of view,
though, it doesn't make
much difference.
On March 31, 2022, at 4:01 AM, John
Smith
<java.dev....@gmail.com>
wrote:
Also, if I manually
cancel and restart the
same job over and over,
is it the same as if
Flink was restarting a
job due to failure?
I.e., when I click
"Cancel Job" in the UI,
is the job completely
unloaded, as opposed to
when the job scheduler
restarts a job for
whatever reason?
That way I could stop and
restart the job a few
times, or maybe I can
trick my job into failing
and have the scheduler
restart it. Ok, let me
think about this...
On Wed, Mar 30, 2022 at
10:24 AM 胡伟华
<huweihua....@gmail.com>
wrote:
So if I run the
same jobs in my dev
env will I still be
able to see the
similar dump?
I think running the
same job in dev
should be
reproducible, maybe
you can have a try.
If not I would
have to wait for a
low-volume time to
do it on
production. Also, if
I recall, the dump
is as big as the
JVM memory, right? So
if I have 10GB
configured for the
JVM, the dump will
be a 10GB file?
Yes, jmap will pause
the JVM; the length of
the pause depends on the
size of the dump. You
can use "jmap
-dump:live" to dump
only the reachable
objects, which keeps
the pause brief.
On March 30, 2022,
at 9:47 PM, John Smith
<java.dev....@gmail.com>
wrote:
I have 3 task
managers (see
config below).
There is total of
10 jobs with 25
slots being used.
The jobs are 100%
ETL, i.e. they load
JSON, transform it
and push it to
JDBC. Only 1 job of
the 10 is pushing
to an Apache Ignite
cluster.
As for jmap: I know
that it will pause
the task manager.
So if I run the
same jobs in my dev
env, will I still be
able to see a
similar dump? I
assume so. If not I
would have to wait
for a low-volume
time to do it on
production. Also, if
I recall, the dump
is as big as the
JVM memory, right? So
if I have 10GB
configured for the
JVM, the dump will
be a 10GB file?
# Operating system has 16GB total.
env.ssh.opts: -l flink -oStrictHostKeyChecking=no
cluster.evenly-spread-out-slots: true
taskmanager.memory.flink.size: 10240m
taskmanager.memory.jvm-metaspace.size: 2048m
taskmanager.numberOfTaskSlots: 16
parallelism.default: 1
high-availability: zookeeper
high-availability.storageDir: file:///mnt/flink/ha/flink_1_14/
high-availability.zookeeper.quorum: ...
high-availability.zookeeper.path.root: /flink_1_14
high-availability.cluster-id: /flink_1_14_cluster_0001
web.upload.dir: /mnt/flink/uploads/flink_1_14
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: file:///mnt/flink/checkpoints/flink_1_14
state.savepoints.dir: file:///mnt/flink/savepoints/flink_1_14
On Wed, Mar 30,
2022 at 2:16 AM 胡伟华
<huweihua....@gmail.com>
wrote:
Hi, John
Could you tell
us your
application
scenario? Is it
a Flink session
cluster with a
lot of jobs?
Maybe you can
try to dump the
memory with
jmap and use
tools such as
MAT to analyze
whether there
are abnormal
classes and
classloaders
> On March 30, 2022, at 6:09 AM, John Smith
> <java.dev....@gmail.com> wrote:
>
> Hi, I'm running 1.14.4.
>
> My task manager still fails with
> java.lang.OutOfMemoryError: Metaspace. The metaspace
> out-of-memory error has occurred. This can mean two
> things: either the job requires a larger size of JVM
> metaspace to load classes or there is a class loading leak.
>
> I have 2GB of metaspace configured:
> taskmanager.memory.jvm-metaspace.size: 2048m
>
> But the task nodes still fail.
>
> When looking at the UI metrics, the metaspace starts
> low. Now I see 85% usage. It seems to be a class
> loading leak at this point. How can we debug this issue?
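Metaspace usage like the figure watched in the UI can also be read from inside a JVM through the standard `java.lang.management` API. The following is a minimal sketch, assuming a HotSpot JVM where the memory pool is named "Metaspace"; the class and method names (`MetaspaceUsage`, `usedBytes`, `maxBytes`) are hypothetical, and the code only reports on the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
class MetaspaceUsage {

    /** Finds the pool named "Metaspace" (HotSpot naming, an assumption), or returns null. */
    private static MemoryUsage metaspace() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    /** Used Metaspace in bytes, or -1 if no such pool is exposed. */
    static long usedBytes() {
        MemoryUsage usage = metaspace();
        return usage == null ? -1L : usage.getUsed();
    }

    /** Metaspace limit in bytes, or -1 if no limit is defined or no such pool is exposed. */
    static long maxBytes() {
        MemoryUsage usage = metaspace();
        return usage == null ? -1L : usage.getMax();
    }

    public static void main(String[] args) {
        System.out.println("metaspace used=" + usedBytes() + " max=" + maxBytes());
    }
}
```

Logging these two numbers after each job restart would show whether usage keeps climbing, which is what a class loading leak would be expected to look like.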