[ https://issues.apache.org/jira/browse/FLINK-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414093#comment-16414093 ]
Rohit Singh commented on FLINK-9080: ------------------------------------ Based on documentation on Flink, Tried adding job in the flink lib of scheduler and task manager to avoid dynamic class loading https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html Getting following error {code:java} Class=o.a.f.r.e.ExecutionGraph Msg=Source: Custom Source -> Sink: Unnamed (1/1) (3f12f6953a235eb43f07cdf7966b5fcf) switched from RUNNING to FAILED. org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function. at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:235) ~[iot-mirror-device.jar:na] at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:95) ~[iot-mirror-device.jar:na] at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:231) ~[iot-mirror-device.jar:na] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) ~[iot-mirror-device.jar:na] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91] Caused by: java.lang.ClassCastException: cannot assign instance of org.apache.commons.collections.map.LinkedMap to field org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.pendingOffsetsToCommit of type org.apache.commons.collections.map.LinkedMap in instance of org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010 at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) ~[na:1.8.0_91] at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) ~[na:1.8.0_91] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024) ~[na:1.8.0_91] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) ~[na:1.8.0_91] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) ~[na:1.8.0_91] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) ~[na:1.8.0_91] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) ~[na:1.8.0_91] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) ~[na:1.8.0_91] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) ~[na:1.8.0_91] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) ~[na:1.8.0_91] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) ~[na:1.8.0_91] at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290) ~[iot-mirror-device.jar:na] {code} > Flink Scheduler goes OOM, suspecting a memory leak > -------------------------------------------------- > > Key: FLINK-9080 > URL: https://issues.apache.org/jira/browse/FLINK-9080 > Project: Flink > Issue Type: Bug > Components: JobManager > Affects Versions: 1.4.0 > Reporter: Rohit Singh > Priority: Critical > Attachments: Top Level packages.JPG, Top level classes.JPG, > classesloaded vs unloaded.png > > > Running FLink version 1.4.0. on mesos,scheduler running along with job > manager in single container, whereas task managers running in seperate > containers. > Couple of jobs were running continously, Flink scheduler was working > properlyalong with task managers. Due to some change in data, one of the jobs > started failing continuously. In the meantime,there was a surge in flink > scheduler memory usually eventually died out off OOM > > Memory dump analysis was done, > Following were findings !Top Level packages.JPG!!Top level classes.JPG! > * Majority of top loaded packages retaining heap indicated towards > Flinkuserclassloader, glassfish(jersey library), Finalizer classes. (Top > level package image) > * Top level classes were of Flinkuserclassloader, (Top Level class image) > * The number of classes loaded vs unloaded was quite less PFA,inspite of > adding jvm options of -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled , > PFAclassloaded vs unloaded graph, scheduler was restarted 3 times > * There were custom classes as well which were duplicated during subsequent > class uploads > PFA all the images of heap dump. Can you suggest some pointers on as to how > to overcome this issue. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)