Something seems to be off with the user-code class loader. The only way I can get my job to start is to drop the job jar into the JobManager's lib folder and set the JM's classloader.resolve-order to parent-first.
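For reference, the parent-first workaround described above corresponds to this setting in flink-conf.yaml (the option name matches what Flink 1.4 uses; child-first is the 1.4 default):

```yaml
# flink-conf.yaml
# Flink 1.4 resolves user-code classes child-first (job jar before the
# Flink/JVM classpath). Switching back to parent-first restores the 1.3
# behaviour -- but note it only helps if the classes are also on the
# JobManager/TaskManager classpath, e.g. dropped into lib/.
classloader.resolve-order: parent-first
```

Needing this workaround usually means the same classes exist both in lib/ and in the job jar, which is worth resolving rather than papering over.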
Suggestions?

On Thu, Feb 22, 2018 at 12:52 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:

> I am currently suffering through similar issues.
>
> Had a job running happily, but when the cluster tried to restart it, it
> would not find the JSON serializer in the jar. The job kept trying to
> restart in a loop.
>
> Just today I was running a job I built locally. The job ran fine. I added
> two commits and rebuilt the jar. Now the job dies when it tries to start,
> claiming it can't find the time assigner class. I've unzipped the job
> jar, both locally and in the TM blob directory, and have confirmed the
> class is in it.
>
> This is the backtrace:
>
> java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
>   at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
>   at java.io.ObjectInputStream.readClassDesc(Unknown Source)
>   at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>   at java.io.ObjectInputStream.readObject0(Unknown Source)
>   at java.io.ObjectInputStream.readObject(Unknown Source)
>   at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
>   at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
>   at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
>   at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
>   at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542)
>   at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
>   at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
>   at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
>   at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
>   at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
>   at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
>   at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
>   at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>   at java.lang.Thread.run(Unknown Source)
>
> On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> We changed a few things between 1.3 and 1.4 concerning Avro. One of the
>> main things is that Avro is no longer part of the core Flink class
>> library, but needs to be packaged into your application jar file.
>>
>> The class loading / caching issues of 1.3 with respect to Avro should
>> disappear in Flink 1.4, because Avro classes and caches are scoped to
>> the job classloaders, so the caches do not go across different jobs, or
>> even different operators.
>>
>> *Please check: Make sure you have Avro as a dependency in your jar file
>> (in scope "compile").*
>>
>> Hope that solves the issue.
>>
>> Stephan
>>
>> On Mon, Jan 22, 2018 at 2:31 PM, Edward <egb...@hotmail.com> wrote:
>>
>>> Yes, we've seen this issue as well, though it usually takes many more
>>> resubmits before the error pops up.
>>> Interestingly, of the 7 jobs we run (all of which use different Avro
>>> schemas), we only see this issue on 1 of them. Once the
>>> NoClassDefFoundError crops up, though, it is necessary to recreate the
>>> task managers.
>>>
>>> There's a whole page in the Flink documentation on debugging
>>> classloading, and Avro is mentioned several times on that page:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_classloading.html
>>>
>>> It seems that (in 1.3 at least) each submitted job has its own
>>> classloader, and its own instance of the Avro class definitions.
>>> However, the Avro class cache will keep references to the Avro classes
>>> from the classloaders of previously cancelled jobs. That said, we
>>> haven't been able to find a solution to this error yet. Flink 1.4 would
>>> be worth a try because of the changes to the default classloading
>>> behaviour (child-first is the new default, not parent-first).
>>>
>>> --
>>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
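The resolveClass frame in the backtrace points at the mechanism common to these reports: Flink deserializes user objects through an ObjectInputStream that resolves classes against the per-job user-code classloader, so a class missing from that loader's view fails with ClassNotFoundException even if the bytes sit in a jar elsewhere. A minimal sketch of that pattern (illustrative only, not Flink's actual InstantiationUtil code; class and method names here are my own):

```java
import java.io.*;

// Sketch of an ObjectInputStream that resolves classes against a caller-
// supplied classloader, the pattern visible in the backtrace's
// InstantiationUtil$ClassLoaderObjectInputStream.resolveClass frame.
class LoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader classLoader;

    LoaderAwareObjectInputStream(InputStream in, ClassLoader classLoader) throws IOException {
        super(in);
        this.classLoader = classLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        // Resolve against the supplied (user-code) loader first. If the jar
        // visible to this loader lacks the class, this is where
        // ClassNotFoundException surfaces during deserialization.
        try {
            return Class.forName(desc.getName(), false, classLoader);
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc); // fallback: default resolution
        }
    }
}

public class DeserializeDemo {
    // Serialize an object, then deserialize it while resolving its class
    // through the given classloader.
    static Object roundTrip(Serializable obj, ClassLoader loader) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        try (ObjectInputStream ois = new LoaderAwareObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()), loader)) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Succeeds because Integer is visible to the context classloader.
        Object v = roundTrip(42, Thread.currentThread().getContextClassLoader());
        System.out.println(v);
    }
}
```

Because the loader is per-job (and per-jar-version), a rebuilt jar or a stale cached reference in a long-lived TaskManager can make a class "present in the jar" yet unresolvable at this point, which is consistent with both the restart-loop and the recreate-the-TMs observations above.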