Something seems to be off with the user code class loader. The only way I
can get my job to start is to drop the job jar into the JM's lib folder and
set the JM's classloader.resolve-order to parent-first.
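
In other words, the only configuration that works for me is roughly this (a
minimal flink-conf.yaml sketch; classloader.resolve-order is the standard
option name, everything else is omitted):

    # flink-conf.yaml on the JobManager
    # the default is child-first (see the thread below); only parent-first
    # lets this job start
    classloader.resolve-order: parent-first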

Suggestions?

On Thu, Feb 22, 2018 at 12:52 PM, Elias Levy <fearsome.lucid...@gmail.com>
wrote:

> I am currently suffering through similar issues.
>
> Had a job running happily, but when the cluster tried to restart it, the
> job could not find the JSON serializer in it. The job kept trying to
> restart in a loop.
>
> Just today I was running a job I built locally. The job ran fine. I then
> added two commits and rebuilt the jar. Now the job dies when it tries to
> start, claiming it can't find the time assigner class. I've unzipped the
> job jar, both locally and in the TM blob directory, and have confirmed
> that the class is in it.
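>
> (For example, a quick way to confirm that — the jar name here is
> illustrative — is:
>
>     jar tf app.jar | grep TimeAssigner
>     # com/foo/flink/common/util/TimeAssigner.class
>
> which shows the class is packaged in the jar.)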
>
> This is the backtrace:
>
> java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner
>       at java.net.URLClassLoader.findClass(Unknown Source)
>       at java.lang.ClassLoader.loadClass(Unknown Source)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>       at java.lang.ClassLoader.loadClass(Unknown Source)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Unknown Source)
>       at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
>       at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
>       at java.io.ObjectInputStream.readClassDesc(Unknown Source)
>       at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>       at java.io.ObjectInputStream.readObject0(Unknown Source)
>       at java.io.ObjectInputStream.readObject(Unknown Source)
>       at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
>       at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
>       at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
>       at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
>       at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542)
>       at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
>       at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
>       at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
>       at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
>       at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
>       at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
>       at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
>       at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>       at java.lang.Thread.run(Unknown Source)
>
>
> On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> We changed a few things between 1.3 and 1.4 concerning Avro. One of the
>> main things is that Avro is no longer part of the core Flink class library,
>> but needs to be packaged into your application jar file.
>>
>> The class loading / caching issues of 1.3 with respect to Avro should
>> disappear in Flink 1.4, because Avro classes and caches are scoped to the
>> job classloaders, so the caches do not go across different jobs, or even
>> different operators.
>>
>>
>> *Please check: Make sure you have Avro as a dependency in your jar file
>> (in scope "compile").*
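>>
>> With Maven, for example, that would look roughly like this (a sketch; the
>> Avro version below is illustrative, use whichever one matches your setup):
>>
>>     <!-- in the job's pom.xml, so Avro ends up in the fat/shaded jar -->
>>     <dependency>
>>       <groupId>org.apache.avro</groupId>
>>       <artifactId>avro</artifactId>
>>       <version>1.8.2</version>
>>       <scope>compile</scope> <!-- compile is also the default scope -->
>>     </dependency>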
>>
>> Hope that solves the issue.
>>
>> Stephan
>>
>>
>> On Mon, Jan 22, 2018 at 2:31 PM, Edward <egb...@hotmail.com> wrote:
>>
>>> Yes, we've seen this issue as well, though it usually takes many more
>>> resubmits before the error pops up. Interestingly, of the 7 jobs we run
>>> (all of which use different Avro schemas), we only see this issue on 1 of
>>> them. Once the NoClassDefFoundError crops up, though, it is necessary to
>>> recreate the task managers.
>>>
>>> There's a whole page in the Flink documentation on debugging
>>> classloading, and Avro is mentioned several times on that page:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_classloading.html
>>>
>>> It seems that (in 1.3 at least) each submitted job has its own
>>> classloader, and its own instance of the Avro class definitions. However,
>>> the Avro class cache will keep references to the Avro classes from the
>>> classloaders of previously cancelled jobs. That said, we haven't been
>>> able to find a solution to this error yet. Flink 1.4 would be worth a try
>>> because of the changes to the default classloading behaviour (child-first
>>> is the new default, not parent-first).
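>>>
>>> (To illustrate the mechanism in general terms — this is a simplified
>>> sketch, not Avro's actual code — a static name-to-Class cache like the
>>> one below holds a strong reference to every cached Class, and through it
>>> to the classloader that defined it, so a cancelled job's classloader is
>>> never collected and later lookups can resolve to a stale class:)
>>>
>>>     // Simplified sketch of a JVM-wide class cache; NOT Avro's real implementation.
>>>     import java.util.Map;
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>
>>>     final class ClassCacheSketch {
>>>       // Static map: each cached Class pins the ClassLoader that defined it.
>>>       private static final Map<String, Class<?>> CACHE = new ConcurrentHashMap<>();
>>>
>>>       static Class<?> lookup(String name, ClassLoader loader) throws ClassNotFoundException {
>>>         Class<?> c = CACHE.get(name);
>>>         if (c == null) {
>>>           c = Class.forName(name, false, loader); // loaded via the current job's classloader
>>>           CACHE.put(name, c);                     // kept across job cancel/restart
>>>         }
>>>         return c; // may belong to an earlier, discarded classloader
>>>       }
>>>     }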
>>>
>>>
>>
>>
>
