Hi Vino,

Thanks for the explanation, but the job only ever uses the Avro (1.8.2) pulled in by flink-formats/avro, so it's not a class version conflict there.
I'm using the default child-first loading. It might be caused by a further transitive dependency, though that isn't clear from the stack trace or from stepping through the process. When I get a chance I'll look into it further, but in case anyone else is experiencing similar problems, what is clear is that classloader order does matter with Avro.

On Sun, Aug 19, 2018, 11:36 PM vino yang <yanghua1...@gmail.com> wrote:

> Hi Cliff,
>
> My personal guess is that this may be caused by a conflict between the job's Avro and the Avro that the Flink framework itself relies on.
> Flink provides some configuration parameters that let you determine the order of the classloaders yourself. [1]
> Alternatively, you can debug classloading by following the documentation. [2]
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html
> [2]: https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
>
> Thanks, vino.
>
> Cliff Resnick <cre...@gmail.com> wrote on Mon, Aug 20, 2018, 10:40 AM:
>
>> Our Flink/YARN pipeline has been reading Avro from Kafka for a while now. We just introduced a source of Avro OCF (Object Container Files) read from S3. The Kafka Avro continued to decode without incident, but the OCF files failed 100% of the time with anomalous parse errors in the decoding phase, after the schema and codec had been successfully read from them. The pipeline worked on my laptop, and when I submitted a test Main program to the Flink session in YARN, that also decoded successfully. Only the actual pipeline run from the TaskManager failed. At one point I even remote-debugged the TaskManager process and stepped through what looked like a normal Avro decode (if you can describe Avro code as normal!), until it abruptly failed with an int decode or what-have-you.
>>
>> This stumped me for a while, but I finally tried moving flink-avro.jar from the lib directory into the application jar, and that fixed it.
>> I'm not sure why this is, especially since there were no typical classloader-type errors. This issue was observed on both Flink 1.5 and 1.6 in FLIP-6 mode.
>>
>> -Cliff
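When a classloading problem shows up without any of the usual classloader errors, one way to narrow it down is to log, from inside the running task, which classloader and which jar each Avro class was actually loaded from, and compare a TaskManager run against the local run that works. The helper below is a minimal, hypothetical sketch: `ClassOrigin` is not a Flink or Avro API, it only uses JDK calls (`Class.forName`, `ProtectionDomain`, `CodeSource`), and it assumes it is called from inside the job (for example from a rich function's `open()` method), where the thread context classloader is assumed to be the user-code classloader.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical diagnostic helper (not part of Flink or Avro): reports which
// classloader defined a class and which jar/directory it came from.
final class ClassOrigin {
    private ClassOrigin() {}

    // Resolve through the thread context classloader (assumed here to be the
    // user-code classloader when called from inside a Flink task).
    static String describe(String className) {
        return describe(className, Thread.currentThread().getContextClassLoader());
    }

    static String describe(String className, ClassLoader loader) {
        final Class<?> clazz;
        try {
            // initialize=false: locate the class without running static initializers
            clazz = Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            return "not found: " + className;
        }
        // The defining loader; null means the JVM bootstrap loader.
        ClassLoader definingLoader = clazz.getClassLoader();
        String loaderName = definingLoader == null ? "bootstrap" : definingLoader.toString();

        // Bootstrap classes typically have no CodeSource, so guard against nulls.
        ProtectionDomain domain = clazz.getProtectionDomain();
        CodeSource source = domain == null ? null : domain.getCodeSource();
        URL location = source == null ? null : source.getLocation();

        return className + " loaded by " + loaderName + " from "
                + (location == null ? "unknown location" : location.toString());
    }

    public static void main(String[] args) {
        // Example class names only; pass your own as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.Schema", "org.apache.avro.io.BinaryDecoder"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Logging `describe("org.apache.avro.Schema")` in both setups should show whether Avro is being resolved from a jar in `lib/` through the parent classloader or from the application jar through the child-first user-code classloader, which is the difference the workaround above changed.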