Hi Cliff,

You are welcome. I am very happy to hear that it is resolved.

Thanks, vino.

Cliff Resnick <cre...@gmail.com> wrote on Tue, Aug 21, 2018 at 11:46 PM:

> Solved this by moving flink-avro to lib and reverting to
> `classloader.resolve-order: parent-first`. I still don't know why, but I
> guess that if you're reading Avro both from files and from Kafka in the
> same pipeline, inverted classloader delegation will not work. Thanks,
> Vino, for your help!
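> For reference, the change boils down to this (a minimal sketch; the
> `classloader.resolve-order` key is the one mentioned above, with
> flink-avro.jar copied into Flink's lib/ directory alongside it):
>
> ```yaml
> # flink-conf.yaml -- revert from the default child-first (inverted)
> # delegation back to standard parent-first classloading
> classloader.resolve-order: parent-first
> ```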
>
> On Tue, Aug 21, 2018 at 8:02 AM Cliff Resnick <cre...@gmail.com> wrote:
>
>> Hi Aljoscha,
>>
>> We need flink-shaded-hadoop2-uber.jar because there is no Hadoop distro
>> on the instance the Flink session/jobs are managed from, and the process
>> that launches Flink is not a Java process; it execs a process that calls
>> the flink script.
>>
>> -Cliff
>>
>> On Tue, Aug 21, 2018 at 5:11 AM Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>>> Hi Cliff,
>>>
>>> Do you actually need the flink-shaded-hadoop2-uber.jar in lib? If you're
>>> running on YARN, you should be able to just remove it, because with YARN
>>> you will have Hadoop on the classpath anyway.
>>>
>>> Aljoscha
>>>
>>> On 21. Aug 2018, at 03:45, vino yang <yanghua1...@gmail.com> wrote:
>>>
>>> Hi Cliff,
>>>
>>> If so, you can explicitly exclude Avro from the offending dependencies
>>> (using Maven's <exclusions>) and then add a direct dependency on the
>>> Avro version you need.
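>>> A sketch of what that looks like in a Maven pom (the outer dependency
>>> coordinates here are hypothetical placeholders for whichever library is
>>> pulling in Avro; 1.8.2 is the version the job actually uses):
>>>
>>> ```xml
>>> <dependency>
>>>   <groupId>com.example</groupId>           <!-- hypothetical library that pulls in Avro -->
>>>   <artifactId>some-library</artifactId>
>>>   <version>1.0.0</version>
>>>   <exclusions>
>>>     <exclusion>
>>>       <groupId>org.apache.avro</groupId>
>>>       <artifactId>avro</artifactId>
>>>     </exclusion>
>>>   </exclusions>
>>> </dependency>
>>> <!-- then declare the Avro version you need directly -->
>>> <dependency>
>>>   <groupId>org.apache.avro</groupId>
>>>   <artifactId>avro</artifactId>
>>>   <version>1.8.2</version>
>>> </dependency>
>>> ```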
>>>
>>> Thanks, vino.
>>>
Cliff Resnick <cre...@gmail.com> wrote on Tue, Aug 21, 2018 at 5:13 AM:
>>>
>>>> Hi Vino,
>>>>
>>>> Unfortunately, I'm still stuck here. By moving the Avro dependency
>>>> chain to lib (and removing it from the user jar), my OCFs decode, but I
>>>> get the error described here:
>>>>
>>>> https://github.com/confluentinc/schema-registry/pull/509
>>>>
>>>> However, the Flink fix described in the PR above was to move the Avro
>>>> dependency to the user jar, and since I'm using YARN, I'm required to
>>>> have flink-shaded-hadoop2-uber.jar loaded from lib -- and that bundles
>>>> Avro un-shaded. So I'm back to the original problem...
>>>>
>>>> Any advice is welcome!
>>>>
>>>> -Cliff
>>>>
>>>>
>>>> On Mon, Aug 20, 2018 at 1:42 PM Cliff Resnick <cre...@gmail.com> wrote:
>>>>
>>>>> Hi Vino,
>>>>>
>>>>> You were right in your assumption -- unshaded Avro was being added to
>>>>> our application jar via a third-party dependency. Excluding it in
>>>>> packaging fixed the issue. For the record, it looks like flink-avro
>>>>> must be loaded from lib or there will be errors in checkpoint restores.
>>>>>
>>>>> On Mon, Aug 20, 2018 at 8:43 AM Cliff Resnick <cre...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Vino,
>>>>>>
>>>>>> Thanks for the explanation, but the job only ever uses the Avro
>>>>>> (1.8.2) pulled in by flink-formats/avro, so it's not a class version
>>>>>> conflict there.
>>>>>>
>>>>>> I'm using default child-first loading. It might be a further
>>>>>> transitive dependency, though it's not clear from the stack trace or
>>>>>> from stepping through the process. When I get a chance I'll look
>>>>>> further into it, but in case anyone is experiencing similar problems,
>>>>>> what is clear is that classloader order does matter with Avro.
>>>>>>
>>>>>> On Sun, Aug 19, 2018, 11:36 PM vino yang <yanghua1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Cliff,
>>>>>>>
>>>>>>> My personal guess is that this may be caused by a conflict between
>>>>>>> the job's Avro and the Avro that the Flink framework itself relies
>>>>>>> on. Flink provides some configuration parameters which allow you to
>>>>>>> determine the order of the classloaders yourself. [1]
>>>>>>> Alternatively, see the documentation on debugging classloading. [2]
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html
>>>>>>> [2]:
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
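>>>>>>> For example, in flink-conf.yaml (a sketch; check [1] for the exact
>>>>>>> option names in your Flink version -- the patterns key below is an
>>>>>>> assumption for forcing Avro parent-first):
>>>>>>>
>>>>>>> ```yaml
>>>>>>> # Delegation order for dynamically loaded user code:
>>>>>>> # child-first is the default; parent-first restores standard delegation
>>>>>>> classloader.resolve-order: child-first
>>>>>>> # Optionally force specific packages (e.g. Avro) to always resolve
>>>>>>> # from the parent classpath:
>>>>>>> classloader.parent-first-patterns.additional: org.apache.avro.
>>>>>>> ```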
>>>>>>>
>>>>>>> Thanks, vino.
>>>>>>>
Cliff Resnick <cre...@gmail.com> wrote on Mon, Aug 20, 2018 at 10:40 AM:
>>>>>>>
>>>>>>>> Our Flink/YARN pipeline has been reading Avro from Kafka for a
>>>>>>>> while now. We just introduced a source of Avro OCF (Object Container
>>>>>>>> Files) read from S3. The Kafka Avro continued to decode without
>>>>>>>> incident, but the OCF files failed 100% with anomalous parse errors
>>>>>>>> in the decoding phase after the schema and codec were successfully
>>>>>>>> read from them. The pipeline would work on my laptop, and when I
>>>>>>>> submitted a test Main program to the Flink session in YARN, that
>>>>>>>> would also successfully decode. Only the actual pipeline run from
>>>>>>>> the TaskManager failed. At one point I even remote debugged the
>>>>>>>> TaskManager process and stepped through what looked like a normal
>>>>>>>> Avro decode (if you can describe Avro code as normal!) -- until it
>>>>>>>> abruptly failed with an int decode or what-have-you.
>>>>>>>>
>>>>>>>> This stumped me for a while, but I finally tried moving
>>>>>>>> flink-avro.jar from lib to the application jar, and that fixed it.
>>>>>>>> I'm not sure why this is, especially since there were no typical
>>>>>>>> classloader-type errors. This issue was observed on both Flink 1.5
>>>>>>>> and 1.6 in FLIP-6 mode.
>>>>>>>>
>>>>>>>> -Cliff
