Hi,

I think I've hit this same issue on a 3 node standalone cluster (1.4.2)
using HDFS (2.8.4) as state backend.

2018-09-26 17:07:39,370 INFO
org.apache.flink.runtime.taskmanager.Task                     - Attempting
to fail task externally SelectCepOperator (1/1)
(3bec4aa1ef2226c4e0c5ff7b3860d340).
2018-09-26 17:07:39,370 INFO
org.apache.flink.runtime.taskmanager.Task                     -
SelectCepOperator (1/1) (3bec4aa1ef2226c4e0c5ff7b3860d340) switched from
RUNNING to FAILED.
AsynchronousException{java.lang.Exception: Could not materialize checkpoint
6 for operator SelectCepOperator (1/1).}
    at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 6 for
operator SelectCepOperator (1/1).
    ... 6 more
Caused by: java.util.concurrent.ExecutionException:
java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
    at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894)
    ... 5 more
    Suppressed: java.lang.Exception: Could not properly cancel managed
keyed state future.
        at
org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:91)
        at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976)
        at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)
        ... 5 more
    Caused by: java.util.concurrent.ExecutionException:
java.lang.NullPointerException
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
        at
org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:66)
        at
org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:89)
        ... 7 more
    Caused by: java.lang.NullPointerException
        at
org.apache.flink.cep.nfa.SharedBuffer$SharedBufferSerializer.serialize(SharedBuffer.java:954)
        at
org.apache.flink.cep.nfa.SharedBuffer$SharedBufferSerializer.serialize(SharedBuffer.java:825)
        at
org.apache.flink.cep.nfa.NFA$NFASerializer.serialize(NFA.java:888)
        at
org.apache.flink.cep.nfa.NFA$NFASerializer.serialize(NFA.java:820)
        at
org.apache.flink.runtime.state.heap.CopyOnWriteStateTableSnapshot.writeMappingsInKeyGroup(CopyOnWriteStateTableSnapshot.java:196)
        at
org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$1.performOperation(HeapKeyedStateBackend.java:390)
        at
org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$1.performOperation(HeapKeyedStateBackend.java:339)
        at
org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
        at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894)
        ... 5 more
    [CIRCULAR REFERENCE:java.lang.NullPointerException]

Any ideas on why I'm hitting this especially when this (
https://issues.apache.org/jira/browse/FLINK-7756) says it has been fixed in
1.4.2 ?

On Sat, Nov 4, 2017 at 12:34 AM Federico D'Ambrosio <
federico.dambro...@smartlab.ws> wrote:

> Thank you very much for your steady response, Kostas!
>
> Cheers,
> Federico
>
> 2017-11-03 16:26 GMT+01:00 Kostas Kloudas <k.klou...@data-artisans.com>:
>
>> Hi Federico,
>>
>> Thanks for trying it out!
>> Great to hear that your problem was fixed!
>>
>> The feature freeze for the release is going to be next week, and I would
>> expect 1 or 2 more weeks testing.
>> So I would say in 2.5 weeks. But this is of course subject to potential
>> issues we may find during testing.
>>
>> Cheers,
>> Kostas
>>
>> On Nov 3, 2017, at 4:22 PM, Federico D'Ambrosio <
>> federico.dambro...@smartlab.ws> wrote:
>>
>> Hi Kostas,
>>
>> I just tried running the same job with 1.4-SNAPSHOT for 10 minutes and it
>> didn't crash, so that was the same underlying issue of the JIRA you linked.
>>
>> Do you happen to know when it's expected the 1.4 stable release?
>>
>> Thank you very much,
>> Federico
>>
>> 2017-11-03 15:25 GMT+01:00 Kostas Kloudas <k.klou...@data-artisans.com>:
>>
>>> Perfect! thanks a lot!
>>>
>>> Kostas
>>>
>>> On Nov 3, 2017, at 3:23 PM, Federico D'Ambrosio <
>>> federico.dambro...@smartlab.ws> wrote:
>>>
>>> Hi Kostas,
>>>
>>> yes, I'm using 1.3.2. I'll try the current master and I'll get back to
>>> you.
>>>
>>> 2017-11-03 15:21 GMT+01:00 Kostas Kloudas <k.klou...@data-artisans.com>:
>>>
>>>> Hi Federico,
>>>>
>>>> I assume that you are using Flink 1.3, right?
>>>>
>>>> In this case, in 1.4 we have fixed a bug that seems similar to your
>>>> case:
>>>> https://issues.apache.org/jira/browse/FLINK-7756
>>>>
>>>> Could you try the current master to see if it fixes your problem?
>>>>
>>>> Thanks,
>>>> Kostas
>>>>
>>>> On Nov 3, 2017, at 3:12 PM, Federico D'Ambrosio <
>>>> federico.dambro...@smartlab.ws> wrote:
>>>>
>>>>  Could not find id for
>>>> entry:
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Federico D'Ambrosio
>>>
>>>
>>>
>>
>>
>> --
>> Federico D'Ambrosio
>>
>>
>>
>
>
> --
> Federico D'Ambrosio
>

Reply via email to