Hi Mike,

It would be worthwhile to check if this still occurs in Flink 1.14, since
Flink bumped to a newer version of RocksDB in that version. Is that a
possibility for you?

Best regards,

Martijn

Op ma 13 jun. 2022 om 15:21 schreef Mike Barborak <mi...@ec.ai>:

> When trying to savepoint our job, we are getting the stack trace below. Is
> there a way to know more about this failure? Like which function in the job
> graph is associated with the problematic state and which key (assuming it
> is keyed state)?
>
>
>
> Or is there a fix for this exception? The only mention of this exception
> that I can find is in [1] and [2]. [1] has a message at the bottom saying
> that the issue was fixed in RocksDb in 2018. And while we do have a part of
> the job graph that matches the pattern discussed in these two links, our
> attempts to reproduce the problem by pumping messages through at a rate
> millions of times higher than normal have not worked.
>
>
>
> We are using Flink version 1.13.5.
>
>
>
> Thanks,
>
> Mike
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-9268
>
> [2] https://www.mail-archive.com/user@flink.apache.org/msg34915.html
>
>
>
> Caused by: java.lang.Exception: Could not materialize checkpoint 49768 for
> operator KeyedProcess -> KeyedProcess -> re-operator-output -> Sink: Kafka
> sink to ec.platform.braid.responses-rtw (9/15)#0.
>
>                 at
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:257)
>
>                 ... 4 more
>
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.NegativeArraySizeException: -785722504
>
>                 at
> java.base/java.util.concurrent.FutureTask.report(Unknown Source)
>
>                 at java.base/java.util.concurrent.FutureTask.get(Unknown
> Source)
>
>                 at
> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:636)
>
>                 at
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:54)
>
>                 at
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:128)
>
>                 ... 3 more
>
> Caused by: java.lang.NegativeArraySizeException: -785722504
>
>                 at org.rocksdb.RocksIterator.$$YJP$$value0(Native Method)
>
>                 at org.rocksdb.RocksIterator.value0(RocksIterator.java)
>
>                 at org.rocksdb.RocksIterator.value(RocksIterator.java:50)
>
>                 at
> org.apache.flink.contrib.streaming.state.RocksIteratorWrapper.value(RocksIteratorWrapper.java:103)
>
>                 at
> org.apache.flink.contrib.streaming.state.iterator.RocksSingleStateIterator.value(RocksSingleStateIterator.java:66)
>
>                 at
> org.apache.flink.contrib.streaming.state.iterator.RocksStatesPerKeyGroupMergeIterator.value(RocksStatesPerKeyGroupMergeIterator.java:202)
>
>                 at
> org.apache.flink.runtime.state.FullSnapshotAsyncWriter.writeKVStateData(FullSnapshotAsyncWriter.java:210)
>
>                 at
> org.apache.flink.runtime.state.FullSnapshotAsyncWriter.writeSnapshotToOutputStream(FullSnapshotAsyncWriter.java:107)
>
>                 at
> org.apache.flink.runtime.state.FullSnapshotAsyncWriter.get(FullSnapshotAsyncWriter.java:77)
>
>                 at
> org.apache.flink.runtime.state.SnapshotStrategyRunner$1.callInternal(SnapshotStrategyRunner.java:91)
>
>                 at
> org.apache.flink.runtime.state.SnapshotStrategyRunner$1.callInternal(SnapshotStrategyRunner.java:88)
>
>                 at
> org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:78)
>
>                 at java.base/java.util.concurrent.FutureTask.run(Unknown
> Source)
>
>                 at
> org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:633)
>

Reply via email to