Gabor Somogyi created FLINK-36530:
-------------------------------------
Summary: Not able to restore list state from S3
Key: FLINK-36530
URL: https://issues.apache.org/jira/browse/FLINK-36530
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Affects Versions: 1.19.1, 1.20.0, 1.18.1, 2.0.0
Reporter: Gabor Somogyi
FLINK-34063 fixed an important issue with compressed state but introduced
extremely slow state recovery for both uncompressed and compressed list states
from S3.
Short statement: restoring a ~6 MB list state generated by
{code:java}
org.apache.flink.connector.file.sink.compactor.operator.CompactCoordinator{code}
takes ~62 hours.
Detailed analysis:
During file sink compaction, CompactCoordinator (running with parallelism 1)
collects the list of files that need to be compacted and writes it into state.
In the problematic scenario the list had ~15k entries.
OperatorStateRestoreOperation.deserializeOperatorStateValues gets an offset
for each list entry and basically does the following:
{code:java}
for (long offset : offsets) {
    in.seek(offset);
    stateListForName.add(serializer.deserialize(div));
}{code}
The following code was introduced in CompressibleFSDataInputStream.seek:
{code:java}
final int available = compressingDelegate.available();
if (available > 0) {
    if (available != compressingDelegate.skip(available)) {
        throw new IOException("Unable to skip buffered data.");
    }
}
{code}
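To illustrate why draining on every seek can get expensive, here is a toy model in plain Java. It is only a sketch under a strong assumption: that the delegate reports all remaining bytes as available, which is how ByteArrayInputStream behaves and may or may not match the S3 stream. SeekDrainSketch, totalSkippedBytes and the fixed entry size are hypothetical names and parameters, not Flink code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SeekDrainSketch {

    // Total bytes skipped when every seek first drains whatever the current
    // stream reports as available. ByteArrayInputStream is only a stand-in:
    // it reports all remaining bytes as available (an assumption here).
    static long totalSkippedBytes(byte[] state, int entrySize) throws IOException {
        long skipped = 0;
        byte[] entry = new byte[entrySize];
        InputStream current = new ByteArrayInputStream(state);
        for (int offset = 0; offset + entrySize <= state.length; offset += entrySize) {
            // Same pattern as the quoted seek code: drain before repositioning.
            int available = current.available();
            if (available > 0) {
                if (available != current.skip(available)) {
                    throw new IOException("Unable to skip buffered data.");
                }
                skipped += available;
            }
            // Emulate the seek by reopening the array at the target offset.
            current = new ByteArrayInputStream(state, offset, state.length - offset);
            // Emulate deserializing one fixed-size entry.
            if (current.read(entry, 0, entrySize) != entrySize) {
                throw new IOException("Short read.");
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // 1,000 entries of 100 bytes each (100,000 bytes of state).
        System.out.println(totalSkippedBytes(new byte[100_000], 100));
    }
}
```

In this toy model 100,000 bytes of state lead to 50,050,000 skipped bytes, roughly 500 times the state size. Whether the S3 stream behaves the same way is not established by this sketch.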
There are 2 problems with this skip logic in CompressibleFSDataInputStream.seek:
* The skip operation is not needed for uncompressed state
* skip takes ~15 seconds for ~6 MB on S3 (which, at ~15k entries, adds up to
~62 hours of restore time)
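As a quick sanity check of the ~62 hour figure, the numbers from this report (~15k entries, ~15 seconds per skip) can be multiplied out. The class and method names below are hypothetical and exist only for this calculation.

```java
class RestoreTimeEstimate {

    // Estimated restore time in hours when every list entry pays for one skip.
    static double estimateHours(long entries, double secondsPerSkip) {
        return entries * secondsPerSkip / 3600.0;
    }

    public static void main(String[] args) {
        // Figures from this report: ~15k entries, ~15 s per skip.
        System.out.println(estimateHours(15_000, 15.0) + " hours"); // prints "62.5 hours"
    }
}
```

15,000 skips at 15 seconds each are 225,000 seconds, which is 62.5 hours and matches the observed restore time.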
We've already addressed the first issue with a simple if condition, but the
second is definitely harder. Until the latter is resolved, I would say that
compressed state is not a good choice together with S3 and list state restore.
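The kind of guard meant by "a simple if condition" could look roughly like the sketch below. This is not the actual Flink change: GuardedSeekSketch, drainIfCompressed and the compressed flag are hypothetical and only show the idea of skipping the drain for uncompressed state.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class GuardedSeekSketch {

    // Hypothetical helper: drains buffered data only when the stream is
    // compressed and returns the number of bytes that were skipped.
    static long drainIfCompressed(InputStream delegate, boolean compressed) throws IOException {
        if (!compressed) {
            // Uncompressed state: nothing to drain, so leave the stream alone.
            return 0;
        }
        int available = delegate.available();
        if (available > 0 && available != delegate.skip(available)) {
            throw new IOException("Unable to skip buffered data.");
        }
        return Math.max(available, 0);
    }

    public static void main(String[] args) throws IOException {
        InputStream uncompressed = new ByteArrayInputStream(new byte[10]);
        System.out.println(drainIfCompressed(uncompressed, false)); // prints 0
        InputStream compressed = new ByteArrayInputStream(new byte[10]);
        System.out.println(drainIfCompressed(compressed, true)); // prints 10
    }
}
```

With the guard in place the uncompressed path pays nothing per seek. The compressed path still drains on every seek, which is the harder second problem.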
Steps to reproduce:
* Create a list operator state with several thousand entries
* Store it on S3
* Try to restore it with Flink