I guess there's no point in making it a KeyedProcessFunction, since it's not
going to have access to context, timers or anything like that. So it can be
a simple InputFormat returning a DataSet of key/value tuples.
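To make the InputFormat idea concrete, here is a minimal plain-Java sketch of the pull-based cycle an InputFormat follows: open a split, check reachedEnd, call nextRecord, close. It deliberately has no Flink dependency. KeyGroupSplit and KeyValueSplitReader are hypothetical names, the in-memory map is a stand-in for the savepoint's state files, and Flink's real nextRecord takes a reuse object that is omitted here.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical split: a contiguous, inclusive range of key groups.
// In Flink this role would be played by an InputSplit implementation.
final class KeyGroupSplit {
    final int splitNumber; // kept for parity with InputSplit#getSplitNumber
    final int startKeyGroup; // inclusive
    final int endKeyGroup;   // inclusive

    KeyGroupSplit(int splitNumber, int startKeyGroup, int endKeyGroup) {
        this.splitNumber = splitNumber;
        this.startKeyGroup = startKeyGroup;
        this.endKeyGroup = endKeyGroup;
    }
}

// Hypothetical reader loosely mirroring InputFormat's open / reachedEnd /
// nextRecord / close cycle. The map from key group to (key, value) pairs
// stands in for whatever actually reads the savepoint files.
final class KeyValueSplitReader<K, V> {
    private final Map<Integer, List<Map.Entry<K, V>>> entriesByKeyGroup;
    private Iterator<Map.Entry<K, V>> current = Collections.emptyIterator();
    private int nextKeyGroup;
    private int endKeyGroup;

    KeyValueSplitReader(Map<Integer, List<Map.Entry<K, V>>> entriesByKeyGroup) {
        this.entriesByKeyGroup = entriesByKeyGroup;
    }

    void open(KeyGroupSplit split) {
        nextKeyGroup = split.startKeyGroup;
        endKeyGroup = split.endKeyGroup;
        current = Collections.emptyIterator();
        advance();
    }

    boolean reachedEnd() {
        return !current.hasNext();
    }

    Map.Entry<K, V> nextRecord() {
        Map.Entry<K, V> record = current.next();
        advance();
        return record;
    }

    void close() {
        current = Collections.emptyIterator();
    }

    // Skip forward to the next key group in this split that has entries.
    private void advance() {
        while (!current.hasNext() && nextKeyGroup <= endKeyGroup) {
            current = entriesByKeyGroup
                    .getOrDefault(nextKeyGroup, Collections.emptyList())
                    .iterator();
            nextKeyGroup++;
        }
    }
}
```

Only one key group's entries are iterated at a time, which is the property that matters when the total state is far larger than memory.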
On Wed, Mar 17, 2021 at 8:37 AM Andrey Bulgakov wrote:
Hi Gordon,
I think my current implementation is very specific and wouldn't be that
valuable to the broader public.
But I think there's a potential version of it that could also retrieve
values from a savepoint in the same efficient way, and that would be
something other people might need.
I'
Hi Andrey,
Perhaps the functionality you described is worth adding to the State
Processor API.
Your observation on how the library currently works is correct; basically, it
tries to restore the state backends as-is.
In your current implementation, do you see it being worthwhile to try to add this?
Cheers,
If anyone is interested, I realized that the State Processor API was not the
right tool for this, since it spends a lot of time rebuilding RocksDB tables
and then a lot of memory trying to read from them. All I really needed was the
operator keys.
So I used SavepointLoader.loadSavepointMetadata to get KeyGro
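Once the savepoint metadata is loaded, one remaining question is how to spread key groups over parallel readers. The sketch below assumes the contiguous-range scheme that, to my understanding, Flink's KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex uses to assign key groups to operator instances. KeyGroupRanges and rangeForIndex are hypothetical names, so please check the arithmetic against the Flink version that wrote the savepoint.

```java
// Hypothetical helper: splits key groups [0, maxParallelism) into
// `parallelism` contiguous, non-overlapping, inclusive ranges.
final class KeyGroupRanges {
    private KeyGroupRanges() {}

    /** Returns {start, end}, both inclusive, for the reader at `index`. */
    static int[] rangeForIndex(int maxParallelism, int parallelism, int index) {
        if (parallelism <= 0 || maxParallelism < parallelism) {
            throw new IllegalArgumentException("need 0 < parallelism <= maxParallelism");
        }
        if (index < 0 || index >= parallelism) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        // Integer arithmetic: the start rounds up, the end rounds down, so
        // consecutive ranges meet without gaps or overlaps.
        int start = (index * maxParallelism + parallelism - 1) / parallelism;
        int end = ((index + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }
}
```

For example, with maxParallelism 128 and 4 readers, reader 0 gets key groups 0 to 31 and reader 3 gets 96 to 127. A reader could then skip any state handle whose key-group range does not overlap its own.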
Hi all,
I'm trying to use the State Processor API to extract all keys from a
RocksDB savepoint produced by an operator in a Flink streaming job into CSV
files.
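On the CSV side, one way to avoid holding keys in memory is to write each row as soon as it is read rather than collecting keys first. A minimal sketch, assuming keys and values have already been rendered as strings; KeyCsvWriter is a hypothetical helper, and quoting follows RFC 4180 (fields containing commas, quotes or line breaks are quoted, with embedded quotes doubled).

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper: streams (key, value) rows to any Writer, one row at a
// time, so nothing is buffered beyond what the Writer itself holds.
final class KeyCsvWriter {
    private final Writer out;

    KeyCsvWriter(Writer out) {
        this.out = out;
    }

    void writeRow(String key, String value) throws IOException {
        out.write(escape(key));
        out.write(',');
        out.write(escape(value));
        out.write("\r\n"); // RFC 4180 uses CRLF line endings
    }

    // Quote the field only when it contains a delimiter, quote or line break.
    static String escape(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```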
The problem is that the storage size of the savepoint is 30TB and I'm
running into garbage collection issues no matter how much memory in