Re: Frequent Full GC's in case of FSStateBackend

Stefan Richter Fri, 10 Feb 2017 02:50:40 -0800

Hi,

FSStateBackend keeps all working state on the JVM heap; only the checkpoint 
snapshots are written to the file system. A checkpoint is just a copy, so the 
live state stays on the heap afterwards. This is why the backend is typically 
faster for small state, but can become problematic for larger state. If your 
state exceeds a certain size, you should strongly consider using RocksDB as 
the backend. In particular, RocksDB also offers asynchronous snapshots, which 
are very valuable for keeping stream processing running with large state. 
RocksDB works on native memory and disk, so its state does not add to GC 
pressure. If your state fits in memory but GC is a problem, you could try the 
G1 garbage collector, which offers better performance for the FSStateBackend 
than the default collector.
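
As a rough sketch of the G1 suggestion, assuming the JVM options for the 
JobManager and TaskManagers are set through flink-conf.yaml rather than in a 
start script, the entry could look like this (the option only selects the 
collector; any further tuning flags would be your own choice):

```
# flink-conf.yaml (sketch): start the Flink JVMs with the G1 collector
env.java.opts: -XX:+UseG1GC
```

The processes need to be restarted for the setting to take effect.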


Best,
Stefan


> On 10.02.2017 at 11:16, Vinay Patil <vinay18.pa...@gmail.com> wrote:
> 
> Hi,
> 
> I am running a performance test on my pipeline with FSStateBackend, and I 
> have observed frequent Full GCs after processing 20M records.
> 
> When I did a memory analysis using MAT, it showed that many of the objects 
> maintained by Flink state are still live.
> 
> Flink keeps the state in memory even after checkpointing. When does this 
> state get removed / garbage collected? (I am using a window operator that 
> takes the DTO as input.)
> 
> Also, why does Flink keep the state in memory after checkpointing? 
> 
> P.S. Using RocksDB does not cause any Full GCs at all.
> 
> Regards,
> Vinay Patil
