Re: RocksDB state on HDFS seems not being cleanned up

Till Rohrmann Tue, 05 Nov 2019 07:25:15 -0800

Hi Shuwen,

I think the problem is that you configured state TTL to clean up on full
snapshots, which are never taken when RocksDB runs with incremental
snapshots. Instead you need to activate `cleanupInRocksdbCompactFilter`:


import org.apache.flink.api.common.state.StateTtlConfig
import org.apache.flink.api.common.time.Time

val ttlConfig = StateTtlConfig
  .newBuilder(Time.minutes(30))
  .updateTtlOnCreateAndWrite()
  .cleanupInBackground()
  .cleanupInRocksdbCompactFilter()
  .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
  .build()
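
For completeness, a minimal sketch of how such a config is usually attached to the state itself. The state name and the key/value types below are hypothetical and only serve as an illustration; the rest follows the snippet above.

```scala
import org.apache.flink.api.common.state.{MapStateDescriptor, StateTtlConfig}
import org.apache.flink.api.common.time.Time

object TtlDescriptorSketch {
  // Same TTL settings as in the snippet above, with cleanup delegated
  // to the RocksDB compaction filter instead of full snapshots.
  val ttlConfig: StateTtlConfig = StateTtlConfig
    .newBuilder(Time.minutes(30))
    .updateTtlOnCreateAndWrite()
    .cleanupInRocksdbCompactFilter()
    .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
    .build()

  // Hypothetical state name and key/value types, for illustration only.
  val descriptor = new MapStateDescriptor[String, java.lang.Long](
    "lastSeenPerKey", classOf[String], classOf[java.lang.Long])

  // The TTL config only takes effect once it is enabled on the descriptor.
  descriptor.enableTimeToLive(ttlConfig)
}
```

If I remember correctly, on Flink 1.9 the compaction filter also has to be switched on for the RocksDB backend, e.g. with `state.backend.rocksdb.ttl.compaction.filter.enabled: true` in flink-conf.yaml. Please double-check this against the documentation for your version.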

Cheers,
Till

On Tue, Nov 5, 2019 at 4:04 PM shuwen zhou <jaco...@gmail.com> wrote:

> Hi Jiayi,
> I understand that the shared folder stores state for multiple
> checkpoints. But shouldn't the shared folder retain data only for the last
> “state.checkpoints.num-retained” checkpoints and remove outdated
> checkpoints?
> In my case I suspect that the state of outdated checkpoints wasn't cleaned
> up, which keeps the shared folder growing even after the TTL has passed.
>
>
> On Tue, 5 Nov 2019 at 21:13, bupt_ljy <bupt_...@163.com> wrote:
>
> > Hi Shuwen,
> >
> >
> > The “shared” folder holds state files that are shared among multiple
> > checkpoints, which happens when you enable incremental checkpointing [1].
> > Therefore, it’s reasonable that the size keeps growing if you set
> > “state.checkpoints.num-retained” to a big value.
> >
> >
> > [1] https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html
> >
> >
> > Best,
> > Jiayi Liao
> >
> >
> >  Original Message
> > Sender: shuwen zhou<jaco...@gmail.com>
> > Recipient: dev<dev@flink.apache.org>
> > Date: Tuesday, Nov 5, 2019 17:59
> > Subject: RocksDB state on HDFS seems not being cleanned up
> >
> >
> > Hi Community,
> >
> > I have a job running on Flink 1.9.0 on YARN, using RocksDB on HDFS with
> > incremental checkpoints enabled. I have some MapState in the code with the
> > following config:
> >
> > val ttlConfig = StateTtlConfig
> >   .newBuilder(Time.minutes(30))
> >   .updateTtlOnCreateAndWrite()
> >   .cleanupInBackground()
> >   .cleanupFullSnapshot()
> >   .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
> >
> > After running for around 2 days, I observed the following checkpoint
> > folder sizes:
> >
> > 44.4 M   /flink-chk743e4568a70b626837b/chk-40
> > 65.9 M   /flink-chk743e4568a70b626837b/chk-41
> > 91.7 M   /flink-chk743e4568a70b626837b/chk-42
> > 96.1 M   /flink-chk743e4568a70b626837b/chk-43
> > 48.1 M   /flink-chk743e4568a70b626837b/chk-44
> > 71.6 M   /flink-chk743e4568a70b626837b/chk-45
> > 50.9 M   /flink-chk743e4568a70b626837b/chk-46
> > 90.2 M   /flink-chk743e4568a70b626837b/chk-37
> > 49.3 M   /flink-chk743e4568a70b626837b/chk-38
> > 96.9 M   /flink-chk743e4568a70b626837b/chk-39
> > 797.9 G  /flink-chk743e4568a70b626837b/shared
> >
> > The ./shared folder keeps growing and does not seem to be cleaned up.
> > However, when I disabled incremental cleanup, the expired full snapshots
> > were removed automatically. Is there any way to remove outdated state on
> > HDFS to stop it from growing?
> >
> > Thanks.
> >
> > --
> > Best Wishes,
> > Shuwen Zhou
>
>
>
> --
> Best Wishes,
> Shuwen Zhou
>
