Great, thanks for the follow up.

Best Regards,
Yu


On Mon, 21 Sep 2020 at 15:04, Juha Mynttinen <juha.myntti...@king.com>
wrote:

> Good,
>
> I opened this JIRA for the issue
> https://issues.apache.org/jira/browse/FLINK-19303. The discussion can be
> moved there.
>
> Regards,
> Juha
> ------------------------------
> *From:* Yu Li <car...@gmail.com>
> *Sent:* Friday, September 18, 2020 3:58 PM
> *To:* Juha Mynttinen <juha.myntti...@king.com>
> *Cc:* user@flink.apache.org <user@flink.apache.org>
> *Subject:* Re: Disable WAL in RocksDB recovery
>
> Thanks for bringing this up Juha, and good catch.
>
> We actually disable WAL for routine writes by default when using
> RocksDB and have never encountered segmentation fault issues. However,
> according to the history in FLINK-8922, a segmentation fault occurs during
> restore if WAL is disabled, so I guess the root cause lies in RocksDB batch
> write (org.rocksdb.WriteBatch). And IMHO this is a RocksDB bug (it should
> work well when WAL is disabled, whether for single or batch writes).
>
> +1 for opening a new JIRA to figure out the root cause, fix it and disable
> WAL during restore by default (maybe checking the fixes around WriteBatch
> in later RocksDB versions could help locate the issue more quickly), and
> thanks for volunteering to take this on. I will follow up and help
> review any findings or PR submissions.
>
> Best Regards,
> Yu
>
>
> On Wed, 16 Sep 2020 at 13:58, Juha Mynttinen <juha.myntti...@king.com>
> wrote:
>
> Hello there,
>
> I'd like to bring up a previously discussed topic: disabling
> WAL in RocksDB recovery.
>
> It's clear that WAL is not needed during the process, the reason being
> that the WAL is never read, so there's no need to write it.
>
> AFAIK the last thing that was done with WAL during recovery was an attempt
> to remove it, which was later reverted
> (https://issues.apache.org/jira/browse/FLINK-8922).
> If I interpret the comments in the ticket correctly, what happened was that
> a) WAL was kept in the recovery, b) it's unknown why removing WAL causes a
> segfault.
>
> What can be seen in the ticket is that having WAL causes a significant
> performance penalty. Thus, getting rid of WAL would be a very nice
> performance improvement. I think it'd be worth creating a new JIRA
> ticket at least as a reminder that WAL should be removed?
>
> I'm planning to add an experimental flag to remove WAL in the environment
> where I'm using Flink and to try it out. If the flag is made configurable, WAL
> can always be re-enabled if removing it causes issues.
>
> Thoughts?
>
> Regards,
> Juha
>
>
