WAL enable/disable does not work on unstable topology - removal or warning

Ilya Kasnacheev Wed, 20 Jan 2021 08:28:41 -0800

Hello!

We have had this feature for a few versions: you can call
ignite.cluster().disableWal() to temporarily disable WAL on a specific
cache, which involves a PME and a checkpoint on every node.


However, it became apparent that you cannot enable or disable WAL on any
kind of unstable topology, at all:
https://issues.apache.org/jira/browse/IGNITE-13976

You cannot even disable WAL while a baseline node is offline: When it comes
back, it will not sync its WAL enabled status with the rest of the cluster,
and all subsequent "WAL enable" or "WAL disable" operations will fail on
that cache, with no clear way to recover this cache:

ignite.close(); // stop a baseline node
client.cluster().disableWal(CACHE_NAME); // disable WAL while it is offline
nodes.add(Ignition.start(igniteCfg(false, consistentId))); // bring it back
client.cluster().enableWal(CACHE_NAME); // will fail

Even if this simple scenario is fixed, there appear to be multiple failure
scenarios when a node is added or removed in the middle of a WAL state
change operation. We do not seem to have any expertise in the WAL
disable/enable implementation right now, and I did not find a simple way of
fixing it short of a full rewrite.

Therefore, I propose that we either *(a) disable that feature* in 2.10 or
*(b) give a clear warning* when it is used, and also mention in the
documentation that it may only be used on a stable topology.
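
For option (b), here is a minimal sketch of the kind of check I have in
mind. It is not existing Ignite code: WalTopologyGuard and both of its
methods are hypothetical names, and it works on plain collections of
consistent IDs so that it stays self-contained. It only catches the
"baseline node is offline" case from the snippet above, not nodes joining or
leaving in the middle of the operation.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of Ignite: refuses WAL state changes on an incomplete baseline. */
final class WalTopologyGuard {
    private WalTopologyGuard() {
        // Static helpers only.
    }

    /** Returns the baseline consistent IDs that have no online counterpart. */
    static Set<Object> missingBaselineNodes(Collection<?> baselineIds, Collection<?> onlineIds) {
        Set<Object> missing = new LinkedHashSet<>(baselineIds);
        missing.removeAll(onlineIds);
        return missing;
    }

    /** Throws if any baseline node is offline; call before changing the WAL state. */
    static void requireStableTopology(String cacheName, Collection<?> baselineIds, Collection<?> onlineIds) {
        Set<Object> missing = missingBaselineNodes(baselineIds, onlineIds);

        if (!missing.isEmpty()) {
            throw new IllegalStateException("Refusing to change WAL state for cache '" + cacheName
                + "': baseline nodes are offline: " + missing);
        }
    }
}
```

On a real cluster the two ID collections could presumably be built from
ignite.cluster().currentBaselineTopology() and
ignite.cluster().forServers().nodes() by taking consistentId() of each
element, and the same check could log a warning instead of throwing.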

We may also want to re-mark this feature's API as @IgniteExperimental.
I have raised this ticket to Blocker.

WDYT?

Regards,
