On 12/24/21 06:37, Kyotaro Horiguchi wrote:
At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra 
<tomas.von...@enterprisedb.com> wrote in
On 12/23/21 15:42, Fujii Masao wrote:
On 2021/12/23 3:49, Tomas Vondra wrote:
Attached is a patch tweaking WAL logging - in wal_level=minimal we do
the same thing as now, in higher levels we log every sequence fetch.
Thanks for the patch!
With the patch, I found that the regression test for sequences failed.
+            fetch = log = fetch;
This should be "log = fetch"?
On second thought, originally a sequence doesn't guarantee that a
value already returned by nextval() will never be returned by a
subsequent nextval() after crash recovery. That is, nextval() may
return the same value across crash recovery. Is this understanding
right? For example, this can happen if the server crashes after
nextval() returned the value but before the WAL for the sequence was
flushed to permanent storage.

I think the important step is commit. We don't guarantee anything for
changes in uncommitted transactions. If you do nextval in a
transaction and the server crashes before the WAL gets flushed before
COMMIT, then yes, nextval may generate the same nextval again. But
after commit that is not OK - it must not happen.

I don't mean to stand on Fujii-san's side particularly, but it seems
to me sequences in RDBMSs are generally not rolled back.  Some googling
told me that at least Oracle (documented), MySQL, DB2 and MS SQL
Server don't rewind sequences at rollback, that is, sequences are
incremented independently of transaction control.  It seems common to
think that two nextval() calls for the same sequence must not return
the same value in any context.


Yes, sequences are not rolled back on abort generally. That would require much stricter locking, and that'd go against using sequences in concurrent sessions.

But we're not talking about sequence rollback - we're talking about data loss, caused by failure to flush WAL for a sequence. But that affects the *current* code too, and to a much greater extent.

Consider this:

BEGIN;
SELECT nextval('s') FROM generate_series(1,1000) s(i);
ROLLBACK; -- or crash of a different backend

BEGIN;
SELECT nextval('s');
COMMIT;

With the current code, this may easily lose the WAL covering the sequence advance, and we'll generate duplicate values from the sequence. The COMMIT pretty much does nothing to prevent that.

With the proposed change to WAL logging, that is not possible. The COMMIT flushes enough WAL to prevent this issue.

So this actually makes this issue less severe.
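
To make the failure mode concrete, here's a sketch of what a client could observe across such a crash; the numbers are illustrative, assuming the sequence starts at 1 and uses the default settings:

-- in the committed transaction above
SELECT nextval('s');   -- returns 1001, and the COMMIT succeeds

-- after crash recovery, in a new session, with the current code
SELECT nextval('s');   -- may return a value that was already handed out
                       -- before the crash (eventually repeating 1001),
                       -- because the WAL advancing 's' was never flushed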

Maybe I'm missing some important detail, though. Can you show an example where the proposed changes make the issue worse?

So it's not a bug that a sync standby may return the same value as
already returned on the primary, because the corresponding WAL has not
been replicated yet, is it?


No, I don't think so. Once the COMMIT happens (and gets confirmed by
the sync standby), it should be possible to fail over to the sync
replica without losing any data in committed transactions. Generating
duplicate values is a clear violation of that.
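
For reference, by "sync standby" I mean the usual synchronous replication setup on the primary, something like this (the standby name 'standby1' is made up):

ALTER SYSTEM SET synchronous_standby_names = 'standby1';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();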

So, strictly speaking, that is a violation of the constraint I
mentioned, regardless of whether the transaction is committed or
not. However, we have technical limitations, as described below.


I don't follow. What violates what?

If the transaction commits (and gets a confirmation from the sync replica), the modified WAL logging prevents duplicate values. It does nothing for uncommitted transactions. Seems like an improvement to me.

IMHO the fact that we allow a transaction to commit (even just
locally) without flushing all the WAL it depends on is clearly a data
loss bug.

BTW, if the returned value is stored in the database, the same value is
guaranteed not to be returned again after a server crash or by a sync
standby, because in that case the WAL of the transaction storing that
value is flushed and replicated.


True, assuming the table is WAL-logged etc. I agree the issue may
affect a fairly small fraction of workloads, because most people
use sequences to generate data for inserts etc.
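
For completeness, that common pattern looks roughly like this (the table name is made up); because the INSERT is WAL-logged, the COMMIT flushes (and, with a sync standby, replicates) WAL covering the value obtained from nextval():

CREATE TABLE t (id bigint PRIMARY KEY);

BEGIN;
INSERT INTO t VALUES (nextval('s'));
COMMIT;   -- the stored value is covered by the INSERT's WAL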

It seems to me, given that sequences are explicitly designed to be
non-transactional and that behavior is widely adopted, the discussion
might be missing some significant use-cases.  But there's a
possibility that the spec of sequences came from some technical
limitation in the past; I'm not sure.


No idea. IMHO from the correctness / behavior point of view, the modified logging is an improvement. The only issue is the additional overhead, and I think the cache addresses that quite well.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

