Hi Juergen,
On 22/02/2023 08:36, Juergen Gross wrote:
On 21.02.23 23:36, Julien Grall wrote:
Hi Juergen,
On 21/02/2023 08:10, Juergen Gross wrote:
On 20.02.23 19:01, Julien Grall wrote:
So I have recreated an XTF test which I think matches what you wrote [1].
It is indeed failing without your patch. But then there is still
some weird behavior here.
I would expect the creation of the node to also fail if, instead
of being removed inside the transaction, the node is removed outside of it.
This is not the case because we are looking at the current quota. So
shouldn't we snapshot the global count?
As we don't do a global snapshot of the database for a transaction
(this was changed due to huge memory needs, bad performance, and a higher
transaction failure rate),
I am a bit surprised that the only way to do a proper transaction is to
have a global snapshot. Instead, you could have an overlay.
I didn't say that a global snapshot is the only way. And we are using an
overlay.
I don't think we should snapshot the count either.
But that would mean that the quota will change depending on
modification of the global database while the transaction is inflight.
I really don't see the problem with that. But it seems our views are
different in this case.
See below.
I guess this is neither better nor worse than the current situation. But
it is still really confusing for a client because:
1) The error could happen at a random point
Yes, like without a transaction.
2) You may see an inconsistent database as nodes are only cached
when they are first accessed
It isn't inconsistent at all. The same could happen if such other nodes are
added/modified/removed just a microsecond before you start the transaction.
You won't be able to tell the difference. You can only reason about nodes
being accessed in the transaction, and those are transaction-local.
I am not talking about the case where a node is added/modified/removed outside
of a transaction. I am talking about the in-transaction case. For
example, let's say we have a transaction A that removes nodes 1 and 2, and a
transaction B that accesses 1 and 2 (it may do more).
Now let's imagine the following sequence, where in the initial state nodes
1 and 2 exist:
- TA started
- TA remove 1
- TA remove 2
- TA remove 3
- TB started
- TB read 1
- TA ended
- TB read 2
With the above, one would expect that transaction B can read 2 as
transaction A didn't commit before B started. But this is not what's
happening.
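To make the sequence above concrete, here is a minimal toy model of an
overlay transaction that caches nodes only on first access. It is a sketch
for illustration, not Xenstored's code: the names Store, Transaction and
replay are hypothetical, commit-time conflict detection is not modelled,
and the "TA remove 3" step is left out because node 3 is not part of the
stated initial state.

```python
class Store:
    """Toy global database mapping a node path to its value."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)


_DELETED = object()  # marker for "node does not exist" inside an overlay


class Transaction:
    """Hypothetical overlay: a node is copied in only when first accessed."""

    def __init__(self, store):
        self.store = store
        self.overlay = {}   # path -> value or _DELETED
        self.dirty = set()  # paths modified by this transaction

    def _load(self, path):
        # Lazy caching: the global state is sampled at first access,
        # not when the transaction starts.
        if path not in self.overlay:
            self.overlay[path] = self.store.nodes.get(path, _DELETED)
        return self.overlay[path]

    def read(self, path):
        value = self._load(path)
        if value is _DELETED:
            raise KeyError(path)
        return value

    def remove(self, path):
        self._load(path)
        self.overlay[path] = _DELETED
        self.dirty.add(path)

    def commit(self):
        # Apply only the modified nodes to the global database.
        for path in self.dirty:
            if self.overlay[path] is _DELETED:
                self.store.nodes.pop(path, None)
            else:
                self.store.nodes[path] = self.overlay[path]


def replay():
    """Replay the TA/TB sequence; return what TB sees for nodes 1 and 2."""
    store = Store({"1": "a", "2": "b"})
    ta = Transaction(store)      # TA started
    ta.remove("1")               # TA remove 1
    ta.remove("2")               # TA remove 2
    tb = Transaction(store)      # TB started
    seen = [tb.read("1")]        # TB read 1 (cached now, still present)
    ta.commit()                  # TA ended
    try:
        seen.append(tb.read("2"))  # TB read 2 (first access after TA ended)
    except KeyError:
        seen.append(None)
    return seen
```

In this model TB sees node 1 but not node 2, even though both existed when
TB started. Sampling every node at transaction start would avoid that, at
the cost of the global snapshot discussed earlier.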
My point here is that a protocol could require that if 1 is present then
2 is. So it would be valid for a client to error out because the other
side was considered to have misbehaved. However, here this is just how
Xenstored behaves and AFAICT it is undocumented.
Cheers,
--
Julien Grall