Thanks for your feedback, I will update the patch in the next few days, addressing the comments and reorganizing classify_object_over_fdes. Concerning your question:

+
+restart:
+  struct btree_node *iter;
+  uintptr_t lock;
+  {
+    // Accessing the root node requires defending against concurrent pointer +    // changes Thus we couple rootLock -> lock on root node -> validate rootLock

Can you elaborate a bit more in the comment on why it's necessary and sufficient to validate the containing node after "locking" the child node?

+    if (!version_lock_lock_optimistic (&(t->root_lock), &lock))
+      goto restart;
+    iter = RLOAD (t->root);
+    if (!version_lock_validate (&(t->root_lock), lock))
+      goto restart;
+    if (!iter)
+      return NULL;
+    uintptr_t child_lock;
+    if ((!btree_node_lock_optimistic (iter, &child_lock))
+    || (!version_lock_validate (&(t->root_lock), lock)))
+      goto restart;
+    lock = child_lock;
+  }
+
+  // Now we can walk down towards the right leaf node

We are reading here without explicit locks, i.e., memory can change behind our back at any time. Thus, before we act on a value that we have read, we must check if that value is still valid.
The invariants that we rely upon are:
I1) a node pointer will always be either a nullptr or point to a b-tree node. The node might have been logically deleted, but the memory will always remain valid due to our free list. I2) The root pointer is only modified by a thread that holds an exclusive lock on root_lock. I3) valide calls fail if a) somebody else currently holds the lock, or b) if the version value has changed since locking, which happens when somebody releases an exlusive lock.

Using that, the code does the following steps

1. We lock root_lock optimistic. In case of an exclusive lock we restart immediately, otherwise we remember the current version 2. We load the root pointer and store it in iter. iter might contain arbitrary garbage due to races 3. We validate the root_lock, and restart in case of errors. Now we know that a) nobody has locked the root exclusively or modified it since step 1 (I3), and b) iter indeed points to the node that corresponds the to the version that we have read in step 1 (due to I2)

At this point we now that iter is indeed the right root pointer. If it is a nullptr we stop here. Otherwise we continue with

4. We lock the root node itself (iter), storing the validated version in child_lock. We need that lock to detect changes to the content of the root, the root_lock only protects the pointer to the root. Loading the lock is safe due to I1. As our tree can change at any point in time, we need another valide call on root_lock to make sure that we still have the right root pointer.


At this point we have both a pointer to the root node and an optimistic lock that allows us to detect changes to the content of the root. Thus, we can safely navigate down, any changes to the b-tree that could affect us, like, e.g., node splits in the root, will be detected when validating the lock.


I hope that answered both the necessary and sufficient part. As a general rule, you must always use the pair relax-atomic-read + validate to avoid data races. Acting on any value that you have not validated is unsafe, as you could get races.

Best

Thomas

Reply via email to