On 2024-10-05 18:04, Peter Zijlstra wrote:
On Fri, Oct 04, 2024 at 02:27:33PM -0400, Mathieu Desnoyers wrote:
  include/linux/hp.h | 158 +++++++++++++++++++++++++++++++++++++++++++++
  kernel/Makefile    |   2 +-
  kernel/hp.c        |  46 +++++++++++++
  3 files changed, 205 insertions(+), 1 deletion(-)
  create mode 100644 include/linux/hp.h
  create mode 100644 kernel/hp.c

diff --git a/include/linux/hp.h b/include/linux/hp.h
new file mode 100644
index 000000000000..e85fc4365ea2
--- /dev/null
+++ b/include/linux/hp.h
@@ -0,0 +1,158 @@
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers 
<mathieu.desnoy...@efficios.com>
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+#ifndef _LINUX_HP_H
+#define _LINUX_HP_H
+
+/*
+ * HP: Hazard Pointers
+ *
+ * This API provides existence guarantees of objects through hazard
+ * pointers.
+ *
+ * It uses a fixed number of hazard pointer slots (nr_cpus) across the
+ * entire system for each HP domain.
+ *
+ * Its main benefit over RCU is that it allows fast reclaim of
+ * HP-protected pointers without needing to wait for a grace period.
+ *
+ * It also allows the hazard pointer scan to call a user-defined callback
+ * to retire a hazard pointer slot immediately if needed. This callback
+ * may, for instance, issue an IPI to the relevant CPU.
+ *
+ * References:
+ *
+ * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
+ *      lock-free objects," in IEEE Transactions on Parallel and
+ *      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
+ */
+
+#include <linux/rcupdate.h>
+
+/*
+ * Hazard pointer slot.
+ */
+struct hp_slot {
+       void *addr;
+};
+
+/*
+ * Hazard pointer context, returned by hp_use().
+ */
+struct hp_ctx {
+       struct hp_slot *slot;
+       void *addr;
+};
+
+/*
+ * hp_scan: Scan hazard pointer domain for @addr.
+ *
+ * Scan hazard pointer domain for @addr.
+ * If @retire_cb is NULL, wait to observe that each slot contains a value
+ * that differs from @addr.
+ * If @retire_cb is non-NULL, invoke @callback for each slot containing
+ * @addr.
+ */
+void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
+            void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr));

struct hp_domain {
        struct hp_slot __percpu *slots
};

might clarify things a wee little.

Good point. This introduces:

#define DECLARE_HP_DOMAIN(domain)                                       \
        extern struct hp_domain domain

#define DEFINE_HP_DOMAIN(domain)                                        \
        static DEFINE_PER_CPU(struct hp_slot, __ ## domain ## _slots);  \
        struct hp_domain domain = {                                     \
                .percpu_slots = &__## domain ## _slots,                 \
        }


+
+/* Get the hazard pointer context address (may be NULL). */
+static inline
+void *hp_ctx_addr(struct hp_ctx ctx)
+{
+       return ctx.addr;
+}

 From where I'm sitting this seems like superfluous fluff, what's wrong
with ctx.addr ?

I'm OK removing the accessor and just using ctx.addr.


+/*
+ * hp_allocate: Allocate a hazard pointer.
+ *
+ * Allocate a hazard pointer slot for @addr. The object existence should
+ * be guaranteed by the caller. Expects to be called from preempt
+ * disable context.
+ *
+ * Returns a hazard pointer context.

So you made the WTF'o'meter crack, this here function does not allocate
nothing. Naming is bad. At best this is something like
try-set-hazard-pointer or somesuch.

I went with the naming from the 2004 paper from Maged Michael, but I
agree it could be clearer.

I'm tempted to go for "hp_try_post()" and "hp_remove()", basically
"posting" the intent to use a pointer (as in on a metaphorical billboard),
and removing it when it's done.


+ */
+static inline
+struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr)
+{
+       struct hp_slot *slot;
+       struct hp_ctx ctx;
+
+       if (!addr)
+               goto fail;
+       slot = this_cpu_ptr(percpu_slots);
+       /*
+        * A single hazard pointer slot per CPU is available currently.
+        * Other hazard pointer domains can eventually have a different
+        * configuration.
+        */
+       if (READ_ONCE(slot->addr))
+               goto fail;
+       WRITE_ONCE(slot->addr, addr);        /* Store B */
+       ctx.slot = slot;
+       ctx.addr = addr;
+       return ctx;
+
+fail:
+       ctx.slot = NULL;
+       ctx.addr = NULL;
+       return ctx;
+}
+
+/*
+ * hp_dereference_allocate: Dereference and allocate a hazard pointer.
+ *
+ * Returns a hazard pointer context. Expects to be called from preempt
+ * disable context.
+ */

More terrible naming. Same as above, but additionally, I would expect a
'dereference' to actually dereference the pointer and have a return
value of the dereferenced type.

hp_dereference_try_post() ?


This function seems to double check and update the hp_ctx thing. I'm not
at all sure yet wtf this is doing -- and the total lack of comments
aren't helping.

The hp_ctx contains the outputs.

The function loads *addr_p to then try_post it into a HP slot. On success,
it re-reads the *addr_p (with address dependency) and if it still matches,
use that as output address pointer.

I'm planning to remove hp_ctx, and just have:

/*
 * hp_try_post: Try to post a hazard pointer.
 *
 * Post a hazard pointer slot for @addr. The object existence should
 * be guaranteed by the caller. Expects to be called from preempt
 * disable context.
 *
 * Returns true if post succeeds, false otherwise.
 */
static inline
bool hp_try_post(struct hp_domain *hp_domain, void *addr, struct hp_slot 
**_slot)
[...]

/*
 * hp_dereference_try_post: Dereference and try to post a hazard pointer.
 *
 * Returns a hazard pointer context. Expects to be called from preempt
 * disable context.
 */
static inline
void *__hp_dereference_try_post(struct hp_domain *hp_domain,
                                void * const * addr_p, struct hp_slot **_slot)
[...]

#define hp_dereference_try_post(domain, p, slot_p)              \
        ((__typeof__(*(p))) __hp_dereference_try_post(domain, (void * const *) 
p, slot_p))

/* Clear the hazard pointer in @slot. */
static inline
void hp_remove(struct hp_slot *slot)
[...]


+static inline
+struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, 
void * const * addr_p)
+{
+       void *addr, *addr2;
+       struct hp_ctx ctx;
+
+       addr = READ_ONCE(*addr_p);
+retry:
+       ctx = hp_allocate(percpu_slots, addr);
+       if (!hp_ctx_addr(ctx))
+               goto fail;
+       /* Memory ordering: Store B before Load A. */
+       smp_mb();
+       /*
+        * Use RCU dereference without lockdep checks, because
+        * lockdep is not aware of HP guarantees.
+        */
+       addr2 = rcu_access_pointer(*addr_p);    /* Load A */
+       /*
+        * If @addr_p content has changed since the first load,
+        * clear the hazard pointer and try again.
+        */
+       if (!ptr_eq(addr2, addr)) {
+               WRITE_ONCE(ctx.slot->addr, NULL);
+               if (!addr2)
+                       goto fail;
+               addr = addr2;
+               goto retry;
+       }
+       /*
+        * Use addr2 loaded from rcu_access_pointer() to preserve
+        * address dependency ordering.
+        */
+       ctx.addr = addr2;
+       return ctx;
+
+fail:
+       ctx.slot = NULL;
+       ctx.addr = NULL;
+       return ctx;
+}
+
+/* Retire the hazard pointer in @ctx. */
+static inline
+void hp_retire(const struct hp_ctx ctx)
+{
+       smp_store_release(&ctx.slot->addr, NULL);
+}
+
+#endif /* _LINUX_HP_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 3c13240dfc9f..ec16de96fa80 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -7,7 +7,7 @@ obj-y     = fork.o exec_domain.o panic.o \
            cpu.o exit.o softirq.o resource.o \
            sysctl.o capability.o ptrace.o user.o \
            signal.o sys.o umh.o workqueue.o pid.o task_work.o \
-           extable.o params.o \
+           extable.o params.o hp.o \
            kthread.o sys_ni.o nsproxy.o \
            notifier.o ksysfs.o cred.o reboot.o \
            async.o range.o smpboot.o ucount.o regset.o ksyms_common.o
diff --git a/kernel/hp.c b/kernel/hp.c
new file mode 100644
index 000000000000..b2447bf15300
--- /dev/null
+++ b/kernel/hp.c
@@ -0,0 +1,46 @@
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers 
<mathieu.desnoy...@efficios.com>
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+/*
+ * HP: Hazard Pointers
+ */
+
+#include <linux/hp.h>
+#include <linux/percpu.h>
+
+/*
+ * hp_scan: Scan hazard pointer domain for @addr.
+ *
+ * Scan hazard pointer domain for @addr.
+ * If @retire_cb is non-NULL, invoke @callback for each slot containing
+ * @addr.
+ * Wait to observe that each slot contains a value that differs from
+ * @addr before returning.
+ */
+void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
+            void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr))
+{
+       int cpu;
+
+       /*
+        * Store A precedes hp_scan(): it unpublishes addr (sets it to
+        * NULL or to a different value), and thus hides it from hazard
+        * pointer readers.
+        */
+
+       if (!addr)
+               return;
+       /* Memory ordering: Store A before Load B. */
+       smp_mb();
+       /* Scan all CPUs slots. */
+       for_each_possible_cpu(cpu) {
+               struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu);
+
+               if (retire_cb && smp_load_acquire(&slot->addr) == addr)  /* 
Load B */
+                       retire_cb(cpu, slot, addr);

Is retirce_cb allowed to cmpxchg the thing?

It could, but we'd need to make sure the slot is not re-used by another
hp_try_post() before the current user removes its own post. It would
need to synchronize with the current HP user (e.g. with IPI).

I've actually renamed retire_cb to "on_match_cb".


+               /* Busy-wait if node is found. */
+               while ((smp_load_acquire(&slot->addr)) == addr)  /* Load B */
+                       cpu_relax();

This really should be using smp_cond_load_acquire()

Good point,

Thanks,

Mathieu


+       }
+}

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Reply via email to