From: John Groves <[email protected]>

famfs is a shared, memory-mappable filesystem for disaggregated and
fabric-attached memory such as CXL. Files map directly to dax memory
without page-cache buffering, which lets multiple hosts share the same
memory through per-host file mappings. See https://famfs.org for more
information.

Putting a daxdev in famfs mode means binding it to fsdev_dax.ko
(drivers/dax/fsdev.c). Finding a daxdev bound to fsdev_dax means it is
in famfs mode. A test for this functionality is added in the next
commit.

With devdax, famfs, and system-ram modes, the previous logic that
assumed 'not in mode X means in mode Y' needed to get slightly more
complicated.

Add explicit mode detection to libdaxctl:
- daxctl_dev_is_famfs_mode(): check if bound to the fsdev_dax driver
- daxctl_dev_is_devdax_mode(): check if bound to the device_dax driver
- daxctl_dev_is_system_ram_mode(): check if bound to the kmem driver

All three delegate to a shared static helper
daxctl_dev_bound_to_module() to avoid duplicating the driver-symlink
lookup. daxctl_dev_is_system_ram_mode() is the consistent name for the
pre-existing daxctl_dev_is_system_ram_capable(), which becomes a thin
compatibility wrapper so the existing ABI is preserved.

Add daxctl_dev_get_mode(), which reports the current mode of a device
(system-ram, devdax, famfs, or unknown) so callers can dispatch on a
single value rather than chaining the predicates above. enum
daxctl_dev_mode moves to the public header and gains a
DAXCTL_DEV_MODE_UNKNOWN sentinel, and daxctl_dev_get_mode() is
exported.

Update mode transition logic in device.c:
- the reconfig_mode_*() functions switch on daxctl_dev_get_mode()
  rather than repeating an if-else mode chain
- disable_devdax_device() and disable_famfs_device() collapse into a
  single disable_mode_device(); the caller has already matched the mode
- rename the local 'enum dev_mode' to 'enum reconfig_mode'
  (RECONFIG_MODE_*) so it no longer shares member names with
  enum daxctl_dev_mode
- handle an unrecognized mode with an error instead of a wrong
  assumption

While here, fix daxctl_dev_enable() to range-check mode before using it
to index the dax_modules[] array: the mod_name lookup previously
happened before the bounds check, allowing an out-of-bounds read for a
negative or out-of-range mode.
The lookup now runs only after mode is validated.

Update json.c to report fsdev_dax-bound devices as 'famfs' mode. An
unbound device continues to be reported as 'devdax' (the legacy default
when no driver is bound), to preserve existing behavior.

Document the famfs mode in
Documentation/daxctl/daxctl-reconfigure-device.txt.

Signed-off-by: John Groves <[email protected]>
---
 .../daxctl/daxctl-reconfigure-device.txt      |  22 +++-
 daxctl/device.c                               | 113 +++++++++++++-----
 daxctl/json.c                                 |  18 ++-
 daxctl/lib/libdaxctl-private.h                |   9 +-
 daxctl/lib/libdaxctl.c                        |  73 +++++++++--
 daxctl/lib/libdaxctl.sym                      |   9 ++
 daxctl/libdaxctl.h                            |  14 +++
 7 files changed, 215 insertions(+), 43 deletions(-)

diff --git a/Documentation/daxctl/daxctl-reconfigure-device.txt b/Documentation/daxctl/daxctl-reconfigure-device.txt
index 09691d2..9c3922d 100644
--- a/Documentation/daxctl/daxctl-reconfigure-device.txt
+++ b/Documentation/daxctl/daxctl-reconfigure-device.txt
@@ -17,7 +17,9 @@ DESCRIPTION
 
 Reconfigure the operational mode of a dax device. This can be used to convert
 a regular 'devdax' mode device to the 'system-ram' mode which arranges for the
-dax range to be hot-plugged into the system as regular memory.
+dax range to be hot-plugged into the system as regular memory. A 'devdax' mode
+device can also be converted to 'famfs' mode, which binds it to the fsdev_dax
+driver for use by the famfs shared-memory filesystem (see https://famfs.org).
 
 NOTE: This is a destructive operation. Any data on the dax device *will* be
 lost.
@@ -127,6 +129,19 @@ EXAMPLES
 }
 ----
 
+* Reconfigure dax0.0 (currently in devdax mode) to famfs mode
+----
+# daxctl reconfigure-device --mode=famfs dax0.0
+[
+  {
+    "chardev":"dax0.0",
+    "size":16777216000,
+    "target_node":2,
+    "mode":"famfs"
+  }
+]
+----
+
 * Reconfigure all dax devices on region0 to system-ram mode
 ----
 # daxctl reconfigure-device --mode=system-ram --region=0 all
@@ -205,6 +220,11 @@ include::region-option.txt[]
 	  kernel to support hot-unplugging 'kmem' based memory. If this is not
 	  available, a reboot is the only way to switch back to 'devdax' mode.
 
+	- "famfs": bind the device to the fsdev_dax driver for use by the famfs
+	  shared-memory filesystem (https://famfs.org). The device must
+	  currently be in "devdax" mode; converting directly from "system-ram"
+	  is rejected.
+
 -N::
 --no-online::
 	By default, memory sections provided by system-ram devices will be
diff --git a/daxctl/device.c b/daxctl/device.c
index a4e36b1..47942f1 100644
--- a/daxctl/device.c
+++ b/daxctl/device.c
@@ -38,17 +38,18 @@ static struct {
 	bool verbose;
 } param;
 
-enum dev_mode {
-	DAXCTL_DEV_MODE_UNKNOWN,
-	DAXCTL_DEV_MODE_DEVDAX,
-	DAXCTL_DEV_MODE_RAM,
+enum reconfig_mode {
+	RECONFIG_MODE_UNKNOWN,
+	RECONFIG_MODE_DEVDAX,
+	RECONFIG_MODE_RAM,
+	RECONFIG_MODE_FAMFS,
 };
 
 struct mapping {
 	unsigned long long start, end, pgoff;
 };
 
-static enum dev_mode reconfig_mode = DAXCTL_DEV_MODE_UNKNOWN;
+static enum reconfig_mode reconfig_mode = RECONFIG_MODE_UNKNOWN;
 static long long align = -1;
 static long long size = -1;
 static unsigned long flags;
@@ -463,13 +464,20 @@ static const char *parse_device_options(int argc, const char **argv,
 			if (param.align)
 				align = __parse_size64(param.align, &units);
 		} else if (strcmp(param.mode, "system-ram") == 0) {
-			reconfig_mode = DAXCTL_DEV_MODE_RAM;
+			reconfig_mode = RECONFIG_MODE_RAM;
 		} else if (strcmp(param.mode, "devdax") == 0) {
-			reconfig_mode = DAXCTL_DEV_MODE_DEVDAX;
+			reconfig_mode = RECONFIG_MODE_DEVDAX;
 			if (param.no_online) {
 				fprintf(stderr,
 					"--no-online is incompatible with --mode=devdax\n");
-				rc = -EINVAL;
+				rc = -EINVAL;
+			}
+		} else if (strcmp(param.mode, "famfs") == 0) {
+			reconfig_mode = RECONFIG_MODE_FAMFS;
+			if (param.no_online) {
+				fprintf(stderr,
+					"--no-online is incompatible with --mode=famfs\n");
+				rc = -EINVAL;
 			}
 		}
 		break;
@@ -689,17 +697,10 @@ static int dev_destroy(struct daxctl_dev *dev)
 	return 0;
 }
 
-static int disable_devdax_device(struct daxctl_dev *dev)
+static int disable_mode_device(struct daxctl_dev *dev)
 {
-	struct daxctl_memory *mem = daxctl_dev_get_memory(dev);
-	const char *devname = daxctl_dev_get_devname(dev);
 	int rc;
 
-	if (mem) {
-		fprintf(stderr, "%s was already in system-ram mode\n",
-			devname);
-		return 1;
-	}
 	rc = daxctl_dev_disable(dev);
 	if (rc) {
 		fprintf(stderr, "%s: disable failed: %s\n",
@@ -724,11 +725,21 @@ static int reconfig_mode_system_ram(struct daxctl_dev *dev)
 	}
 
 	if (daxctl_dev_is_enabled(dev)) {
-		rc = disable_devdax_device(dev);
-		if (rc < 0)
-			return rc;
-		if (rc > 0)
+		switch (daxctl_dev_get_mode(dev)) {
+		case DAXCTL_DEV_MODE_RAM:
+			/* already in system-ram mode */
 			skip_enable = 1;
+			break;
+		case DAXCTL_DEV_MODE_FAMFS:
+		case DAXCTL_DEV_MODE_DEVDAX:
+			rc = disable_mode_device(dev);
+			if (rc)
+				return rc;
+			break;
+		default:
+			fprintf(stderr, "%s: unknown mode\n", devname);
+			return -EINVAL;
+		}
 	}
 
 	if (!skip_enable) {
@@ -750,7 +761,7 @@ static int disable_system_ram_device(struct daxctl_dev *dev)
 	int rc;
 
 	if (!mem) {
-		fprintf(stderr, "%s was already in devdax mode\n", devname);
+		fprintf(stderr, "%s is not in system-ram mode\n", devname);
 		return 1;
 	}
 
@@ -786,12 +797,26 @@ static int disable_system_ram_device(struct daxctl_dev *dev)
 
 static int reconfig_mode_devdax(struct daxctl_dev *dev)
 {
+	const char *devname = daxctl_dev_get_devname(dev);
 	int rc;
 
 	if (daxctl_dev_is_enabled(dev)) {
-		rc = disable_system_ram_device(dev);
-		if (rc)
-			return rc;
+		switch (daxctl_dev_get_mode(dev)) {
+		case DAXCTL_DEV_MODE_RAM:
+			rc = disable_system_ram_device(dev);
+			if (rc)
+				return rc;
+			break;
+		case DAXCTL_DEV_MODE_FAMFS:
+		case DAXCTL_DEV_MODE_DEVDAX:
+			rc = disable_mode_device(dev);
+			if (rc)
+				return rc;
+			break;
+		default:
+			fprintf(stderr, "%s: unknown mode\n", devname);
+			return -EINVAL;
+		}
 	}
 
 	rc = daxctl_dev_enable_devdax(dev);
@@ -801,6 +826,37 @@ static int reconfig_mode_devdax(struct daxctl_dev *dev)
 	return 0;
 }
 
+static int reconfig_mode_famfs(struct daxctl_dev *dev)
+{
+	const char *devname = daxctl_dev_get_devname(dev);
+	int rc;
+
+	if (daxctl_dev_is_enabled(dev)) {
+		switch (daxctl_dev_get_mode(dev)) {
+		case DAXCTL_DEV_MODE_RAM:
+			fprintf(stderr,
+				"%s is in system-ram mode; must be in devdax mode to convert to famfs\n",
+				devname);
+			return -EINVAL;
+		case DAXCTL_DEV_MODE_FAMFS:
+		case DAXCTL_DEV_MODE_DEVDAX:
+			rc = disable_mode_device(dev);
+			if (rc)
+				return rc;
+			break;
+		default:
+			fprintf(stderr, "%s: unknown mode\n", devname);
+			return -EINVAL;
+		}
+	}
+
+	rc = daxctl_dev_enable_famfs(dev);
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
 static int do_create(struct daxctl_region *region, long long val,
 		     struct json_object **jdevs)
 {
@@ -862,7 +918,7 @@ static int do_create(struct daxctl_region *region, long long val,
 	return 0;
 }
 
-static int do_reconfig(struct daxctl_dev *dev, enum dev_mode mode,
+static int do_reconfig(struct daxctl_dev *dev, enum reconfig_mode mode,
 		       struct json_object **jdevs)
 {
 	const char *devname = daxctl_dev_get_devname(dev);
@@ -881,12 +937,15 @@ static int do_reconfig(struct daxctl_dev *dev, enum dev_mode mode,
 	}
 
 	switch (mode) {
-	case DAXCTL_DEV_MODE_RAM:
+	case RECONFIG_MODE_RAM:
 		rc = reconfig_mode_system_ram(dev);
 		break;
-	case DAXCTL_DEV_MODE_DEVDAX:
+	case RECONFIG_MODE_DEVDAX:
 		rc = reconfig_mode_devdax(dev);
 		break;
+	case RECONFIG_MODE_FAMFS:
+		rc = reconfig_mode_famfs(dev);
+		break;
 	default:
 		fprintf(stderr, "%s: unknown mode requested: %d\n",
 			devname, mode);
diff --git a/daxctl/json.c b/daxctl/json.c
index 3cbce9d..8da91b1 100644
--- a/daxctl/json.c
+++ b/daxctl/json.c
@@ -46,10 +46,24 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev,
 		json_object_object_add(jdev, "align", jobj);
 	}
 
-	if (mem)
+	switch (daxctl_dev_get_mode(dev)) {
+	case DAXCTL_DEV_MODE_RAM:
 		jobj = json_object_new_string("system-ram");
-	else
+		break;
+	case DAXCTL_DEV_MODE_FAMFS:
+		jobj = json_object_new_string("famfs");
+		break;
+	case DAXCTL_DEV_MODE_DEVDAX:
+	default:
+		/* A device bound to device_dax is in devdax mode. A device with
+		 * no driver bound (DAXCTL_DEV_MODE_UNKNOWN) is reported as devdax
+		 * too, the legacy default (the disabled modifier is added later
+		 * in this function if applicable).
+		 */
 		jobj = json_object_new_string("devdax");
+		break;
+	}
+
 	if (jobj)
 		json_object_object_add(jdev, "mode", jobj);
 
diff --git a/daxctl/lib/libdaxctl-private.h b/daxctl/lib/libdaxctl-private.h
index ae45311..b902e3d 100644
--- a/daxctl/lib/libdaxctl-private.h
+++ b/daxctl/lib/libdaxctl-private.h
@@ -5,6 +5,8 @@
 
 #include <libkmod.h>
 
+#include <daxctl/libdaxctl.h>
+
 #define DAXCTL_EXPORT __attribute__ ((visibility("default")))
 
 enum dax_subsystem {
@@ -18,15 +20,10 @@ static const char *dax_subsystems[] = {
 	[DAX_BUS] = "/sys/bus/dax/devices",
 };
 
-enum daxctl_dev_mode {
-	DAXCTL_DEV_MODE_DEVDAX = 0,
-	DAXCTL_DEV_MODE_RAM,
-	DAXCTL_DEV_MODE_END,
-};
-
 static const char *dax_modules[] = {
 	[DAXCTL_DEV_MODE_DEVDAX] = "device_dax",
 	[DAXCTL_DEV_MODE_RAM] = "kmem",
+	[DAXCTL_DEV_MODE_FAMFS] = "fsdev_dax",
 };
 
 enum memory_op {
diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index 02ae7e5..01b1915 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -385,13 +385,19 @@ static bool device_model_is_dax_bus(struct daxctl_dev *dev)
 	return false;
 }
 
-DAXCTL_EXPORT int daxctl_dev_is_system_ram_capable(struct daxctl_dev *dev)
+/*
+ * Test whether @dev is bound to the driver named @mod_name. Returns false for
+ * a disabled (unbound) device: the DAX bus does not retain the previous driver
+ * binding after unbind, so a device's mode cannot be determined without a
+ * bound driver.
+ */
+static int daxctl_dev_bound_to_module(struct daxctl_dev *dev, const char *mod_name)
 {
 	const char *devname = daxctl_dev_get_devname(dev);
 	struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
 	const char *mod_base;
 	char *mod_path;
-	char path[200];
+	char path[PATH_MAX];
 	const int len = sizeof(path);
 
 	if (!device_model_is_dax_bus(dev))
@@ -406,11 +412,13 @@ DAXCTL_EXPORT int daxctl_dev_is_system_ram_capable(struct daxctl_dev *dev)
 	}
 
 	mod_path = realpath(path, NULL);
-	if (!mod_path)
+	if (!mod_path) {
+		dbg(ctx, "%s: realpath failed for driver link\n", devname);
 		return false;
+	}
 
 	mod_base = path_basename(mod_path);
-	if (strcmp(mod_base, dax_modules[DAXCTL_DEV_MODE_RAM]) == 0) {
+	if (strcmp(mod_base, mod_name) == 0) {
 		free(mod_path);
 		return true;
 	}
@@ -419,6 +427,46 @@ DAXCTL_EXPORT int daxctl_dev_is_system_ram_capable(struct daxctl_dev *dev)
 	return false;
 }
 
+DAXCTL_EXPORT int daxctl_dev_is_system_ram_mode(struct daxctl_dev *dev)
+{
+	return daxctl_dev_bound_to_module(dev, dax_modules[DAXCTL_DEV_MODE_RAM]);
+}
+
+/*
+ * Compatibility alias for daxctl_dev_is_system_ram_mode(), retained as part of
+ * the public ABI (LIBDAXCTL_10). Despite the name it tests the current driver
+ * binding, not a capability.
+ */
+DAXCTL_EXPORT int daxctl_dev_is_system_ram_capable(struct daxctl_dev *dev)
+{
+	return daxctl_dev_is_system_ram_mode(dev);
+}
+
+DAXCTL_EXPORT int daxctl_dev_is_famfs_mode(struct daxctl_dev *dev)
+{
+	return daxctl_dev_bound_to_module(dev, dax_modules[DAXCTL_DEV_MODE_FAMFS]);
+}
+
+DAXCTL_EXPORT int daxctl_dev_is_devdax_mode(struct daxctl_dev *dev)
+{
+	return daxctl_dev_bound_to_module(dev, dax_modules[DAXCTL_DEV_MODE_DEVDAX]);
+}
+
+/*
+ * Report the current mode of a device, determined from its bound driver.
+ * A device with no driver bound returns DAXCTL_DEV_MODE_UNKNOWN.
+ */
+DAXCTL_EXPORT enum daxctl_dev_mode daxctl_dev_get_mode(struct daxctl_dev *dev)
+{
+	if (daxctl_dev_is_system_ram_mode(dev))
+		return DAXCTL_DEV_MODE_RAM;
+	if (daxctl_dev_is_famfs_mode(dev))
+		return DAXCTL_DEV_MODE_FAMFS;
+	if (daxctl_dev_is_devdax_mode(dev))
+		return DAXCTL_DEV_MODE_DEVDAX;
+	return DAXCTL_DEV_MODE_UNKNOWN;
+}
+
 /*
  * This checks for the device to be in system-ram mode, so calling
  * daxctl_dev_get_memory() on a devdax mode device will always return NULL.
@@ -433,7 +481,7 @@ static struct daxctl_memory *daxctl_dev_alloc_mem(struct daxctl_dev *dev)
 	char buf[SYSFS_ATTR_SIZE];
 	int node_num;
 
-	if (!daxctl_dev_is_system_ram_capable(dev))
+	if (!daxctl_dev_is_system_ram_mode(dev))
 		return NULL;
 
 	mem = calloc(1, sizeof(*mem));
@@ -939,7 +987,7 @@ static int daxctl_dev_enable(struct daxctl_dev *dev, enum daxctl_dev_mode mode)
 	struct daxctl_region *region = daxctl_dev_get_region(dev);
 	const char *devname = daxctl_dev_get_devname(dev);
 	struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
-	const char *mod_name = dax_modules[mode];
+	const char *mod_name;
 	int rc;
 
 	if (!device_model_is_dax_bus(dev)) {
@@ -951,7 +999,13 @@ static int daxctl_dev_enable(struct daxctl_dev *dev, enum daxctl_dev_mode mode)
 	if (daxctl_dev_is_enabled(dev))
 		return 0;
 
-	if (mode >= DAXCTL_DEV_MODE_END || mod_name == NULL) {
+	if (mode < 0 || mode >= DAXCTL_DEV_MODE_END) {
+		err(ctx, "%s: Invalid mode: %d\n", devname, mode);
+		return -EINVAL;
+	}
+
+	mod_name = dax_modules[mode];
+	if (mod_name == NULL) {
 		err(ctx, "%s: Invalid mode: %d\n", devname, mode);
 		return -EINVAL;
 	}
@@ -983,6 +1037,11 @@ DAXCTL_EXPORT int daxctl_dev_enable_ram(struct daxctl_dev *dev)
 	return daxctl_dev_enable(dev, DAXCTL_DEV_MODE_RAM);
 }
 
+DAXCTL_EXPORT int daxctl_dev_enable_famfs(struct daxctl_dev *dev)
+{
+	return daxctl_dev_enable(dev, DAXCTL_DEV_MODE_FAMFS);
+}
+
 DAXCTL_EXPORT int daxctl_dev_disable(struct daxctl_dev *dev)
 {
 	const char *devname = daxctl_dev_get_devname(dev);
diff --git a/daxctl/lib/libdaxctl.sym b/daxctl/lib/libdaxctl.sym
index 3098811..43dd60b 100644
--- a/daxctl/lib/libdaxctl.sym
+++ b/daxctl/lib/libdaxctl.sym
@@ -104,3 +104,12 @@ LIBDAXCTL_10 {
 global:
 	daxctl_dev_is_system_ram_capable;
 } LIBDAXCTL_9;
+
+LIBDAXCTL_11 {
+global:
+	daxctl_dev_enable_famfs;
+	daxctl_dev_is_famfs_mode;
+	daxctl_dev_is_devdax_mode;
+	daxctl_dev_get_mode;
+	daxctl_dev_is_system_ram_mode;
+} LIBDAXCTL_10;
diff --git a/daxctl/libdaxctl.h b/daxctl/libdaxctl.h
index 53c6bbd..7ec159e 100644
--- a/daxctl/libdaxctl.h
+++ b/daxctl/libdaxctl.h
@@ -72,12 +72,26 @@ int daxctl_dev_is_enabled(struct daxctl_dev *dev);
 int daxctl_dev_disable(struct daxctl_dev *dev);
 int daxctl_dev_enable_devdax(struct daxctl_dev *dev);
 int daxctl_dev_enable_ram(struct daxctl_dev *dev);
+int daxctl_dev_enable_famfs(struct daxctl_dev *dev);
 int daxctl_dev_get_target_node(struct daxctl_dev *dev);
 int daxctl_dev_will_auto_online_memory(struct daxctl_dev *dev);
 int daxctl_dev_has_online_memory(struct daxctl_dev *dev);
 
 struct daxctl_memory;
+
+enum daxctl_dev_mode {
+	DAXCTL_DEV_MODE_UNKNOWN = -1,
+	DAXCTL_DEV_MODE_DEVDAX = 0,
+	DAXCTL_DEV_MODE_RAM,
+	DAXCTL_DEV_MODE_FAMFS,
+	DAXCTL_DEV_MODE_END,
+};
+
+int daxctl_dev_is_system_ram_mode(struct daxctl_dev *dev);
 int daxctl_dev_is_system_ram_capable(struct daxctl_dev *dev);
+int daxctl_dev_is_famfs_mode(struct daxctl_dev *dev);
+int daxctl_dev_is_devdax_mode(struct daxctl_dev *dev);
+enum daxctl_dev_mode daxctl_dev_get_mode(struct daxctl_dev *dev);
 struct daxctl_memory *daxctl_dev_get_memory(struct daxctl_dev *dev);
 struct daxctl_dev *daxctl_memory_get_dev(struct daxctl_memory *mem);
 const char *daxctl_memory_get_node_path(struct daxctl_memory *mem);
-- 
2.53.0
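
Illustrative note, not part of the patch: the three ideas the commit message relies on (validate the mode before indexing the module table, derive the mode from the bound driver name, and report an unknown mode as 'devdax') can be sketched in isolation. The sketch below is a minimal standalone approximation, assuming a driver basename is already available as a string. The names dev_mode, mode_to_module(), module_to_mode() and mode_report_name() are hypothetical stand-ins and are not libdaxctl symbols; only the driver names and the enum layout are taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for enum daxctl_dev_mode as laid out in the patch. */
enum dev_mode {
	MODE_UNKNOWN = -1,
	MODE_DEVDAX = 0,
	MODE_RAM,
	MODE_FAMFS,
	MODE_END,
};

/* Driver names as listed in dax_modules[] in the patch. */
static const char *const modules[] = {
	[MODE_DEVDAX] = "device_dax",
	[MODE_RAM] = "kmem",
	[MODE_FAMFS] = "fsdev_dax",
};

/*
 * Hypothetical helper: range-check first, index second. Taking an int
 * keeps the negative check meaningful whatever the enum's underlying
 * type is.
 */
static const char *mode_to_module(int mode)
{
	if (mode < 0 || mode >= MODE_END)
		return NULL;
	return modules[mode];
}

/*
 * Hypothetical helper: classify a driver basename. A NULL or
 * unrecognized name maps to MODE_UNKNOWN rather than to a guessed mode.
 */
static enum dev_mode module_to_mode(const char *driver)
{
	int m;

	if (!driver)
		return MODE_UNKNOWN;
	for (m = 0; m < MODE_END; m++)
		if (strcmp(driver, modules[m]) == 0)
			return (enum dev_mode)m;
	return MODE_UNKNOWN;
}

/*
 * Hypothetical helper mirroring the json.c switch: anything that is
 * not system-ram or famfs is reported as "devdax".
 */
static const char *mode_report_name(enum dev_mode mode)
{
	switch (mode) {
	case MODE_RAM:
		return "system-ram";
	case MODE_FAMFS:
		return "famfs";
	case MODE_DEVDAX:
	default:
		return "devdax";
	}
}
```

A caller would typically compute module_to_mode() once and switch on the result, which is the same shape as the reconfig_mode_*() functions switching on daxctl_dev_get_mode() in the patch.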

