On Wed, Jan 08, 2025 at 03:57:49PM -0600, Ben Cheatham wrote:
> Add inject-error command for injecting CXL errors into CXL devices.
> The command currently only has support for injecting CXL protocol
> errors into CXL downstream ports via EINJ.

Hi Ben,
I went through enough to give you some feedback for your v1.
Just ran out of time and didn't want to keep you waiting any longer.

wrt 'currently only has' - what is the grander scope of this that we
might see in the future. Will there only every be one system wide
response to --list-errors or will there be error types per type of
port.

Spec reference?

> 
> The command takes an error type and injects an error of that type into the
> specified downstream port. Downstream ports can be specified using the
> port's device name with the -d option. Available error types can be obtained
> by running "cxl inject-error --list-errors".
> 
> This command requires the kernel to be built with CONFIG_DEBUGFS and
> CONFIG_ACPI_APEI_EINJ_CXL enabled. It also requires root privileges to
> run due to reading from <debugfs>/cxl/einj_types and writing to
> <debugfs>/cxl/<dport>/einj_inject.
> 
> Example usage:
>     # cxl inject-error --list-errors
>     cxl.mem_correctable
>     cxl.mem_fatal
>     ...
>     # cxl inject-error -d 0000:00:01.1 cxl.mem_correctable
>     injected cxl.mem_correctable protocol error
>

I'll probably ask this again later on, but how does user see
list of downstream ports. Does user really think -d dport,
or do they think -d device-name, or ?
Man page will help here.

We don't have cxl-cli support for poison inject, but to future
proof this, let's think about the naming.

Please split the patch up at least into 2 -
libcxl: Add APIs supporting CXL protocol error injection
        include updates to Documentation/cxl/lib/libcxl.txt

cxl: add {list,inject}-protocol-error' commands
     include man page updates, additiona

The 'list-errors' is the the system level error support, right?
Could that fit in the existing 'cxl list' hierachy?
Would they be a property of the bus?

I don't know that 'protocol' is best name, but seems it needs
another descriptor.


> Signed-off-by: Ben Cheatham <benjamin.cheat...@amd.com>
> ---
>  cxl/builtin.h      |   1 +
>  cxl/cxl.c          |   1 +
>  cxl/inject-error.c | 188 +++++++++++++++++++++++++++++++++++++++++++++
>  cxl/lib/libcxl.c   |  53 +++++++++++++
>  cxl/lib/libcxl.sym |   2 +
>  cxl/libcxl.h       |  13 ++++
>  cxl/meson.build    |   1 +
>  7 files changed, 259 insertions(+)
>  create mode 100644 cxl/inject-error.c
> 
> diff --git a/cxl/builtin.h b/cxl/builtin.h
> index c483f30..e82fcb5 100644
> --- a/cxl/builtin.h
> +++ b/cxl/builtin.h
> @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct 
> cxl_ctx *ctx);
>  int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>  int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>  int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
>  #ifdef ENABLE_LIBTRACEFS
>  int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
>  #else
> diff --git a/cxl/cxl.c b/cxl/cxl.c
> index 1643667..f808926 100644
> --- a/cxl/cxl.c
> +++ b/cxl/cxl.c
> @@ -79,6 +79,7 @@ static struct cmd_struct commands[] = {
>       { "enable-region", .c_fn = cmd_enable_region },
>       { "disable-region", .c_fn = cmd_disable_region },
>       { "destroy-region", .c_fn = cmd_destroy_region },
> +     { "inject-error", .c_fn = cmd_inject_error },
>       { "monitor", .c_fn = cmd_monitor },
>  };
>  
> diff --git a/cxl/inject-error.c b/cxl/inject-error.c
> new file mode 100644

Can this fit in an existing file?  port.c ?

I didn't review this file yet, so snipping.


> index 0000000..3645934
> --- /dev/null
> +++ b/cxl/inject-error.c

snip

> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index 91eedd1..8174c11 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -3179,6 +3179,59 @@ CXL_EXPORT int cxl_dport_get_id(struct cxl_dport 
> *dport)
>       return dport->id;
>  }
>  
> +CXL_EXPORT int cxl_dport_inject_proto_err(struct cxl_dport *dport,
> +                                       enum cxl_proto_error_types perr,
> +                                       const char *debugfs)
> +{
> +     struct cxl_port *port = cxl_dport_get_port(dport);
> +     size_t path_len = strlen(debugfs) + 24;

What's the path_len math, +24 here and +16 in next fcn.
I notice other path calloc's in this file padding more, ie +100 or +50.


> +     struct cxl_ctx *ctx = port->ctx;
> +     char buf[32];
> +     char *path;
> +     int rc;
> +
> +     if (!dport->dev_path) {
> +             err(ctx, "no dev_path for dport\n");
> +             return -EINVAL;
> +     }
> +
> +     path_len += strlen(dport->dev_path);
> +     path = calloc(1, path_len);
> +     if (!path)
> +             return -ENOMEM;
> +
> +     snprintf(path, path_len, "%s/cxl/%s/einj_inject", debugfs,
> +              cxl_dport_get_devname(dport));
> +
> +     snprintf(buf, sizeof(buf), "0x%lx\n", (u64) perr);

Here, and in cxl_get_proto_errors(), can we check for the path and
fail with 'unsupported' if it doesn't exist. That'll tell folks
if feature is not configured, or not enabled ?
Are kernel configured and port enabled 2 different levels?

There's an example in this file w "if (access(path, F_OK) != 0)"


> +     rc = sysfs_write_attr(ctx, path, buf);
> +     if (rc)
> +             err(ctx, "could not write to %s: %d\n", path, rc);

for 'sameness' with most err's in this library, do:
        err(ctx, "failed write to %s: %s\n", path, strerr(-rc));

> +
> +     free(path);
> +     return rc;
> +}
> +
> +CXL_EXPORT int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf,
> +                                 const char *debugfs)
> +{
> +     size_t path_len = strlen(debugfs) + 16;
> +     char *path;
> +     int rc = 0;
> +
> +     path = calloc(1, path_len);
> +     if (!path)
> +             return -ENOMEM;
> +
> +     snprintf(path, path_len, "%s/cxl/einj_types", debugfs);
> +     rc = sysfs_read_attr(ctx, path, buf);
> +     if (rc)
> +             err(ctx, "could not read from %s: %d\n", path, rc);
> +

same comments as previous, check path and make err msg similar


> +     free(path);
> +     return rc;
> +}
> +
>  CXL_EXPORT struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport)
>  {
>       return dport->port;
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index 304d7fa..d39a12d 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -281,4 +281,6 @@ global:
>       cxl_memdev_get_ram_qos_class;
>       cxl_region_qos_class_mismatch;
>       cxl_port_decoders_committed;
> +     cxl_dport_inject_proto_err;
> +     cxl_get_proto_errors;
>  } LIBCXL_6;

Start a new section above for these new symbols.
Each ndctl release gets a new section - if symbols added.



> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index fc6dd00..867daa4 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -160,6 +160,15 @@ struct cxl_port *cxl_port_get_next_all(struct cxl_port 
> *port,
>       for (port = cxl_port_get_first(top); port != NULL;                     \
>            port = cxl_port_get_next_all(port, top))
>  
> +enum cxl_proto_error_types {
> +     CXL_CACHE_CORRECTABLE = 1 << 12,
> +     CXL_CACHE_UNCORRECTABLE = 1 << 13,
> +     CXL_CACHE_FATAL = 1 << 14,
> +     CXL_MEM_CORRECTABLE = 1 << 15,
> +     CXL_MEM_UNCORRECTABLE = 1 << 16,
> +     CXL_MEM_FATAL = 1 << 17,
> +};

Please align like enum util_json_flags
Is there a spec reference to add?

That's all.
-- Alison



> +
>  struct cxl_dport;
>  struct cxl_dport *cxl_dport_get_first(struct cxl_port *port);
>  struct cxl_dport *cxl_dport_get_next(struct cxl_dport *dport);
> @@ -168,6 +177,10 @@ const char *cxl_dport_get_physical_node(struct cxl_dport 
> *dport);
>  const char *cxl_dport_get_firmware_node(struct cxl_dport *dport);
>  struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport);
>  int cxl_dport_get_id(struct cxl_dport *dport);
> +int cxl_dport_inject_proto_err(struct cxl_dport *dport,
> +                            enum cxl_proto_error_types err,
> +                            const char *debugfs);
> +int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, const char 
> *debugfs);
>  bool cxl_dport_maps_memdev(struct cxl_dport *dport, struct cxl_memdev 
> *memdev);
>  struct cxl_dport *cxl_port_get_dport_by_memdev(struct cxl_port *port,
>                                              struct cxl_memdev *memdev);
> diff --git a/cxl/meson.build b/cxl/meson.build
> index 61b4d87..79da4e6 100644
> --- a/cxl/meson.build
> +++ b/cxl/meson.build
> @@ -7,6 +7,7 @@ cxl_src = [
>    'memdev.c',
>    'json.c',
>    'filter.c',
> +  'inject-error.c',
>    '../daxctl/json.c',
>    '../daxctl/filter.c',
>  ]
> -- 
> 2.34.1
> 
> 

Reply via email to