Hi Alison, sorry it took me so long to respond I've been a bit sick. Responses 
inline.

On 2/13/25 6:10 PM, Alison Schofield wrote:
> On Wed, Jan 08, 2025 at 03:57:49PM -0600, Ben Cheatham wrote:
>> Add inject-error command for injecting CXL errors into CXL devices.
>> The command currently only has support for injecting CXL protocol
>> errors into CXL downstream ports via EINJ.
> 
> Hi Ben,
> I went through enough to give you some feedback for your v1.
> Just ran out of time and didn't want to keep you waiting any longer.
> 
> wrt 'currently only has' - what is the grander scope of this that we
> might see in the future. Will there only every be one system wide
> response to --list-errors or will there be error types per type of
> port.
> 

My thinking was that it would be a command to do any type of CXL-related error
injection. At the moment, that would be poison and protocol error injection,
but I wanted to leave it open-ended in case something else came along in later
spec revisions.

As for the --list-errors, it'll probably be per type of port. I think I see 
where
you're going with that, I can look into that option taking an argument for the 
type
of the component.

> Spec reference?
> 

Protocol EINJ is defined starting in the ACPI v6.5 spec, section 18.6.4. I'll 
add
references in the commit message(s) and code comments.

>>
>> The command takes an error type and injects an error of that type into the
>> specified downstream port. Downstream ports can be specified using the
>> port's device name with the -d option. Available error types can be obtained
>> by running "cxl inject-error --list-errors".
>>
>> This command requires the kernel to be built with CONFIG_DEBUGFS and
>> CONFIG_ACPI_APEI_EINJ_CXL enabled. It also requires root privileges to
>> run due to reading from <debugfs>/cxl/einj_types and writing to
>> <debugfs>/cxl/<dport>/einj_inject.
>>
>> Example usage:
>>     # cxl inject-error --list-errors
>>     cxl.mem_correctable
>>     cxl.mem_fatal
>>     ...
>>     # cxl inject-error -d 0000:00:01.1 cxl.mem_correctable
>>     injected cxl.mem_correctable protocol error
>>
> 
> I'll probably ask this again later on, but how does user see
> list of downstream ports. Does user really think -d dport,
> or do they think -d device-name, or ?
> Man page will help here.
> 

I went back and forth on whether to use '-d' or '-s' (to follow lspci), but
ended up on '-d' since the dport device name isn't necessarily a PCIe SBDF.
I don't mind changing it if you have a better suggestion.

I'll also add a man page next time.

> We don't have cxl-cli support for poison inject, but to future
> proof this, let's think about the naming.
> 
> Please split the patch up at least into 2 -
> libcxl: Add APIs supporting CXL protocol error injection
>         include updates to Documentation/cxl/lib/libcxl.txt
> 
> cxl: add {list,inject}-protocol-error' commands
>      include man page updates, additiona
> 

Will do.

> The 'list-errors' is the the system level error support, right?
> Could that fit in the existing 'cxl list' hierachy?
> Would they be a property of the bus?

The errors listed by '--list-errors' are the CXL-related EINJ error types
(ACPI v6.5, table 18.30). These error types are provided by the ACPI EINJ
implementation and are system-wide AFAIK. So that wouldn't be a problem for
adding them as a bus property, but the issue with that is these error types
are meant to be used on CXL-capable PCIe root ports.

I think it may be confusing to list these under a bus, and I wouldn't want to 
muddy
up the appropriate dport entries since there can techincally be up to 6 of 
these error
types (and the names are a bit long). If you think that's preferable, or adding 
it
as a bus property is fine, I'd be happy to do so.

> 
> I don't know that 'protocol' is best name, but seems it needs
> another descriptor.
> 

I just took the names from the spec, if you have a suggestion here I'd be happy
to hear it!

> 
>> Signed-off-by: Ben Cheatham <benjamin.cheat...@amd.com>
>> ---
>>  cxl/builtin.h      |   1 +
>>  cxl/cxl.c          |   1 +
>>  cxl/inject-error.c | 188 +++++++++++++++++++++++++++++++++++++++++++++
>>  cxl/lib/libcxl.c   |  53 +++++++++++++
>>  cxl/lib/libcxl.sym |   2 +
>>  cxl/libcxl.h       |  13 ++++
>>  cxl/meson.build    |   1 +
>>  7 files changed, 259 insertions(+)
>>  create mode 100644 cxl/inject-error.c
>>
>> diff --git a/cxl/builtin.h b/cxl/builtin.h
>> index c483f30..e82fcb5 100644
>> --- a/cxl/builtin.h
>> +++ b/cxl/builtin.h
>> @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct 
>> cxl_ctx *ctx);
>>  int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>  int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>  int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
>> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
>>  #ifdef ENABLE_LIBTRACEFS
>>  int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
>>  #else
>> diff --git a/cxl/cxl.c b/cxl/cxl.c
>> index 1643667..f808926 100644
>> --- a/cxl/cxl.c
>> +++ b/cxl/cxl.c
>> @@ -79,6 +79,7 @@ static struct cmd_struct commands[] = {
>>      { "enable-region", .c_fn = cmd_enable_region },
>>      { "disable-region", .c_fn = cmd_disable_region },
>>      { "destroy-region", .c_fn = cmd_destroy_region },
>> +    { "inject-error", .c_fn = cmd_inject_error },
>>      { "monitor", .c_fn = cmd_monitor },
>>  };
>>  
>> diff --git a/cxl/inject-error.c b/cxl/inject-error.c
>> new file mode 100644
> 
> Can this fit in an existing file?  port.c ?

I just put it into a new file since it's a new command. The functionality in the
file ATM could probably fit, but my idea was this command would eventually not 
just
do dport protocol error injection.

> 
> I didn't review this file yet, so snipping.
> 
> 
>> index 0000000..3645934
>> --- /dev/null
>> +++ b/cxl/inject-error.c
> 
> snip
> 
>> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
>> index 91eedd1..8174c11 100644
>> --- a/cxl/lib/libcxl.c
>> +++ b/cxl/lib/libcxl.c
>> @@ -3179,6 +3179,59 @@ CXL_EXPORT int cxl_dport_get_id(struct cxl_dport 
>> *dport)
>>      return dport->id;
>>  }
>>  
>> +CXL_EXPORT int cxl_dport_inject_proto_err(struct cxl_dport *dport,
>> +                                      enum cxl_proto_error_types perr,
>> +                                      const char *debugfs)
>> +{
>> +    struct cxl_port *port = cxl_dport_get_port(dport);
>> +    size_t path_len = strlen(debugfs) + 24;
> 
> What's the path_len math, +24 here and +16 in next fcn.
> I notice other path calloc's in this file padding more, ie +100 or +50.
> 

Taking another look at it, I can't remember why they are different :/. If I
had to guess I was adding the length of the sysfs attribute (i.e. "einj_inject")
+ a bit. I'll change this to just add +50 or so when I get to the calloc call
instead of doing it at the top (and also make it uniform between functions).

> 
>> +    struct cxl_ctx *ctx = port->ctx;
>> +    char buf[32];
>> +    char *path;
>> +    int rc;
>> +
>> +    if (!dport->dev_path) {
>> +            err(ctx, "no dev_path for dport\n");
>> +            return -EINVAL;
>> +    }
>> +
>> +    path_len += strlen(dport->dev_path);
>> +    path = calloc(1, path_len);
>> +    if (!path)
>> +            return -ENOMEM;
>> +
>> +    snprintf(path, path_len, "%s/cxl/%s/einj_inject", debugfs,
>> +             cxl_dport_get_devname(dport));
>> +
>> +    snprintf(buf, sizeof(buf), "0x%lx\n", (u64) perr);
> 
> Here, and in cxl_get_proto_errors(), can we check for the path and
> fail with 'unsupported' if it doesn't exist. That'll tell folks
> if feature is not configured, or not enabled ?
> Are kernel configured and port enabled 2 different levels?

For sure. If you have the kernel configured to use CXL EINJ these files
come up as part of CXL driver init (specifically when cxl_dports are 
initialized).

It's possible the CXL driver will come up before EINJ is initialized, in which 
case
the sysfs files won't be visible. I've only seen that happen when both the EINJ 
module
and CXL module(s) are built-in. In general though, if the kernel is configured 
correctly
and the CXL driver doesn't run into any issues these files should be present.

> 
> There's an example in this file w "if (access(path, F_OK) != 0)"
> 
> 
>> +    rc = sysfs_write_attr(ctx, path, buf);
>> +    if (rc)
>> +            err(ctx, "could not write to %s: %d\n", path, rc);
> 
> for 'sameness' with most err's in this library, do:
>         err(ctx, "failed write to %s: %s\n", path, strerr(-rc));
> 

Will do!

>> +
>> +    free(path);
>> +    return rc;
>> +}
>> +
>> +CXL_EXPORT int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf,
>> +                                const char *debugfs)
>> +{
>> +    size_t path_len = strlen(debugfs) + 16;
>> +    char *path;
>> +    int rc = 0;
>> +
>> +    path = calloc(1, path_len);
>> +    if (!path)
>> +            return -ENOMEM;
>> +
>> +    snprintf(path, path_len, "%s/cxl/einj_types", debugfs);
>> +    rc = sysfs_read_attr(ctx, path, buf);
>> +    if (rc)
>> +            err(ctx, "could not read from %s: %d\n", path, rc);
>> +
> 
> same comments as previous, check path and make err msg similar
> 

Ok!

> 
>> +    free(path);
>> +    return rc;
>> +}
>> +
>>  CXL_EXPORT struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport)
>>  {
>>      return dport->port;
>> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
>> index 304d7fa..d39a12d 100644
>> --- a/cxl/lib/libcxl.sym
>> +++ b/cxl/lib/libcxl.sym
>> @@ -281,4 +281,6 @@ global:
>>      cxl_memdev_get_ram_qos_class;
>>      cxl_region_qos_class_mismatch;
>>      cxl_port_decoders_committed;
>> +    cxl_dport_inject_proto_err;
>> +    cxl_get_proto_errors;
>>  } LIBCXL_6;
> 
> Start a new section above for these new symbols.
> Each ndctl release gets a new section - if symbols added.
> 

Gotcha, I'll add it.

> 
> 
>> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
>> index fc6dd00..867daa4 100644
>> --- a/cxl/libcxl.h
>> +++ b/cxl/libcxl.h
>> @@ -160,6 +160,15 @@ struct cxl_port *cxl_port_get_next_all(struct cxl_port 
>> *port,
>>      for (port = cxl_port_get_first(top); port != NULL;                     \
>>           port = cxl_port_get_next_all(port, top))
>>  
>> +enum cxl_proto_error_types {
>> +    CXL_CACHE_CORRECTABLE = 1 << 12,
>> +    CXL_CACHE_UNCORRECTABLE = 1 << 13,
>> +    CXL_CACHE_FATAL = 1 << 14,
>> +    CXL_MEM_CORRECTABLE = 1 << 15,
>> +    CXL_MEM_UNCORRECTABLE = 1 << 16,
>> +    CXL_MEM_FATAL = 1 << 17,
>> +};
> 
> Please align like enum util_json_flags
> Is there a spec reference to add?

Will do, and I'll add the reference as well.

Thanks,
Ben

> 
> That's all.
> -- Alison
> 
> 
> 
>> +
>>  struct cxl_dport;
>>  struct cxl_dport *cxl_dport_get_first(struct cxl_port *port);
>>  struct cxl_dport *cxl_dport_get_next(struct cxl_dport *dport);
>> @@ -168,6 +177,10 @@ const char *cxl_dport_get_physical_node(struct 
>> cxl_dport *dport);
>>  const char *cxl_dport_get_firmware_node(struct cxl_dport *dport);
>>  struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport);
>>  int cxl_dport_get_id(struct cxl_dport *dport);
>> +int cxl_dport_inject_proto_err(struct cxl_dport *dport,
>> +                           enum cxl_proto_error_types err,
>> +                           const char *debugfs);
>> +int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, const char 
>> *debugfs);
>>  bool cxl_dport_maps_memdev(struct cxl_dport *dport, struct cxl_memdev 
>> *memdev);
>>  struct cxl_dport *cxl_port_get_dport_by_memdev(struct cxl_port *port,
>>                                             struct cxl_memdev *memdev);
>> diff --git a/cxl/meson.build b/cxl/meson.build
>> index 61b4d87..79da4e6 100644
>> --- a/cxl/meson.build
>> +++ b/cxl/meson.build
>> @@ -7,6 +7,7 @@ cxl_src = [
>>    'memdev.c',
>>    'json.c',
>>    'filter.c',
>> +  'inject-error.c',
>>    '../daxctl/json.c',
>>    '../daxctl/filter.c',
>>  ]
>> -- 
>> 2.34.1
>>
>>

Reply via email to