On Fri, Apr 05, 2024 at 01:18:56PM +0100, Jonathan Cameron wrote:
> On Mon, 25 Mar 2024 12:02:27 -0700
> nifan....@gmail.com wrote:
> 
> > From: Fan Ni <fan...@samsung.com>
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow to release an extent only when its DPA range
> > is contained by a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two continuous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "offset": 0,
> >           "len": 134217728
> >       },
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> 
> Hi Fan,
> 
> I talk more on this inline, but to me this interface takes multiple extents
> so that we can treat them as a single 'offer' of capacity. That is they
> should be linked in the event log with the more flag and the host should
> have to handle them in one go (I known Ira and Navneet's code doesn't handle
> this yet, but that doesn't mean QEMU shouldn't).
> 
> Alternative for now would be to only support a single entry.  Keep the
> interface defined to take multiple entries but reject it at runtime.
> 
> I don't want to end up with a more complex interface in the end just
> because we allowed this form to not set the MORE flag today.
> We will need this to do tagged handling and ultimately sharing, so good
> to get it right from the start.
> 
> For tagged handling I think the right option is to have the tag alongside
> region-id not in the individual extents.  That way the interface is naturally
> used to generate the right description to the host.
> 
> >       ]
> >   }
> > }
Hi Jonathan,
Thanks for the detailed comments.

For the QMP interface, I have one question. 
Do we want the interface to follow exactly as shown in
Table 7-70 and Table 7-71 in cxl r3.1?

Fan

> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "region-id": 0,
> >       "extents": [
> >       {
> >           "offset": 134217728,
> >           "len": 134217728
> >       }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni <fan...@samsung.com>
> 
> 
> 
> >          /* to-be-added range should not overlap with range already 
> > accepted */
> >          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> > @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const 
> > struct cxl_cmd *cmd,
> >      CXLDCExtentList *extent_list = &ct3d->dc.extents;
> >      uint32_t i;
> >      uint64_t dpa, len;
> > +    CXLDCExtent *ent;
> >      CXLRetCode ret;
> >  
> >      if (in->num_entries_updated == 0) {
> > +        /* Always remove the first pending extent when response received. 
> > */
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> >          return CXL_MBOX_SUCCESS;
> >      }
> >  
> > @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const 
> > struct cxl_cmd *cmd,
> >  
> >      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
> >      if (ret != CXL_MBOX_SUCCESS) {
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> 
> Ah this deals with the todo I suggest you add to the earlier patch.
> I'd not mind so much if you hadn't been so thorough on other todo notes ;)
> Add one in the earlier patch and get rid of ti here like you do below.
> 
> However as I note below I think we need to handle these as groups of extents
> not single extents. That way we keep an 'offered' set offered at the same 
> time by
> a single command (and expose to host using the more flag) together and reject
> them on mass.
> 
> 
> >          return ret;
> >      }
> >  
> > @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const 
> > struct cxl_cmd *cmd,
> >  
> >          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
> >          ct3d->dc.total_extent_count += 1;
> > -        /*
> > -         * TODO: we will add a pending extent list based on event log 
> > record
> > -         * and process the list according here.
> > -         */
> > +
> > +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
> >      }
> >  
> >      return CXL_MBOX_SUCCESS;
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 951bd79a82..74cb64e843 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> 
> >  
> >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
> >          return CXL_EVENT_TYPE_FAIL;
> >      case CXL_EVENT_LOG_FATAL:
> >          return CXL_EVENT_TYPE_FATAL;
> > -/* DCD not yet supported */
> 
> Drop the comment but don't add the code.  We are handling DCD differently
> from other events, so this code should never deal with it.
> 
> > +    case CXL_EVENT_LOG_DYNCAP:
> > +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
> >      default:
> >          return -EINVAL;
> >      }
> > @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char 
> > *path, CxlEventLog log,
> >      }
> >  }
> 
> > +/*
> > + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in
> 
> space after - (just looks odd otherwise)
> 
> > + * the list.
> > + * Return value: return true if has overlaps; otherwise, return false
> > + */
> > +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> > +                                           uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_overlaps_range(&range1, &range2)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * Check whether the range [dpa, dpa + len -1] is contained by extents in 
> 
> space after -
> 
> > + * the list.
> > + * Will check multiple extents containment once superset release is added.
> > + * Return value: return true if range is contained; otherwise, return false
> > + */
> > +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> > +                                    uint64_t dpa, uint64_t len)
> > +{
> > +    CXLDCExtent *ent;
> > +    Range range1, range2;
> > +
> > +    if (!list) {
> > +        return false;
> > +    }
> > +
> > +    range_init_nofail(&range1, dpa, len);
> > +    QTAILQ_FOREACH(ent, list, node) {
> > +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> > +        if (range_contains_range(&range2, &range1)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +/*
> > + * The main function to process dynamic capacity event. Currently DC 
> > extents
> > + * add/release requests are processed.
> > + */
> > +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog 
> > log,
> 
> As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
> with other event logs, we don't want to accidentally enable other events
> being added to the DC event log.
> 
> > +                                             CXLDCEventType type, uint16_t 
> > hid,
> > +                                             uint8_t rid,
> > +                                             CXLDCExtentRecordList 
> > *records,
> > +                                             Error **errp)
> > +{
> > +    Object *obj;
> > +    CXLEventDynamicCapacity dCap = {};
> > +    CXLEventRecordHdr *hdr = &dCap.hdr;
> > +    CXLType3Dev *dcd;
> > +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> > +    uint32_t num_extents = 0;
> > +    CXLDCExtentRecordList *list;
> > +    g_autofree CXLDCExtentRaw *extents = NULL;
> > +    uint8_t enc_log;
> > +    uint64_t dpa, offset, len, block_size;
> > +    int i, rc;
> > +    g_autofree unsigned long *blk_bitmap = NULL;
> > +
> > +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, "Unable to resolve CXL type 3 device");
> > +        return;
> > +    }
> > +
> > +    dcd = CXL_TYPE3(obj);
> > +    if (!dcd->dc.num_regions) {
> > +        error_setg(errp, "No dynamic capacity support from the device");
> > +        return;
> > +    }
> > +
> > +    rc = ct3d_qmp_cxl_event_log_enc(log);
> 
> enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP; always so don't look it up.
> 
> > +    if (rc < 0) {
> > +        error_setg(errp, "Unhandled error log type");
> > +        return;
> > +    }
> > +    enc_log = rc;
> > +
> > +    if (rid >= dcd->dc.num_regions) {
> > +        error_setg(errp, "region id is too large");
> > +        return;
> > +    }
> > +    block_size = dcd->dc.regions[rid].block_size;
> > +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> > +
> > +    /* Sanity check and count the extents */
> > +    list = records;
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = offset + dcd->dc.regions[rid].base;
> > +
> > +        if (len == 0) {
> > +            error_setg(errp, "extent with 0 length is not allowed");
> > +            return;
> > +        }
> > +
> > +        if (offset % block_size || len % block_size) {
> > +            error_setg(errp, "dpa or len is not aligned to region block 
> > size");
> > +            return;
> > +        }
> > +
> > +        if (offset + len > dcd->dc.regions[rid].len) {
> > +            error_setg(errp, "extent range is beyond the region end");
> > +            return;
> > +        }
> > +
> > +        /* No duplicate or overlapped extents are allowed */
> > +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> > +                              len / block_size)) {
> > +            error_setg(errp, "duplicate or overlapped extents are 
> > detected");
> > +            return;
> > +        }
> > +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > +
> > +        num_extents++;
> > +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> > +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> > +                                               dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with pending DPA range");
> > +                return;
> > +            }
> > +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> > +                                                dpa, len)) {
> > +                error_setg(errp,
> > +                           "cannot release extent with non-existing DPA 
> > range");
> > +                return;
> > +            }
> > +        }
> > +        list = list->next;
> > +    }
> > +    if (num_extents == 0) {
> 
> We can just check if there is a first one.  That check can be done before
> counting them and is probably a little more elegant than leaving it to down 
> here.
> I'm not sure we can pass in an empty list but if we can (easy to poke 
> interface
> and check) then I assume records == NULL. 
> 
> > +        error_setg(errp, "no valid extents to send to process");
> > +        return;
> > +    }
> > +
> > +    /* Create extent list for event being passed to host */
> > +    i = 0;
> > +    list = records;
> > +    extents = g_new0(CXLDCExtentRaw, num_extents);
> > +    while (list) {
> > +        offset = list->value->offset;
> > +        len = list->value->len;
> > +        dpa = dcd->dc.regions[rid].base + offset;
> > +
> > +        extents[i].start_dpa = dpa;
> > +        extents[i].len = len;
> > +        memset(extents[i].tag, 0, 0x10);
> > +        extents[i].shared_seq = 0;
> > +        list = list->next;
> > +        i++;
> > +    }
> > +
> > +    /*
> > +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> > +     *
> > +     * All Dynamic Capacity event records shall set the Event Record 
> > Severity
> > +     * field in the Common Event Record Format to Informational Event. All
> > +     * Dynamic Capacity related events shall be logged in the Dynamic 
> > Capacity
> > +     * Event Log.
> > +     */
> > +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, 
> > sizeof(dCap),
> > +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> > +
> > +    dCap.type = type;
> > +    /* FIXME: for now, validity flag is cleared */
> > +    dCap.validity_flags = 0;
> > +    stw_le_p(&dCap.host_id, hid);
> > +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> > +    dCap.updated_region_id = 0;
> > +    /*
> > +     * FIXME: for now, the "More" flag is cleared as there is only one
> > +     * extent associating with each record and tag-based release is
> > +     * not supported.
> 
> This is misleading by my understanding of the specification.
> More isn't directly related to tags (though it is necessary for some
> flows with tags, when sharing is enabled anyway).
> The reference to record also isn't that relevant. The idea is you set
> it for all but the last record pushed to the event log (from a given
> action from an FM).
> 
> The whole reason to have a multi extent injection interface is to set
> the more flag to indicate that the OS needs to treat a bunch of extents
> as one 'offer' of capacity.  So a rejection from the OS needs to take
> out 'all those records'.  The proposed linux code will currently reject
> all by the first extent (I moaned about that yesterday). 
> 
> It is fine to not support this in the current code, but then I would check
> the number of extents and reject any multi extent commands until we
> do support it.
> 
> Ultimately I want a qmp command with more than one extent to mean
> they are one 'offer' of capacity and must be handled as such by
> the OS.  I.e. it can't reply with multiple unrelated acceptance
> or reject replies.
> 
> On the add side this is easy to support, the fiddly bit is if the
> OS rejects some or all of the capacity and you then need to
> take out all the extents offered that it hasn't accepted in it's reply.
> 
> Pending list will need to maintain that association.
> Maybe the easiest way is to have pending list be a list of sublists?
> That way each sublist is handled in one go and any non accepted extents
> in that sub list are dropped.
> 
>  
> > +     */
> > +    dCap.flags = 0;
> > +    for (i = 0; i < num_extents; i++) {
> > +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> > +               sizeof(CXLDCExtentRaw));
> > +
> > +        if (type == DC_EVENT_ADD_CAPACITY) {
> > +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> > +                                             extents[i].start_dpa,
> > +                                             extents[i].len,
> > +                                             extents[i].tag,
> > +                                             extents[i].shared_seq);
> > +        }
> > +
> > +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> > +                             (CXLEventRecordRaw *)&dCap)) {
> > +            cxl_event_irq_assert(dcd);
> > +        }
> > +    }
> > +}
> > +
> > +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> > +                                  CXLDCExtentRecordList  *records,
> > +                                  Error **errp)
> > +{
> > +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> 
> Drop passing in the log, it doesn't make sense given these events only occur
> on that log and we can hard code it in the function.
> 
> > +                                    DC_EVENT_ADD_CAPACITY, 0,
> > +                                    region_id, records, errp);
> > +}
> > +
> > +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> > +                                      CXLDCExtentRecordList  *records,
> > +                                      Error **errp)
> > +{
> > +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> > +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> > +                                     region_id, records, errp);
> > +}
> > +
> >  static void ct3_class_init(ObjectClass *oc, void *data)
> >  {
> 
> 
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index df3511e91b..b84063d9f4 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -494,6 +494,7 @@ struct CXLType3Dev {
> >           */
> >          uint64_t total_capacity; /* 256M aligned */
> >          CXLDCExtentList extents;
> > +        CXLDCExtentList extents_pending;
> >          uint32_t total_extent_count;
> >          uint32_t ext_list_gen_seq;
> 
> 
> 
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 8cc4c72fa9..2645004666 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -19,13 +19,16 @@
> >  #
> >  # @fatal: Fatal Event Log
> >  #
> > +# @dyncap: Dynamic Capacity Event Log
> > +#
> >  # Since: 8.1
> >  ##
> >  { 'enum': 'CxlEventLog',
> >    'data': ['informational',
> >             'warning',
> >             'failure',
> > -           'fatal']
> > +           'fatal',
> > +           'dyncap']
> 
> Does this have the side effect of letting us inject error events
> onto the dynamic capacity log? 
> 
> >   }
> >  
> >  ##
> > @@ -361,3 +364,59 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> ...
> 
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to start add dynamic capacity extents flow. The device will
> > +# have to acknowledged the acceptance of the extents before they are 
> > usable.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to add
> > +# @extents: Extents to add
> > +#
> > +# Since : 9.0
> 
> 9.1
> 
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> > +
> > +##
> > +# @cxl-release-dynamic-capacity:
> > +#
> > +# Command to start release dynamic capacity extents flow. The host will
> > +# need to respond to indicate that it has released the capacity before it
> > +# is made unavailable for read and write and can be re-added.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +# @region-id: id of the region where the extent to release
> > +# @extents: Extents to release
> > +#
> > +# Since : 9.0
> 
> 9.1
> 
> > +##
> > +{ 'command': 'cxl-release-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +            'region-id': 'uint8',
> > +            'extents': [ 'CXLDCExtentRecord' ]
> > +           }
> > +}
> 

Reply via email to