In <ras/ras_event.h> we define a trace event for memory errors.
The last field is:

                __field_struct(struct cper_mem_err_compact, data)

where the structure is defined in <linux/cper.h> as:

struct cper_mem_err_compact {
        __u64   validation_bits;
        __u16   node;
        __u16   card;
        __u16   module;
        __u16   bank;
        __u16   device;
        __u16   row;
        __u16   column;
        __u16   bit_pos;
        __u64   requestor_id;
        __u64   responder_id;
        __u64   target_id;
        __u16   rank;
        __u16   mem_array_handle;
        __u16   mem_dev_handle;
};

This structure was defined based on the useful bits in the
UEFI 2.4 spec appendix N, section 2.5 "Memory Error Section".

But UEFI has since released a new version of the spec, 2.5:

  http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf

and things have been updated to cope with ever-increasing memory sizes
thanks to Moore's law. The old structure got a couple of tweaks as a
quick band-aid to handle current problems (__u16 isn't big enough for
the "row" entry for some 64GB DIMMs, so they squeezed row bits 17:16
into a reserved field).  But looking to the future, they also added a
whole new GUID record, "Memory Error Section 2", that increases the
width of the device, row, column, rank and bit_pos fields from u16 to
u32 and adds a couple of completely new fields.
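
To make the band-aid concrete, here is a minimal sketch of how a
consumer might rebuild the full row number. It assumes the two extra
bits arrive in the low two bits of a separate byte; the names
EXT_ROW_MASK, EXT_ROW_SHIFT and full_row() are made up for illustration
and are not taken from <linux/cper.h>.

```c
#include <stdint.h>

/* Hypothetical names: low two bits of the extra byte carry row bits 17:16. */
#define EXT_ROW_MASK	0x3u
#define EXT_ROW_SHIFT	16

/* Combine the legacy 16-bit row with the two extension bits. */
static uint32_t full_row(uint16_t row, uint8_t extended)
{
	uint32_t high = (uint32_t)(extended & EXT_ROW_MASK) << EXT_ROW_SHIFT;

	return high | row;
}
```

With this layout, a row of 0xffff plus extension bits 0x2 gives 0x2ffff,
which no longer fits in the __u16 "row" member of the compact structure.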

So the question is - how can we update the trace event to include these
new wider fields with the minimum pain to applications that look at it?
I don't know if there are any other consumers besides rasdaemon at the
moment ... but we don't want ugly transitions where you have to guess
which version of the application you need to run to work with a given
kernel version.
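
Just to make the question concrete, and not as a proposal for the
actual layout: a userspace sketch of what widening would mean for a
consumer. Both struct names and mem_err_widen() are hypothetical, and
only the fields that change width are shown.

```c
#include <stdint.h>

/* Illustrative subset of the current layout: 16-bit location fields. */
struct mem_err_narrow {
	uint64_t validation_bits;
	uint16_t device, row, column, bit_pos, rank;
};

/* Hypothetical widened layout: the same fields as 32-bit values. */
struct mem_err_wide {
	uint64_t validation_bits;
	uint32_t device, row, column, bit_pos, rank;
};

/* Zero-extend every narrow field; validation_bits is copied unchanged. */
static struct mem_err_wide mem_err_widen(const struct mem_err_narrow *old)
{
	struct mem_err_wide w = {
		.validation_bits = old->validation_bits,
		.device		 = old->device,
		.row		 = old->row,
		.column		 = old->column,
		.bit_pos	 = old->bit_pos,
		.rank		 = old->rank,
	};

	return w;
}
```

Going from narrow to wide is lossless, so a consumer that understands
the wide form can always accept old records; the hard direction is an
old consumer that is handed values which no longer fit in 16 bits.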

-Tony