On 4/8/20 3:25 AM, Liu, Jingqi wrote: > On 4/8/2020 2:28 AM, Joao Martins wrote: >> On 4/7/20 5:55 PM, Dan Williams wrote: >>> On Tue, Apr 7, 2020 at 4:01 AM Joao Martins <joao.m.mart...@oracle.com> >>> wrote: >>>> On 4/1/20 4:13 AM, Jingqi Liu wrote: >>>>> If the backend file is devdax pmem character device, the alignment >>>>> specified by the option 'align=NUM' in the '-object memory-backend-file' >>>>> needs to match the alignment requirement of the devdax pmem character >>>>> device. >>>>> >>>>> This patch fetches the devdax pmem file 'align', so that we can compare >>>>> it with the NUM of 'align=NUM'. >>>>> The NUM needs to be larger than or equal to the devdax pmem file 'align'. >>>>> >>>>> It also fixes the problem that mmap() returns failure in qemu_ram_mmap() >>>>> when the NUM of 'align=NUM' is less than the devdax pmem file 'align'. >>>>> >>>>> Cc: Dan Williams <dan.j.willi...@intel.com> >>>>> Signed-off-by: Jingqi Liu <jingqi....@intel.com> >>>>> --- >>>>> exec.c | 46 +++++++++++++++++++++++++++++++++++++++++++++- >>>>> 1 file changed, 45 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/exec.c b/exec.c >>>>> index de9d949902..8221abffec 100644 >>>>> --- a/exec.c >>>>> +++ b/exec.c >>>>> @@ -1736,6 +1736,42 @@ static int64_t get_file_size(int fd) >>>>> return size; >>>>> } >>>>> >>>>> +static int64_t get_file_align(int fd) >>>>> +{ >>>>> + int64_t align = -1; >>>>> +#if defined(__linux__) >>>>> + struct stat st; >>>>> + >>>>> + if (fstat(fd, &st) < 0) { >>>>> + return -errno; >>>>> + } >>>>> + >>>>> + /* Special handling for devdax character devices */ >>>>> + if (S_ISCHR(st.st_mode)) { >>>>> + g_autofree char *subsystem_path = NULL; >>>>> + g_autofree char *subsystem = NULL; >>>>> + >>>>> + subsystem_path = g_strdup_printf("/sys/dev/char/%d:%d/subsystem", >>>>> + major(st.st_rdev), >>>>> minor(st.st_rdev)); >>>>> + subsystem = g_file_read_link(subsystem_path, NULL); >>>>> + >>>>> + if (subsystem && g_str_has_suffix(subsystem, "/dax")) { >>>>> + g_autofree char *align_path = NULL; >>>>> + g_autofree char *align_str = NULL; >>>>> + >>>>> + align_path = >>>>> g_strdup_printf("/sys/dev/char/%d:%d/device/align", >>>>> + major(st.st_rdev), >>>>> minor(st.st_rdev)); >>>>> + >>>> Perhaps, you meant instead: >>>> >>>> /sys/dev/char/%d:%d/align >>>> >>> Hmm, are you sure that's working? >> It is, except that I made the slight mistake of testing with a bunch of wip >> patches on top which one of them actually adds the 'align' to child dax >> device. >> >> Argh, my apologies - and thanks for noticing. >> >>> I expect the alignment to be found >>> in the region device: >>> >>> /sys/class/dax: >>> /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus1/region1/dax1.1/dax1.0 >>> $(readlink -f /sys/dev/char/253\:263)/../align >>> $(readlink -f /sys/dev/char/253\:263)/device/align >>> >>> >>> /sys/bus/dax: >>> /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus1/region1/dax1.0/dax1.0 >>> $(readlink -f /sys/dev/char/253\:265)/../align >>> $(readlink -f /sys/dev/char/253\:265)/device/align <-- No such file >>> >>> The use of the /sys/dev/char/%d:%d/device is only supported by the >>> deprecated /sys/class/dax. > > Hi Dan, > > Thanks for your comments. > > Seems it is a mistake. > > It should be: $(readlink -f /sys/dev/char/253\:263)/../../align >
Hmm, perhaps you have an extra '../' in the path? This works for me: # ls $(readlink -f /sys/dev/char/252\:0/../align) /sys/devices/platform/e820_pmem/ndbus0/region0/dax0.0/dax0.0/../align # cat $(readlink -f /sys/dev/char/252\:0)/../align 2097152 # cat /sys/dev/char/252\:0/../align 2097152 >> I don't have the deprecated dax class enabled as could you tell, so the >> second >> case is what I was testing. Except it wasn't a namespace/nvdimm but rather an >> hmem device-dax. >> >> '../align' though covers only one case? What about hmem which '../align' >> returns >> ENOENT; perhaps using '../dax_region/align' instead which is common to both? >> Albeit that wouldn't address the sub-division devices (that I mention above) > > Seems that you mean to use $(readlink -f > /sys/dev/char/253\:263)/../../dax_region/align. > > Right ? > An extra '../' ? # ls $(readlink -f /sys/dev/char/252\:0/../dax_region/align) /sys/devices/platform/e820_pmem/ndbus0/region0/dax0.0/dax0.0/../align # cat $(readlink -f /sys/dev/char/252\:0)/../dax_region/align 2097152 # cat /sys/dev/char/252\:0/../dax_region/align 2097152 For HMAT/hmem devdax, though, only 'dax_region/align' is available for now: # ls $(readlink -f /sys/dev/char/252:0)/../align ls: cannot access /sys/devices/platform/hmem.0/dax0.0/../align: No such file or directory # ls $(readlink -f /sys/dev/char/252:0)/../dax_region/align /sys/devices/platform/hmem.0/dax0.0/../dax_region/align # cat $(readlink -f /sys/dev/char/252:0)/../dax_region/align 2097152 The 'dax_region/align' was just an idea mainly because it's common to both device-dax devices -- not sure how others feel about it. Joao > Thanks, > > Jingqi > >>> The current /sys/bus/dax device-model can >>> be a drop in replacement as long as software is not written to the >>> /sys/class sysfs layout, i.e. it uses ../ instead of device/ to walk >>> to the region properties. >>> >> /nods