Thomas,
your analysis looks correct.  There should be at least an ioctl() 
call after the open() to create the objects before the pwrite64() call.  You 
would need to discuss this with Cray, use a different MPI, or potentially 
"pre-create" the file before MPI_File_open() so that O_LOV_DELAY_CREATE has no 
effect.
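
A minimal sketch of the "pre-create" idea, using only plain POSIX calls. The
helper name precreate_file() is hypothetical, and this assumes that a regular
open() with O_CREAT (no O_LOV_DELAY_CREATE) is enough to have the file fully
created before the MPI library opens it. It has not been tested against Cray
MPI.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` with an ordinary open(O_CREAT) and close
 * it again, so the file already exists when MPI_File_open() is called later.
 * Existing contents are left untouched (no O_TRUNC).
 * Returns 0 on success, -1 on failure with errno set by open() or close().
 */
int precreate_file(const char *path, mode_t mode)
{
    int fd = open(path, O_WRONLY | O_CREAT, mode);
    if (fd < 0)
        return -1;
    return close(fd);
}
```

In the reproducer, one rank (for example rank 0, found with MPI_Comm_rank())
would call precreate_file("mpi_test.out", 0644), and all ranks would then wait
in MPI_Barrier(MPI_COMM_WORLD) before calling MPI_File_open().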

Cheers, Andreas

On Jul 30, 2024, at 08:40, Bertschinger, Thomas Andrew Hjorth via 
lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

Hello,

We have an application that fails doing the following on one of our systems:

...
openat(AT_FDCWD, "mpi_test.out", O_WRONLY|O_CREAT|O_NOCTTY|FASYNC, 0611) = 4
pwrite64(4, "\3\0\0\0", 4, 0)           = -1 EBADF (Bad file descriptor)
...

It opens a file with O_LOV_DELAY_CREATE (or O_NOCTTY|FASYNC as strace 
interprets it), and then immediately tries to write to it.
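
To check other strace output for the same pattern, the flag test can be
sketched in C as below. This assumes the relevant bits are exactly
O_NOCTTY|FASYNC, as decoded by strace in the trace above; the macro and
function names are hypothetical, and the Lustre headers may define
O_LOV_DELAY_CREATE with additional bits.

```c
#include <fcntl.h>
#include <stdbool.h>

/* FASYNC is an older name for O_ASYNC; fall back if it is not exposed. */
#ifndef FASYNC
#define FASYNC O_ASYNC
#endif

/* Assumption: the bit pattern strace printed for the failing openat(). */
#define DELAY_CREATE_BITS (O_NOCTTY | FASYNC)

/* Returns true only if both bits are present in the open() flags. */
bool has_delay_create_bits(int flags)
{
    return (flags & DELAY_CREATE_BITS) == DELAY_CREATE_BITS;
}
```

Passing the flags from the failing openat() call, 
O_WRONLY|O_CREAT|O_NOCTTY|FASYNC, makes the function return true, while a 
plain O_WRONLY|O_CREAT returns false.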

From the comments above ll_file_open() in Lustre:

If opened with O_LOV_DELAY_CREATE, then we don't do the object creation or open 
until ll_lov_setstripe() ioctl is called.

It sounds like the expectation is that the process calling open() like this 
follows it up with an ioctl to set the stripe information prior to writing.

Is this correct? In other words, is it reasonable to say that the failing code 
is doing something erroneous?

Here's a minimal MPI program that reproduces the problem. The issue only arises 
when using the Cray MPI implementation, however. When tested with openmpi and 
ANL mpich, the openat() call doesn't use O_LOV_DELAY_CREATE. Since the Cray 
implementation is unfortunately not open source, I have no insight into what 
this code is "supposed" to be doing. :(

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
       int err = MPI_Init(&argc, &argv);

       MPI_File fh;
       err = MPI_File_open(MPI_COMM_WORLD, "mpi_test.out",
               MPI_MODE_WRONLY|MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
       printf("MPI_File_open returned: %d\n", err);

       long data = 3;
       err = MPI_File_write(fh, &data, 1, MPI_LONG, MPI_STATUS_IGNORE);
       printf("MPI_File_write returned: %d\n", err);

       err = MPI_File_close(&fh);
       printf("MPI_File_close returned: %d\n", err);

       MPI_Finalize();
       return 0;
}

Thanks,
Thomas Bertschinger
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Whamcloud






