Thomas, your analysis is as good as any possible. There should be at least an ioctl() call after the open() to create the objects before the pwrite64() call. You would need to discuss this with Cray, use a different MPI, or potentially "pre-create" the file before MPI_File_open() so that O_LOV_DELAY_CREATE has no effect.
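A minimal sketch of the "pre-create" option mentioned above, assuming a plain POSIX open() without O_LOV_DELAY_CREATE is enough to create the file's objects with the default layout. The helper name precreate_file is hypothetical and not part of MPI or Lustre. Whether this actually avoids the EBADF with Cray MPI has not been verified here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of MPI or Lustre): create `path` with a
 * plain open(), i.e. without O_LOV_DELAY_CREATE, so that the file already
 * exists before the MPI library opens it.
 * Returns 0 on success, -1 on failure with errno set by open()/close().
 */
int precreate_file(const char *path, mode_t mode)
{
    assert(path != NULL);

    /* No O_TRUNC or O_EXCL: an existing file is left as it is. */
    int fd = open(path, O_WRONLY | O_CREAT, mode);
    if (fd < 0)
        return -1;

    return close(fd);
}
```

In the reproducer this would presumably be called from a single rank (for example rank 0, as reported by MPI_Comm_rank()), followed by an MPI_Barrier() on MPI_COMM_WORLD, so that the file exists on all ranks before MPI_File_open() is reached.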
Cheers, Andreas

On Jul 30, 2024, at 08:40, Bertschinger, Thomas Andrew Hjorth via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

Hello,

We have an application that fails doing the following on one of our systems:

    ...
    openat(AT_FDCWD, "mpi_test.out", O_WRONLY|O_CREAT|O_NOCTTY|FASYNC, 0611) = 4
    pwrite64(4, "\3\0\0\0", 4, 0) = -1 EBADF (Bad file descriptor)
    ...

It opens a file with O_LOV_DELAY_CREATE (or O_NOCTTY|FASYNC as strace interprets it), and then immediately tries to write to it.

From the comments above ll_file_open() in Lustre:

    If opened with O_LOV_DELAY_CREATE, then we don't do the object
    creation or open until ll_lov_setstripe() ioctl is called.

It sounds like the expectation is that the process calling open() like this follows it up with an ioctl to set the stripe information prior to writing. Is this correct? In other words, is it reasonable to say that the failing code is doing something erroneous?

Here's a minimal MPI program that reproduces the problem. The issue only arises when using the Cray MPI implementation, however. When tested with openmpi and ANL mpich, the openat() call doesn't use O_LOV_DELAY_CREATE. Since the Cray implementation is unfortunately not open source, I have no insight into what this code is "supposed" to be doing. :(

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int err = MPI_Init(&argc, &argv);

        MPI_File fh;
        err = MPI_File_open(MPI_COMM_WORLD, "mpi_test.out",
                            MPI_MODE_WRONLY|MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
        printf("MPI_File_open returned: %d\n", err);

        long data = 3;
        err = MPI_File_write(fh, &data, 1, MPI_LONG, MPI_STATUS_IGNORE);
        printf("MPI_File_write returned: %d\n", err);

        err = MPI_File_close(&fh);
        printf("MPI_File_close returned: %d\n", err);

        MPI_Finalize();
        return 0;
    }

Thanks,
Thomas Bertschinger

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Whamcloud