On Fri, Nov 22, 2024 at 10:55 PM Jakub Wartak <jakub.war...@enterprisedb.com> wrote: > but with attached _lseeki64 dirty patch I've got success and resulting > tarball greater than 2^31 too: > > C:\compiledpg\bin>pg_basebackup.exe -U postgres -D "c:\backup7" -F t -P -X > stream -c fast --compress=none --create-slot --slot=slot11 > Password: > 18134035/18134035 kB (100%), 1/1 tablespace > C:\compiledpg\bin> > C:\compiledpg\bin>dir c:\backup7\pg_wal.tar > [..] > 11/22/2024 10:37 AM 4,026,778,112 pg_wal.tar > > PoC patch is attached and I'll register CFentry to see how the tests will go > (all of this MSVC/MinGW stuff is frightening to me, of course this will fail > on non-Win32...).
Great news. This will indeed require solving a few archeology puzzles to back-patch. For lseek(), I think it should be OK to just point it to _lseeki64() in win32_port.h as I showed earlier, because _lseeki64() was present in ancient msvcrt.dll. It's technically possible that an extension might be upset by the change but it'd have to be doing something pretty strange, like using lseek's type, or other such assumptions, not terribly likely. A more cautious change would do that just inside that translation unit to avoid upsetting anything else (or fixing anything else...), but that's probably too cautious, let's just do it in win32_port.h. For ftruncate(), though you didn't see a problem with that, Jacob is quite right that it must be able to corrupt stuff, and I retract that replacement code I showed earlier, it's not really OK to move the file position around. I would like to use _s_chsize() instead, but it arrived in msvcr80.dll so I don't think old MinGW can use it. Here's a new idea (sketch code not tested): static inline int ftruncate(int fd, pgoff_t length) { #if defined(_UCRT) || defined(_MSC_VER) /* MinGW + UCRT and all supported MSVC versions have this. */ errno = _s_chsize(fd, length); return errno == 0 ? 0 : -1; #else /* MinGW + ancient msvcrt.dll has only _chsize, limited by off_t (long). */ if (length > LONG_MAX) { /* A clear error is better than silent corruption. */ errno = EFBIG; return - 1; } return _chsize(fd, length); #endif } I'm not sure if we'd ever know if we broke MinGW + msvcrt.dll in the back branches, ie if any computer actually existing on earth would ever compile and run the second branch, and if so, be used for serious database work and then be able to reach the failure case. I think an honest but disappointing errno is probably OK in that hypothetical case. It's hard to know how far to go when programming ghost computers. Thoughts, anyone? (Obviously an exorcism is overdue in master, working on it...)