On Mon, 7 Nov 2022 at 17:00, Jonathan Wakely wrote:
>
> On Thu, 6 Oct 2022 at 20:03, Charles-Francois Natali via Libstdc++ <libstd...@gcc.gnu.org> wrote:
> >
> > `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of size 1024 and above, seemingly as an optimisation.
> >
> > This can have a significant performance impact if the overhead of a `write` syscall is non-negligible, e.g. on a slow disk, on network filesystems, or simply during IO contention, because instead of flushing every `BUFSIZ` chars (by default), we can flush every 1024 chars. The impact is even greater with custom, larger buffers, e.g. for network filesystems, because the code could issue `write` for example 1000x more often than necessary with respect to the buffer size. It also introduces a significant discontinuity in performance when writing chunks of size 1024 and above.
> >
> > See this reproducer, which writes a fixed number of chunks to a file opened with `O_SYNC` - to replicate a high-latency `write` - for varying chunk sizes:
> >
> > ```
> > $ cat test_fstream_flush.cpp
> > #include <cassert>
> > #include <ostream>
> > #include <string>
> > #include <vector>
> > #include <ext/stdio_filebuf.h>
> > #include <fcntl.h>
> >
> > int
> > main(int argc, char* argv[])
> > {
> >   assert(argc == 3);
> >
> >   const auto* path = argv[1];
> >   const auto chunk_size = std::stoul(argv[2]);
> >
> >   // O_SYNC forces every write to hit the disk, replicating a high-latency write().
> >   const auto fd =
> >     open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666);
> >   assert(fd >= 0);
> >
> >   auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out);
> >   auto stream = std::ostream(&filebuf);
> >
> >   const auto chunk = std::vector<char>(chunk_size);
> >
> >   for (auto i = 0; i < 1'000; ++i) {
> >     stream.write(chunk.data(), chunk.size());
> >   }
> >
> >   return 0;
> > }
> > ```
> >
> > ```
> > $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17
> > $ for i in $(seq 1021 1025); do echo -e "\n$i"; time /tmp/test_fstream_flush /tmp/foo $i; done
> >
> > 1021
> >
> > real    0m0.997s
> > user    0m0.000s
> > sys     0m0.038s
> >
> > 1022
> >
> > real    0m0.939s
> > user    0m0.005s
> > sys     0m0.032s
> >
> > 1023
> >
> > real    0m0.954s
> > user    0m0.005s
> > sys     0m0.034s
> >
> > 1024
> >
> > real    0m7.102s
> > user    0m0.040s
> > sys     0m0.192s
> >
> > 1025
> >
> > real    0m7.204s
> > user    0m0.025s
> > sys     0m0.209s
> > ```
> >
> > See the huge drop in performance at the 1024-byte boundary.
>
> I've finally found time to properly look at this, sorry for the delay.
>
> I thought I was unable to reproduce these numbers, then I realised I'd already installed a build with the patch, so I was measuring the patched performance for both my "before" and "after" tests. Oops!
>
> My concern is that the patch doesn't only affect files on remote filesystems. I assume the original 1024-byte chunking behaviour is there for a reason: for large writes the performance might be better if we just write directly instead of buffering and then writing again. Assuming we have a fast disk, writing straight to disk avoids copying in and out of the buffer. But if we have a slow disk, it's better to buffer and reduce the total number of disk writes. I'm concerned that the patch optimizes the slow-disk case potentially at a cost for the fast-disk case.
>
> I wonder whether it would make sense to check whether the buffer size has been manually changed, i.e. epptr() - pbase() != _M_buf_size. If the buffer has been explicitly set by the user, then we should assume they really want it to be used, and so not bypass it for writes >= 1024.
>
> In the absence of a better idea, I think I'm going to commit the patch as-is. I don't see it causing any measurable slowdown for very large writes on fast disks, and it's certainly a huge improvement for slow disks.
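To make the check suggested in the quoted message above a bit more concrete, here is a rough, standalone sketch of such a gate. This is not the actual basic_filebuf::xsputn code and not what was committed; the names (should_bypass_buffer, buf_capacity, default_buf_size) are hypothetical and only illustrate the heuristic of "keep buffering when the user installed their own buffer":

```c++
#include <ios>       // std::streamsize
#include <iostream>

// Hypothetical illustration only: decide whether a write of n chars should
// skip the stream buffer and go straight to write().
bool should_bypass_buffer(std::streamsize n,
                          std::streamsize buf_capacity,     // roughly epptr() - pbase()
                          std::streamsize default_buf_size) // the filebuf's default size
{
  // If the user explicitly set a (typically larger) buffer, honour it:
  // keep buffering so writes are coalesced up to the full buffer size.
  if (buf_capacity != default_buf_size)
    return false;

  // Otherwise keep the historical behaviour: chunks of 1024 chars or more
  // bypass the buffer and are written out directly.
  return n >= 1024;
}

int main()
{
  std::cout << should_bypass_buffer(4096, 8192, 8192) << '\n';    // default buffer -> bypass
  std::cout << should_bypass_buffer(4096, 1 << 20, 8192) << '\n'; // custom buffer  -> keep buffering
}
```

The only point of the sketch is the distinction between "user asked for a specific buffer" and "default buffer"; everything else about where such a check would sit in the real code is left open, as in the discussion above.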
The patch has been pushed to trunk now, thanks for the contribution. I removed the testcase and results from the commit message as they don't need to be in the git log. I added a link to your email into bugzilla though, so we can still find it easily.
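For anyone who wants to exercise the custom-buffer scenario mentioned earlier in the thread, a minimal user-side sketch follows. The path and buffer size are arbitrary examples, and note that with libstdc++ pubsetbuf generally has to be called before the file is opened for the buffer to take effect:

```c++
#include <fstream>
#include <vector>

int main()
{
  // Install a 1 MiB buffer; with libstdc++ this must happen before the
  // file is opened, otherwise the call has no effect.
  std::vector<char> buf(1 << 20);
  std::ofstream out;
  out.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
  out.open("/tmp/foo");   // arbitrary example path

  // With the patch, writes smaller than the buffer should now be coalesced
  // into it instead of triggering a write() every 1024 chars.
  std::vector<char> chunk(4096);
  for (int i = 0; i < 1000; ++i)
    out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
}
```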