On Wed, 2021-11-17 at 14:44 -0500, Jaime Casanova wrote:
> I'm trying to add more parallelism by copying individual segments
> of a relfilenode in different processes. Does anyone one see a big
> problem in trying to do that? I'm asking because no one did it before,
> that could not be a good sign.

I looked into speeding this up a while back, too. For the use case I
was looking at -- Greenplum, which has huge numbers of relfilenodes --
spinning disk I/O was absolutely the bottleneck and that is typically
not easily parallelizable. (In fact I felt at the time that Andres'
work on async I/O might be a better way forward, at least for some
filesystems.)

But you mentioned that you were seeing disks that weren't saturated, so
maybe some CPU optimization is still valuable? I am a little skeptical
that more parallelism is the way to do that, but numbers trump my
skepticism.

> - why we read()/write() at all? is not a faster way of copying the file?
>   i'm asking that because i don't actually know.

I have idly wondered if something based on splice() would be faster,
but I haven't actually tried it.

But there is now support for copy-on-write with the clone mode, isn't
there? Or are you not able to take advantage of it?

--Jacob
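
P.S. To make the read()/write() question a bit more concrete, here is a
rough, unbenchmarked sketch of an in-kernel copy. Note that it uses
copy_file_range() rather than splice(), since splice() needs a pipe on
one side; on filesystems with reflink support the kernel may share
blocks instead of copying them. It is written in Python purely for
brevity. The copy_file() helper, the CHUNK size and the fallback policy
are all illustrative assumptions, not anything taken from the existing
code.

```python
import os

CHUNK = 1 << 20  # 1 MiB per call; an arbitrary size chosen for illustration


def copy_file(src_path, dst_path):
    """Copy src_path to dst_path, preferring an in-kernel copy.

    Hypothetical helper. Returns the name of the method that finished
    the copy: "copy_file_range" or "read/write".
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            # os.copy_file_range() is Linux-only (Python 3.8+).
            if hasattr(os, "copy_file_range"):
                try:
                    # With no explicit offsets, both descriptors advance,
                    # so a return value of 0 means end of file.
                    while os.copy_file_range(src, dst, CHUNK) > 0:
                        pass
                    return "copy_file_range"
                except OSError:
                    # Not supported for this pair of files; carry on from
                    # the current offsets with plain read()/write().
                    pass
            while True:
                buf = os.read(src, CHUNK)
                if not buf:
                    break
                view = memoryview(buf)
                while view:
                    # write() may be short, so loop until the chunk is out.
                    written = os.write(dst, view)
                    view = view[written:]
            return "read/write"
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Whether this beats a plain read()/write() loop would need measuring on
the storage in question; the sketch only shows where the userspace
buffer copy could be avoided.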