Hi,

On 2020-04-22 09:52:53 -0400, Robert Haas wrote:
> On Tue, Apr 21, 2020 at 6:57 PM Andres Freund <and...@anarazel.de> wrote:
> > I agree that trying to make backups very fast is a good goal (or well, I
> > think not very slow would be a good descriptor for the current
> > situation). I am just trying to make sure we tackle the right problems
> > for that. My gut feeling is that we have to tackle compression first,
> > because without addressing that "all hope is lost" ;)
>
> OK. I have no objection to the idea of starting with (1) server side
> compression and (2) a better compression algorithm. However, I'm not
> very sold on the idea of relying on parallelism that is specific to
> compression. I think that parallelism across the whole operation -
> multiple connections, multiple processes, etc. - may be a more
> promising approach than trying to parallelize specific stages of the
> process. I am not sure about that; it could be wrong, and I'm open to
> the possibility that it is, in fact, wrong.

*My* gut feeling is that you're going to have a harder time using CPU time
efficiently when doing parallel compression via multiple processes and
independent connections. You're e.g. going to have a lot more context
switches, I think. And there will be network overhead from doing more
connections (including worse congestion control).
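(For context: the in-process alternative is zstd's built-in worker threads,
which is presumably also what the parallelism column for the zstd rows below
maps to. At the library level that's roughly the following - an untested
sketch, with placeholder buffers, requiring a libzstd >= 1.4 built with
threading support; it's not anything pg_basebackup actually does.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    int
    main(void)
    {
        size_t      srclen = 64 * 1024 * 1024;      /* stand-in for backup data */
        char       *src = malloc(srclen);
        size_t      bound = ZSTD_compressBound(srclen);
        char       *dst = malloc(bound);
        ZSTD_CCtx  *cctx = ZSTD_createCCtx();
        size_t      clen;

        memset(src, 'x', srclen);

        /* level 1, ten worker threads - compare the "zstd 1 10" rows below */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 1);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 10);

        clen = ZSTD_compress2(cctx, dst, bound, src, srclen);
        if (ZSTD_isError(clen))
        {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(clen));
            return 1;
        }
        printf("%zu -> %zu bytes\n", srclen, clen);

        ZSTD_freeCCtx(cctx);
        free(src);
        free(dst);
        return 0;
    }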
> Leaving out all the three and four digit wall times from your table:
>
> > method  level  parallelism  wall-time  cpu-user-time  cpu-kernel-time  size        rate  format
> > pigz    1      10           34.35      364.14         23.55            3892401867  16.6  .gz
> > zstd    1      1            82.95      67.97          11.82            2853193736  22.6  .zstd
> > zstd    1      10           25.05      151.84         13.35            2847414913  22.7  .zstd
> > zstd    6      10           43.47      374.30         12.37            2745211100  23.5  .zstd
> > zstd    6      20           32.50      468.18         13.44            2745211100  23.5  .zstd
> > zstd    9      20           57.99      949.91         14.13            2606535138  24.8  .zstd
> > lz4     1      1            49.94      36.60          13.33            7318668265  8.8   .lz4
> > pixz    1      10           92.54      925.52         37.00            1199499772  53.8  .xz
>
> It's notable that almost all of the fast wall times here are with
> zstd; the surviving entries with pigz and pixz are with ten-way
> parallelism, and both pigz and lz4 have worse compression ratios than
> zstd. My impression, though, is that LZ4 might be getting a bit of a
> raw deal here because of the repetitive nature of the data. I theorize,
> based on some reading I did yesterday and general hand-waving, that
> maybe the compression ratios would be closer together on a more
> realistic data set.

I agree that most datasets won't get even close to what we've seen here.
And that disadvantages e.g. lz4.

To come up with a much less compressible case, I generated data the
following way:

CREATE TABLE random_data(id serial NOT NULL, r1 float NOT NULL, r2 float NOT NULL, r3 float NOT NULL);
ALTER TABLE random_data SET (FILLFACTOR = 100);
ALTER SEQUENCE random_data_id_seq CACHE 1024;

-- with pgbench, I ran this in parallel for 100s
INSERT INTO random_data(r1,r2,r3) SELECT random(), random(), random() FROM generate_series(1, 100000);

-- then created indexes, using a high fillfactor to ensure few zeroed-out parts
ALTER TABLE random_data ADD CONSTRAINT random_data_id_pkey PRIMARY KEY(id) WITH (FILLFACTOR = 100);
CREATE INDEX random_data_r1 ON random_data(r1) WITH (fillfactor = 100);

This results in a 16GB base backup. I think this is probably a good bit
less compressible than most PG databases.

method  level  parallelism  wall-time  cpu-user-time  cpu-kernel-time  size        rate  format
gzip    1      1            305.37     299.72         5.52             7067232465  2.28
lz4     1      1            33.26      27.26          5.99             8961063439  1.80  .lz4
lz4     3      1            188.50     182.91         5.58             8204501460  1.97  .lz4
zstd    1      1            66.41      58.38          6.04             6925634128  2.33  .zstd
zstd    1      10           9.64       67.04          4.82             6980075316  2.31  .zstd
zstd    3      1            122.04     115.79         6.24             6440274143  2.50  .zstd
zstd    3      10           13.65      106.11         5.64             6438439095  2.51  .zstd
zstd    9      10           100.06     955.63         6.79             5963827497  2.71  .zstd
zstd    15     10           259.84     2491.39        8.88             5912617243  2.73  .zstd
pixz    1      10           162.59     1626.61        15.52            5350138420  3.02  .xz
plzip   1      20           135.54     2705.28        9.25             5270033640  3.06  .lz

> It's also notable that lz4 -1 is BY FAR the winner in terms of
> absolute CPU consumption. So I kinda wonder whether supporting both
> LZ4 and ZSTD might be the way to go, especially since once we have the
> LZ4 code we might be able to use it for other things, too.

Yea. I think the case for lz4 is far stronger in other places. E.g.
having lz4 -1 for toast can make a lot of sense: suddenly repeated
detoasting is much less of an issue, while still achieving higher
compression than pglz.

.oO(Now I'd really like to see how pglz compares to the above)
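(Concretely, the lz4 "-1" path is, as far as I know, just its plain one-shot
API. A minimal, untested sketch; the buffer is a made-up stand-in, nothing to
do with the actual toast code:)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    int
    main(void)
    {
        char        src[8192];              /* stand-in for a toasted datum */
        int         srclen = sizeof(src);
        int         bound = LZ4_compressBound(srclen);
        char       *dst = malloc(bound);
        int         clen;

        memset(src, 'x', sizeof(src));

        /* LZ4_compress_default() is the cheap path the lz4 CLI uses at -1 */
        clen = LZ4_compress_default(src, dst, srclen, bound);
        if (clen <= 0)
        {
            fprintf(stderr, "compression failed\n");
            return 1;
        }
        printf("%d -> %d bytes\n", srclen, clen);

        free(dst);
        return 0;
    }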
> > One thing this reminded me of is whether using a format (tar) that
> > doesn't allow efficient addressing of individual files is a good idea
> > for base backups. The compression rates very likely will be better when
> > not compressing tiny files individually, but at the same time it'd be
> > very useful to be able to access individual files more efficiently than
> > O(N). I can imagine that being important for some cases of incremental
> > backup assembly.
>
> Yeah, being able to operate directly on the compressed version of the
> file would be very useful, but I'm not sure that we have great options
> available there. I think the only widely-used format that supports
> that is ".zip", and I'm not too sure about emitting zip files.

I don't really see a problem with emitting .zip files. It's an extremely
widely used container format for all sorts of file formats these days.
Except for needing somewhat more complicated (and I don't think it's
*that* big of a difference) code during generation / unpacking, it seems
clearly advantageous over .tar.gz etc.

> Apparently, pixz also supports random access to archive members, and
> it did have one entry that survived my arbitrary cut in the table
> above, but the last release was in 2015, and it seems to be only a
> command-line tool, not a library. It also depends on libarchive and
> liblzma, which is not awful, but I'm not sure we want to suck in that
> many dependencies. But that's really a secondary thing: I can't
> imagine us depending on something that hasn't had a release in 5
> years, and has less than 300 total commits.

Oh, yea. I just looked at the various tools I could find that did
parallel compression.

> Other options include, perhaps, (1) emitting a tarfile of compressed
> files instead of a compressed tarfile

Yea, that'd help some. Although I am not sure how good the tooling to
seek through tarfiles in an O(files) rather than O(bytes) manner is.

I think there are some cases where using separate compression state for
each file would hurt us. Some of the archive formats have support for
reusing compression state, but I don't know which.

> , and (2) writing our own index files. We don't know when we begin
> emitting the tarfile what files we're going to find or how big they
> will be, so we can't really emit a directory at the beginning of the
> file. Even if we thought we knew, files can disappear or be truncated
> before we get around to archiving them. However, when we reach the end
> of the file, we do know what we included and how big it was, so
> possibly we could generate an index for each tar file, or include
> something in the backup manifest.

Hm. There's some appeal to just storing offsets in the manifest, and to
making sure each is a seekable offset in the compression stream. OTOH,
that makes it pretty hard for other tools to generate a compatible
archive.
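(To make the offsets idea concrete: one crude way to get there is to compress
each archive member as its own zstd frame and record where it starts, so a
reader can seek straight to a single member. Untested sketch, not a proposed
format - the struct and function names are made up for illustration:)

    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    typedef struct
    {
        long        offset;             /* where the member's frame starts */
        size_t      compressed_len;
        size_t      raw_len;
    } MemberIndexEntry;

    /* Append one member as a self-contained frame; fill in its index entry. */
    static int
    append_member(FILE *archive, const void *data, size_t len,
                  MemberIndexEntry *entry)
    {
        size_t      bound = ZSTD_compressBound(len);
        void       *buf = malloc(bound);
        size_t      clen;

        if (buf == NULL)
            return -1;
        clen = ZSTD_compress(buf, bound, data, len, 1);
        if (ZSTD_isError(clen))
        {
            free(buf);
            return -1;
        }
        entry->offset = ftell(archive); /* seekable start of this frame */
        entry->compressed_len = clen;
        entry->raw_len = len;
        fwrite(buf, 1, clen, archive);
        free(buf);
        return 0;
    }

    /* Fetch a single member with one seek instead of scanning the stream. */
    static int
    read_member(FILE *archive, const MemberIndexEntry *entry, void *out)
    {
        void       *cbuf = malloc(entry->compressed_len);
        size_t      rlen;

        if (cbuf == NULL)
            return -1;
        if (fseek(archive, entry->offset, SEEK_SET) != 0 ||
            fread(cbuf, 1, entry->compressed_len, archive) != entry->compressed_len)
        {
            free(cbuf);
            return -1;
        }
        rlen = ZSTD_decompress(out, entry->raw_len, cbuf, entry->compressed_len);
        free(cbuf);
        return ZSTD_isError(rlen) ? -1 : 0;
    }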
> > The other big benefit is that zstd's library has multi-threaded
> > compression built in, whereas that's not the case for other libraries
> > that I am aware of.
>
> Wouldn't it be a problem to let the backend become multi-threaded, at
> least on Windows?

We already have threads on Windows, e.g. the signal handler emulation
stuff runs in one. Are you thinking of this bit in postmaster.c:

#ifdef HAVE_PTHREAD_IS_THREADED_NP

	/*
	 * On macOS, libintl replaces setlocale() with a version that calls
	 * CFLocaleCopyCurrent() when its second argument is "" and every relevant
	 * environment variable is unset or empty.  CFLocaleCopyCurrent() makes
	 * the process multithreaded.  The postmaster calls sigprocmask() and
	 * calls fork() without an immediate exec(), both of which have undefined
	 * behavior in a multithreaded program.  A multithreaded postmaster is the
	 * normal case on Windows, which offers neither fork() nor sigprocmask().
	 */
	if (pthread_is_threaded_np() != 0)
		ereport(FATAL,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("postmaster became multithreaded during startup"),
				 errhint("Set the LC_ALL environment variable to a valid locale.")));
#endif

? I don't really see any of the concerns there applying to the base
backup case.

Greetings,

Andres Freund