On Fri, 2021-07-23 at 17:42 +0200, Pavel Stehule wrote: > čt 22. 7. 2021 v 0:12 odesílatel Jacob Champion <pchamp...@vmware.com> napsal: > > > > Pavel, I'd be interested to see what your benchmarks find with this > > code. Does this fix the original issue for you? > > I can confirm that the original issue is fixed.
Great! > I tested performance > > I had three data sets > > 1. typical data - mix ascii and utf characters typical for czech > language - 25K lines - there is very small slowdown 2ms from 24 to > 26ms (stored file of this result has 3MB) > > 2. the worst case - this reports has only emoji 1000 chars * 10K rows > - there is more significant slowdown - from 160 ms to 220 ms (stored > file has 39MB) I assume the stored file size has grown with this patch, since we're now printing the correct number of spacing/border characters? > 3. a little bit of obscure datasets generated by \x and select * from > pg_proc - it has 99K lines - and there are a lot of unicode > decorations (borders). The line has 17K chars. (the stored file has > 1.7GB) > In this dataset I see a slowdown from 4300 to 4700 ms. These results are lining up fairly well with my profiling. To isolate the effects of the new algorithm (as opposed to printing time) I redirected to /dev/null: psql postgres -c "select repeat(unistr('\u115D'), 10000000);" > /dev/null This is what I expect to be the worst case for the new patch: a huge string consisting of nothing but \u115D, which is in the first interval and therefore takes the maximum number of iterations during the binary search. For that command, the wall time slowdown with this patch was about 240ms (from 1.128s to 1.366s, or 21% slower). Callgrind shows an increase of 18% in the number of instructions executed with the interval table patch, all of it coming from PQdsplen (no surprise). PQdsplen itself has a 36% increase in instruction count for that run. I also did a microbenchmark of PQdsplen (code attached, requires Google Benchmark [1]). The three cases I tested were standard ASCII characters, a smiley-face emoji, and the worst-case \u115F character. Without the patch: ------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------ ... BM_Ascii_mean 4.97 ns 4.97 ns 5 BM_Ascii_median 4.97 ns 4.97 ns 5 BM_Ascii_stddev 0.035 ns 0.035 ns 5 ... BM_Emoji_mean 6.30 ns 6.30 ns 5 BM_Emoji_median 6.30 ns 6.30 ns 5 BM_Emoji_stddev 0.045 ns 0.045 ns 5 ... BM_Worst_mean 12.4 ns 12.4 ns 5 BM_Worst_median 12.4 ns 12.4 ns 5 BM_Worst_stddev 0.038 ns 0.038 ns 5 With the patch: ------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------ ... BM_Ascii_mean 4.59 ns 4.59 ns 5 BM_Ascii_median 4.60 ns 4.60 ns 5 BM_Ascii_stddev 0.069 ns 0.068 ns 5 ... BM_Emoji_mean 11.8 ns 11.8 ns 5 BM_Emoji_median 11.8 ns 11.8 ns 5 BM_Emoji_stddev 0.059 ns 0.059 ns 5 ... BM_Worst_mean 18.5 ns 18.5 ns 5 BM_Worst_median 18.5 ns 18.5 ns 5 BM_Worst_stddev 0.077 ns 0.077 ns 5 So an incredibly tiny improvement in the ASCII case, which is reproducible across multiple runs and not just a fluke (I assume because the code is smaller now and has better cache line characteristics?). A ~90% slowdown for the emoji case, and a ~50% slowdown for the worst-performing characters. That seems perfectly reasonable considering we're talking about dozens of nanoseconds. > 9% looks too high, but in absolute time it is 400ms for 99K lines and > very untypical data, or 2ms for more typical results., 2ms are > nothing (for interactive work). More - this is from a pspg > perspective. In psql there can be overhead of network, protocol > processing, formatting, and more and more, and psql doesn't need to > calculate display width of decorations (borders), what is the reason > for slowdowns in pspg. Yeah. Considering the alignment code is for user display, the absolute performance is going to dominate, and I don't see any red flags so far. If you're regularly dealing with unbelievably huge amounts of emoji, I think the amount of extra time we're seeing here is unlikely to be a problem. If it is, you can always turn alignment off. (Do you rely on horizontal alignment for lines with millions of characters, anyway?) Laurenz, Michael, what do you think? --Jacob [1] https://github.com/google/benchmark
/* * Copyright 2021 VMware, Inc. * SPDX-License-Identifier: PostgreSQL */ #include <benchmark/benchmark.h> #include <libpq-fe.h> static void BM_Ascii(benchmark::State& state) { int enc = pg_char_to_encoding("UTF8"); for (auto _ : state) { PQdsplen("a", enc); } } BENCHMARK(BM_Ascii); static void BM_Emoji(benchmark::State& state) { int enc = pg_char_to_encoding("UTF8"); for (auto _ : state) { PQdsplen(u8"\U0001F600", enc); } } BENCHMARK(BM_Emoji); static void BM_Worst(benchmark::State& state) { int enc = pg_char_to_encoding("UTF8"); for (auto _ : state) { PQdsplen(u8"\u115F", enc); } } BENCHMARK(BM_Worst); BENCHMARK_MAIN();
# # Copyright 2021 VMware, Inc. # SPDX-License-Identifier: PostgreSQL # CC := $(CXX) CPPFLAGS := -I$(shell pg_config --includedir) LDFLAGS := -pthread -L$(shell pg_config --libdir) LDLIBS := -lbenchmark -lpq main: main.o .PHONY: clean clean: rm -f main main.o .PHONY: check check: main sudo cpupower frequency-set --governor performance ./main --benchmark_repetitions=5 sudo cpupower frequency-set --governor powersave