On 2020-04-30 2:18 a.m., Sumanta Mukherjee wrote:
Hi,
Would it be possible to include the absolute numbers from the perf runs,
so that it is easier to understand the amount of improvement with and
without the patch under the different loads and worker counts?
Here are the parameters used to record the perf data on both the server
and the client side, for example, after applying the v15 patch and using
4 workers with load:
perf record -o postgres_patch_j4_load -e block:block_rq_insert -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- /home/ec2-user/after/bin/postgres -D /mnt/test/data

perf record -o backup_patch_j4_load -e block:block_rq_insert -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- /home/ec2-user/after/bin/pg_basebackup -h ${PG_SERVER} -p 5432 -D /mnt/backup/data -v
And this is how the report is generated.
perf report -i postgres_patch_j4_load --stdio > postgres_patch_j4_load.txt
The original perf data files are still available. Can you please clarify
which parameter you would like added when regenerating the reports, or
whether any other parameters need to be added when recreating the
perf.data files and then generating the reports?
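If what you are after is the absolute sample count per symbol (rather
than only the percentages), then as far as I know adding -n
(--show-nr-samples) when generating the report should be enough, without
re-recording perf.data, roughly:

perf report -n -i postgres_patch_j4_load --stdio > postgres_patch_j4_load.txt

That should add a "Samples" column next to the "Overhead" percentages.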
I am also unsure why swapper accounts for such a huge percentage of the
absolute time in the base run with just the postgres server and the
pg_basebackup client.
With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 30, 2020 at 1:18 PM David Zhang <david.zh...@highgo.ca> wrote:
Hi,
Thanks a lot for sharing the test results. Here are our test results
using perf on three AWS t2.xlarge machines with the configuration below.
Machine configuration:
Instance Type: t2.xlarge
Volume type: io1
Memory: 16 GB
vCPU #: 4
Architecture: x86_64
IOPS: 6000
Database Size (GB): 45 (Server)
case 1: postgres server: without patch and without load
* Disk I/O:
# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.65% postgres [kernel.kallsyms] [k] __elv_add_request
2.27% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead Command Shared Object Symbol
# ........  ..............  ..................  ............................
#
  64.73%  swapper   [kernel.kallsyms]   [k] native_safe_halt
  10.89%  postgres  [vdso]              [.] __vdso_gettimeofday
   5.64%  postgres  [kernel.kallsyms]   [k] do_syscall_64
   5.43%  postgres  libpthread-2.26.so  [.] __libc_recv
   1.72%  postgres  [kernel.kallsyms]   [k] pvclock_clocksource_read
* Network:
# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
91.58% swapper [kernel.kallsyms] [k] consume_skb
7.09% postgres [kernel.kallsyms] [k] consume_skb
0.61% kswapd0 [kernel.kallsyms] [k] consume_skb
0.44% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
case 1: pg_basebackup client: without patch and without load
* Disk I/O:
# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
96.78% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
2.82% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.29% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.09% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead Command Shared Object Symbol
# ........  ..............  ..................  ............................
#
  87.99%  swapper        [kernel.kallsyms]  [k] native_safe_halt
   3.14%  swapper        [kernel.kallsyms]  [k] __lock_text_start
   0.48%  swapper        [kernel.kallsyms]  [k] __softirqentry_text_start
   0.37%  pg_basebackup  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
   0.35%  swapper        [kernel.kallsyms]  [k] do_csum
* Network:
# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
95.12% swapper [kernel.kallsyms] [k] consume_skb
3.23% pg_basebackup [kernel.kallsyms] [k] consume_skb
0.83% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.45% kswapd0 [kernel.kallsyms] [k] consume_skb
case 2: postgres server: with patch and with load, 4 backup
workers on client side
* Disk I/O:
# Samples: 3M of event 'block:block_rq_insert'
# Event count (approx.): 3634542
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
98.88% postgres [kernel.kallsyms] [k] __elv_add_request
0.66% perf [kernel.kallsyms] [k] __elv_add_request
0.42% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.01% sshd [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 9M of event 'cpu-clock'
# Event count (approx.): 2299129250000
#
# Overhead Command Shared Object Symbol
# ........  ..............  ..................  ............................
#
  52.73%  swapper   [kernel.kallsyms]   [k] native_safe_halt
   8.31%  postgres  [vdso]              [.] __vdso_gettimeofday
   4.46%  postgres  [kernel.kallsyms]   [k] do_syscall_64
   4.16%  postgres  libpthread-2.26.so  [.] __libc_recv
   1.58%  postgres  [kernel.kallsyms]   [k] __lock_text_start
   1.52%  postgres  [kernel.kallsyms]   [k] pvclock_clocksource_read
   0.81%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6048795
#
# Overhead Command Shared Object Symbol
# ........  ..............  ..................  ...........................
#
  85.81%  postgres     [kernel.kallsyms]  [k] consume_skb
  12.03%  swapper      [kernel.kallsyms]  [k] consume_skb
   0.97%  postgres     [kernel.kallsyms]  [k] __consume_stateless_skb
   0.85%  ksoftirqd/3  [kernel.kallsyms]  [k] consume_skb
   0.24%  perf         [kernel.kallsyms]  [k] consume_skb
case 2: pg_basebackup 4 workers: with patch and with load
* Disk I/O:
# Samples: 372K of event 'block:block_rq_insert'
# Event count (approx.): 372360
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.26% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
1.45% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.95% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.14% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 4M of event 'cpu-clock'
# Event count (approx.): 1234071000000
#
# Overhead Command Shared Object Symbol
# ........  ..............  ..................  ............................
#
  89.25%  swapper        [kernel.kallsyms]  [k] native_safe_halt
   0.93%  pg_basebackup  [kernel.kallsyms]  [k] __lock_text_start
   0.91%  swapper        [kernel.kallsyms]  [k] __lock_text_start
   0.69%  pg_basebackup  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
   0.45%  swapper        [kernel.kallsyms]  [k] do_csum
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6449013
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
90.28% pg_basebackup [kernel.kallsyms] [k] consume_skb
9.09% swapper [kernel.kallsyms] [k] consume_skb
0.29% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.21% sshd [kernel.kallsyms] [k] consume_skb
The detailed perf reports are attached for the different scenarios,
i.e. without the patch (with and without load, for server and client) and
with the patch (with and without load, for 1, 2, 4 and 8 workers, for
both server and client). The file names should be self-explanatory.
Let me know if more information is required.
Best regards,
David
On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
Hi,
We at EnterpriseDB did some performance testing around this
parallel backup to check how beneficial it is, and below are the
results. In this testing, we ran the backup:
1) Without Asif's patch
2) With Asif's patch and worker counts of 1, 2, 4 and 8.
We ran those tests on two setups:
1) Client and server both on the same machine (local backups)
2) Client and server on different machines (remote backups)
*Machine details:*
1: Server (on which the local backups were performed, and which served
as the server for the remote backups)
2: Client (used as the client for the remote backups)
*Server:*
RAM:500 GB
CPU details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Filesystem:ext4
*Client:*
RAM:490 GB
CPU details:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
Filesystem:ext4
Below are the results for the local test. Each entry shows the
real/user/sys times; the percentage is the change in performance of the
parallel backup compared to a normal backup (without the patch).

10 GB (10 tables, each table around 1.05 GB)
  without patch:           real 0m27.016s, user 0m3.378s, sys 0m23.059s
  1 worker (with patch):   real 0m30.314s, user 0m3.575s, sys 0m22.946s  -> 12% performance decrease
  2 workers (with patch):  real 0m20.400s, user 0m3.622s, sys 0m29.670s  -> 27% performance increase
  4 workers (with patch):  real 0m15.331s, user 0m3.706s, sys 0m39.189s  -> 43% performance increase
  8 workers (with patch):  real 0m15.094s, user 0m3.915s, sys 1m23.350s  -> 44% performance increase

50 GB (50 tables, each table around 1.05 GB)
  without patch:           real 2m11.049s, user 0m16.464s, sys 2m1.757s
  1 worker (with patch):   real 2m26.621s, user 0m18.497s, sys 2m4.792s   -> 21% performance decrease
  2 workers (with patch):  real 1m9.581s,  user 0m18.298s, sys 2m12.030s  -> 46% performance increase
  4 workers (with patch):  real 0m53.894s, user 0m18.588s, sys 2m47.390s  -> 58% performance increase
  8 workers (with patch):  real 0m55.373s, user 0m18.423s, sys 5m57.470s  -> 57% performance increase

100 GB (100 tables, each table around 1.05 GB)
  without patch:           real 4m4.776s,  user 0m33.699s, sys 3m27.777s
  1 worker (with patch):   real 4m20.862s, user 0m35.753s, sys 3m28.262s  -> 6% performance decrease
  2 workers (with patch):  real 2m37.411s, user 0m36.440s, sys 4m16.424s  -> 35% performance increase
  4 workers (with patch):  real 1m49.503s, user 0m37.200s, sys 5m58.077s  -> 55% performance increase
  8 workers (with patch):  real 1m36.762s, user 0m36.987s, sys 9m36.906s  -> 60% performance increase

200 GB (200 tables, each table around 1.05 GB)
  without patch:           real 10m34.998s, user 1m8.471s,  sys 7m21.520s
  1 worker (with patch):   real 11m30.899s, user 1m12.933s, sys 8m14.496s   -> 8% performance decrease
  2 workers (with patch):  real 6m8.481s,   user 1m13.771s, sys 9m31.216s   -> 41% performance increase
  4 workers (with patch):  real 4m2.403s,   user 1m18.331s, sys 12m29.661s  -> 61% performance increase
  8 workers (with patch):  real 4m3.768s,   user 1m24.547s, sys 15m21.421s  -> 61% performance increase
Results for the remote test (same layout as above):

10 GB (10 tables, each table around 1.05 GB)
  without patch:           real 1m36.829s, user 0m2.124s, sys 0m14.004s
  1 worker (with patch):   real 1m37.598s, user 0m3.272s, sys 0m11.110s  -> 0.8% performance decrease
  2 workers (with patch):  real 1m36.753s, user 0m2.627s, sys 0m15.312s  -> 0.08% performance increase
  4 workers (with patch):  real 1m37.212s, user 0m3.835s, sys 0m13.221s  -> 0.3% performance decrease
  8 workers (with patch):  real 1m36.977s, user 0m4.475s, sys 0m17.937s  -> 0.1% performance decrease

50 GB (50 tables, each table around 1.05 GB)
  without patch:           real 7m54.211s, user 0m10.826s, sys 1m10.435s
  1 worker (with patch):   real 7m55.603s, user 0m16.535s, sys 1m8.147s   -> 0.2% performance decrease
  2 workers (with patch):  real 7m53.499s, user 0m18.131s, sys 1m8.822s   -> 0.1% performance increase
  4 workers (with patch):  real 7m54.687s, user 0m15.818s, sys 1m30.991s  -> 0.1% performance decrease
  8 workers (with patch):  real 7m54.658s, user 0m20.783s, sys 1m34.460s  -> 0.1% performance decrease

100 GB (100 tables, each table around 1.05 GB)
  without patch:           real 15m45.776s, user 0m21.802s, sys 2m59.006s
  1 worker (with patch):   real 15m46.315s, user 0m32.499s, sys 2m47.245s  -> 0.05% performance decrease
  2 workers (with patch):  real 15m46.065s, user 0m28.877s, sys 2m21.181s  -> 0.03% performance decrease
  4 workers (with patch):  real 15m47.793s, user 0m30.932s, sys 2m36.708s  -> 0.2% performance decrease
  8 workers (with patch):  real 15m47.129s, user 0m35.151s, sys 3m23.572s  -> 0.14% performance decrease

200 GB (200 tables, each table around 1.05 GB)
  without patch:           real 32m55.720s, user 0m50.602s, sys 5m38.875s
  1 worker (with patch):   real 31m30.602s, user 0m45.377s, sys 4m57.405s  -> 4% performance increase
  2 workers (with patch):  real 31m30.214s, user 0m55.023s, sys 5m8.689s   -> 4% performance increase
  4 workers (with patch):  real 31m31.187s, user 1m13.390s, sys 5m40.861s  -> 4% performance increase
  8 workers (with patch):  real 31m31.729s, user 1m4.955s,  sys 6m35.774s  -> 4% performance decrease
With client and server on the same machine, the results show around 50%
improvement in the parallel runs with 4 and 8 workers. We don't see a
huge further performance improvement as more workers are added. Whereas,
when the client and server are on different machines, we don't see any
major benefit in performance. These results match the test results posted
by David Zhang upthread.

We ran the test for the 100GB backup with 4 parallel workers to look at
CPU usage and other information. What we noticed is that the server is
consuming almost 100% CPU the whole time, and pg_stat_activity shows that
the server is busy with ClientWrite most of the time.
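For reference, the wait events can be sampled with something along these
lines, run every few seconds (this is just an illustrative sketch; the
attached pg_stat_activity output may have been captured with a slightly
different query):

watch -n 5 'psql -d postgres -c "SELECT pid, backend_type, state, wait_event_type, wait_event FROM pg_stat_activity"'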
Attaching the captured output for:
1) top command output on the server, every 5 seconds
2) pg_stat_activity output, every 5 seconds
3) top command output on the client, every 5 seconds
Do let me know if anyone has further questions/inputs for the
benchmarking.
Thanks to Rushabh Lathia for helping me with this testing.
On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zh...@highgo.ca> wrote:
>
> Hi,
>
> Here is the parallel backup performance test results with and without
> the patch "parallel_backup_v15" on AWS cloud environment. Two
> "t2.xlarge" machines were used: one for Postgres server and the other
> one for pg_basebackup with the same machine configuration showing below.
>
> Machine configuration:
> Instance Type :t2.xlarge
> Volume type :io1
> Memory (MiB) :16GB
> vCPU # :4
> Architecture :x86_64
> IOP :6000
> Database Size (GB) :108
>
> Performance test results:
> without patch:
> real 18m49.346s
> user 1m24.178s
> sys 7m2.966s
>
> 1 worker with patch:
> real 18m43.201s
> user 1m55.787s
> sys 7m24.724s
>
> 2 worker with patch:
> real 18m47.373s
> user 2m22.970s
> sys 11m23.891s
>
> 4 worker with patch:
> real 18m46.878s
> user 2m26.791s
> sys 13m14.716s
>
> As required, I didn't have the pgbench running in parallel like we did
> in the previous benchmark.
>
So, there doesn't seem to be any significant improvement in this
scenario. Now, it is not clear why there was a significant improvement
in the previous run where pgbench was also running simultaneously. I am
not sure, but maybe it is because when a lot of other backends were
running (performing a read-only workload), the backend that was
responsible for doing the backup was getting frequently scheduled out
and it slowed down the overall backup process. And when we start using
multiple backends for backup, one or other backup process is always
running, making the overall backup faster. One idea to find this out is
to check how much time backup takes when we run it with and without a
pgbench workload on HEAD (aka unpatched code). Even if what I am saying
is true, or there is some other reason due to which we are seeing a
speedup in some cases (where there is a concurrent workload), it might
not make the case for using multiple backends for backup, but still, it
is good to find that information as it might help in designing this
feature better.
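(As a rough sketch of what I mean, something like the following on HEAD
should be enough; the host, paths and pgbench parameters here are only
placeholders:

# baseline: backup with no concurrent load
time pg_basebackup -h $PG_SERVER -p 5432 -D /mnt/backup/run1 -v

# same backup while a read-only pgbench workload runs against the server
pgbench -S -c 16 -j 4 -T 1800 postgres &
time pg_basebackup -h $PG_SERVER -p 5432 -D /mnt/backup/run2 -v

and then comparing the two real times.)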
> The perf report files for both Postgres server and pg_basebackup
> sides are attached.
>
It is not clear which functions are taking more time or for which
functions time is reduced, as function symbols are not present in the
reports. I think you can refer to
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles, and additionally use -fno-omit-frame-pointer during
configure (you can pass CFLAGS="-fno-omit-frame-pointer" to configure).
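(For example, roughly along these lines; the install prefix and the pid
are placeholders, and the -g options simply ask perf to record and
display call graphs:

./configure CFLAGS="-O2 -fno-omit-frame-pointer" --prefix=/path/to/install --enable-debug
make && make install
perf record -g -o perf.data -p <postgres_pid> -- sleep 60
perf report -g -n -i perf.data --stdio

The symbol names should then resolve, provided the binaries are built
with frame pointers and are not stripped.)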
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
--
David
Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
--
David
Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca