[HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
The averages of running pgbench three times are:
wal_sync_method=fdatasync:   tps = 43,179
wal_sync_method=pmem_drain:  tps = 45,254

C-5-2. pclient_thread: my insert benchmark

Preparation:
CREATE TABLE [TABLE_NAME] (id int8, value text);
ALTER TABLE [TABLE_NAME] ALTER value SET STORAGE external;
PREPARE insert_sql (int8) AS INSERT INTO %s (id, value) VALUES ($1, '[1K_data]');

Execution:
BEGIN;
EXECUTE insert_sql(%lld);
COMMIT;

Note: I ran this query 5M times with 32 threads.

# ./pclient_thread
Invalid Arguments:
Usage: ./pclient_thread [The number of threads] [The number to insert tuples] [data size(KB)]
# numactl -N 1 ./pclient_thread 32 5242880 1

The averages of running this benchmark three times are:
wal_sync_method=fdatasync:   tps = 67,780
wal_sync_method=pmem_drain:  tps = 131,962

--
Yoshimi Ichiyanagi

Attachments:
0001-Add-configure-option-for-PMDK.patch
0002-Read-write-WAL-files-using-PMDK.patch
0003-Walreceiver-WAL-IO-using-PMDK.patch
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Thank you for your reply.

On Wed, 17 Jan 2018 15:29:11 -0500, Robert Haas wrote:
>> Using pgbench, which is a general PostgreSQL benchmark, the postgres server
>> with the patches applied is about 5% faster than the original server.
>> And using my insert benchmark, it is up to 90% faster than the original one.
>> I will describe these details later.
>
>Interesting. But your insert benchmark looks highly artificial... in
>real life, you would not insert the same long static string 160
>million times. Or if you did, you would use COPY or INSERT .. SELECT.

I made this benchmark in order to put a very heavy WAL I/O load on PMEM.

PMEM is very fast. I ran an fio-like micro-benchmark on PMEM. The workload
was 8 KB-block synchronous sequential writes, with a total write size of
40 GB. The micro-benchmark results were:

Using DAX FS (like fdatasync):             5,559 MB/sec
Using DAX FS and PMDK (like pmem_drain):  13,177 MB/sec

Using pgbench, the postgres server with my patches applied was only 5%
faster than the original server.

>> The averages of running pgbench three times are:
>> wal_sync_method=fdatasync:   tps = 43,179
>> wal_sync_method=pmem_drain:  tps = 45,254

While this pgbench run was in progress, the utilization of the 8 CPU cores
on which the postgres server was running was about 800%, and the WAL I/O
throughput was about 10 MB/sec. I concluded that pgbench does not put
enough WAL I/O load on PMEM, so I made and ran the WAL I/O-intensive test.

Do you know any good WAL I/O-intensive benchmarks? DBT2?

On Wed, 17 Jan 2018 15:40:25 -0500, Robert Haas wrote:
>> C-5. Running the 2 benchmarks (1. pgbench, 2. my insert benchmark)
>> C-5-1. pgbench
>> # numactl -N 1 pgbench -c 32 -j 8 -T 120 -M prepared [DB_NAME]
>>
>> The averages of running pgbench three times are:
>> wal_sync_method=fdatasync:   tps = 43,179
>> wal_sync_method=pmem_drain:  tps = 45,254
>
>What scale factor was used for this test?

The scale factor was 200.
# numactl -N 0 pgbench -s 200 -i [DB_NAME]

>Was the only non-default configuration setting wal_sync_method? i.e.
>synchronous_commit=on? No change to max_wal_size?

No. I used the following parameters in postgresql.conf to prevent
checkpoints from occurring while the tests were running:

# - Settings -
wal_level = replica
fsync = on
synchronous_commit = on
wal_sync_method = pmem_drain
full_page_writes = on
wal_compression = off

# - Checkpoints -
checkpoint_timeout = 1d
max_wal_size = 20GB
min_wal_size = 20GB

>This seems like an exceedingly short test -- normally, for write
>tests, I recommend the median of 3 30-minute runs. It also seems
>likely to be client-bound, because of the fact that jobs = clients/4.
>Normally I use jobs = clients or at least jobs = clients/2.

Thank you for your kind proposal. I did that:

# numactl -N 0 pgbench -s 200 -i [DB_NAME]
# numactl -N 1 pgbench -c 32 -j 32 -T 1800 -M prepared [DB_NAME]

The averages of running pgbench three times are:
wal_sync_method=fdatasync:   tps = 39,966
wal_sync_method=pmem_drain:  tps = 41,365

--
Yoshimi Ichiyanagi
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On Fri, 19 Jan 2018 09:42:25 -0500, Robert Haas wrote:
>I think that you really need to include the checkpoints in the tests.
>I would suggest setting max_wal_size and/or checkpoint_timeout so that
>you reliably complete 2 checkpoints in a 30-minute test, and then do a
>comparison on that basis.

Experimental setup:
- Server: HP ProLiant DL360 Gen9
  CPU: Xeon E5-2667 v4 (3.20 GHz); 2 processors (without HT)
  DRAM: DDR4-2400; 32 GiB/processor (8 GiB/socket x 4 sockets/processor) x 2 processors
  NVDIMM: DDR4-2133; 32 GiB/processor (node 0: 8 GiB/socket x 2 sockets/processor,
          node 1: 8 GiB/socket x 6 sockets/processor)
  HDD: Seagate Constellation2 2.5inch SATA 3.0 6Gb/s 1TB 7200rpm x 1
  SATA-SSD: Crucial_CT500MX200SSD1 (SATA 3.2, SATA 6Gb/s)
  OS: Ubuntu 16.04, linux-4.12
  DAX FS: ext4
  PMDK: master@Aug 30, 2017
  PostgreSQL: master

Note: I bound the postgres processes to one NUMA node, and the benchmarks
to the other NUMA node.

- postgresql.conf -
# - Settings -
wal_level = replica
fsync = on
synchronous_commit = on
wal_sync_method = pmem_drain/fdatasync/open_datasync
full_page_writes = on
wal_compression = off

# - Checkpoints -
checkpoint_timeout = 12min
max_wal_size = 20GB
min_wal_size = 20GB

- Executed commands:
# numactl -N 1 pg_ctl start -D [PG_DIR] -l [LOG_FILE]
# numactl -N 0 pgbench -s 200 -i [DB_NAME]
# numactl -N 0 pgbench -c 32 -j 32 -T 1800 -r [DB_NAME] -M prepared

The results:
A) Patches applied to the PG source, PG compiled with libpmem
B) Patches applied to the PG source, PG compiled without libpmem
C) Original PG

The averages of running pgbench three times on *PMEM* are:
A) wal_sync_method = pmem_drain     tps = 41660.42524
   wal_sync_method = open_datasync  tps = 39913.49897
   wal_sync_method = fdatasync      tps = 39900.83396
C) wal_sync_method = open_datasync  tps = 40335.50178
   wal_sync_method = fdatasync      tps = 40649.57772

The averages of running pgbench three times on *SATA-SSD* are:
B) wal_sync_method = open_datasync  tps = 7224.07146
   wal_sync_method = fdatasync      tps = 7222.19177
C) wal_sync_method = open_datasync  tps = 7258.79093
   wal_sync_method = fdatasync      tps = 7263.19878

The above results show that wal_sync_method=pmem_drain was about 4% faster
than wal_sync_method=open_datasync/fdatasync. When pgbench ran on the
SATA-SSD, wal_sync_method=fdatasync was as fast as
wal_sync_method=open_datasync.

>> Do you know any good WAL I/O intensive benchmarks? DBT2?
>
>pgbench is quite a WAL-intensive benchmark; it is much more
>write-heavy than what most systems experience in real life, at least
>in my experience. Your comparison of DAX FS to DAX FS + PMDK is very
>interesting, but in real life the bandwidth of DAX FS is already so
>high -- and the latency so low -- that I think most real-world
>workloads won't gain very much. At least, that is my impression based
>on internal testing EnterpriseDB did a few months back. (Thanks to
>Mithun and Kuntal for that work.)

In the near future, many physical devices will send sensing data (IoT may
consume tens of gigabits of network bandwidth). The amount of data inserted
into databases will increase significantly. I think that PMEM will be
needed for use cases like IoT.

On Thu, 25 Jan 2018 09:30:45 -0500, Robert Haas wrote:
>Well, some day persistent memory may be a common enough storage
>technology that such a change makes sense, but these days most people
>have either SSD or spinning disks, where the change would probably be
>a net negative. It seems more like something we might think about
>changing in PG 20 or PG 30.

Oracle and Microsoft SQL Server support PMEM [1][2].
I think it is not too early for PostgreSQL to support PMEM.

[1] http://dbheartbeat.blogspot.jp/2017/11/doag-2017-oracle-18c-dbim-oracle.htm
[2] https://www.snia.org/sites/default/files/PM-Summit/2018/presentations/06_PM_Summit_2018_Talpey-Final_Post-CORRECTED.pdf

--
Yoshimi Ichiyanagi
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
>On Tue, Jan 30, 2018 at 3:37 AM, Yoshimi Ichiyanagi wrote:
>> Oracle and Microsoft SQL Server supported PMEM [1][2].
>> I think it is not too early for PostgreSQL to support PMEM.
>
>I agree; it's good to have the option available for those who have
>access to the hardware.
>
>If you haven't added your patch to the next CommitFest, please do so.

Thank you for your time. I added my patches to the CommitFest 2018-3.
https://commitfest.postgresql.org/17/1485/

By the way, we submitted this proposal (Introducing PMDK into PostgreSQL)
to PGCon 2018. If our proposal is accepted and you have time, please come
and listen to our presentation.

--
Yoshimi Ichiyanagi
Mailto: ichiyanagi.yosh...@lab.ntt.co.jp
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
<20180301103641.tudam4mavba3g...@alap3.anarazel.de>
On Thu, 1 Mar 2018 02:36:41 -0800, Andres Freund wrote:
>On 2018-02-05 09:59:25 +0900, Yoshimi Ichiyanagi wrote:
>> I added my patches to the CommitFest 2018-3.
>> https://commitfest.postgresql.org/17/1485/
>
>Unfortunately this is the last CF for the v11 development cycle. This is
>a major project submitted late for v11, there's been no code level
>review, the goals aren't agreed upon yet, etc. So I'd unfortunately like
>to move this to the next CF?

I understand. I changed the status to "move to next CF".

--
Yoshimi Ichiyanagi
NTT laboratories
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
I'm sorry for the delay in replying to your mail.

<91411837-8c65-bf7d-7ca3-d69bdcb49...@iki.fi>
On Thu, 1 Mar 2018 18:40:05 +0800, Heikki Linnakangas wrote:
>Interesting. How does this compare with using good old mmap()?

libpmem's pmem_map_file() supports 2M/1G (huge page size) alignment, which
can reduce the number of page faults. In addition, libpmem's
pmem_memcpy_nodrain() copies data using single instruction, multiple data
(SIMD) instructions and non-temporal store instructions (MOVNT). As a
result, using these APIs is faster than plain mmap()/memcpy(). Please see
the PGCon 2018 presentation [1] for the details.

[1] https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf

<83eafbfd-d9c5-6623-2423-7cab1be38...@iki.fi>
On Fri, 20 Jul 2018 23:18:05 +0300, Heikki Linnakangas wrote:
>I think the way forward with this patch would be to map WAL segments
>with plain old mmap(), and use msync(). If that's faster than the status
>quo, great. If not, it would still be a good stepping stone for actually
>using PMDK.

I think so too. I wrote this patch to replace the read/write syscalls with
libpmem's API only. I believe that PMDK can make the current PostgreSQL
faster.

>If nothing else, it would provide a way to test most of the
>code paths, without actually having a persistent memory device, or
>libpmem. The examples at http://pmem.io/pmdk/libpmem/ actually suggest
>doing exactly that: use libpmem to map a file to memory, and check if it
>lives on persistent memory using libpmem's pmem_is_pmem() function. If
>it returns yes, use pmem_drain(), if it returns false, fall back to using
>msync().

When PMEM_IS_PMEM_FORCE (the environment variable [2]) is set to 1,
pmem_is_pmem() returns true.

Linux 4.15 and later support the MAP_SYNC and MAP_SHARED_VALIDATE mmap()
flags to check whether the mapped file is stored on PMEM.
An application that uses both flags in its mmap() call can be sure that
MAP_SYNC is actually supported by both the kernel and the filesystem that
the mapped file is stored on [3]. But pmem_is_pmem() does not support this
mechanism for now.

[2] http://pmem.io/pmdk/manpages/linux/v1.4/libpmem/libpmem.7.html
[3] https://lwn.net/Articles/758594/

--
Yoshimi Ichiyanagi
NTT Software Innovation Center
e-mail : ichiyanagi.yosh...@lab.ntt.co.jp