pg_rewind : feature to rewind promoted standby is broken!
I think pg_rewind's feature to rewind the promoted standby as a new standby is broken in version 11.

STEPS:

1. Create a master-standby setup (the attached script can be used for this).

2. Promote the standby:

[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_ctl -D standby promote
waiting for server to promote... done
server promoted

3. On the promoted standby, create a database and a table in the new database:

[mithuncy@localhost pgrewmasterbin]$ ./bin/psql -p 5433 postgres
postgres=# create database db1;
CREATE DATABASE
postgres=# \c db1
You are now connected to database "db1" as user "mithuncy".
db1=# create table t1 (t int);
CREATE TABLE

4. Try to rewind the newly promoted standby (with the old master as source):

[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_ctl -D standby stop
waiting for server to shut down... done
server stopped
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres"
servers diverged at WAL location 0/360 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
could not remove directory "standby/base/16384": Directory not empty
Failure, exiting

Note: a dry run was successful!

[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres" -n
servers diverged at WAL location 0/360 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
Done!

I have also tested the same in version 10, and it works fine there. Has the below commit broken this feature? (Thanks to Kuntal for identifying it.)

commit 266b6acb312fc440c1c1a2036aa9da94916beac6
Author: Fujii Masao
Date:   Thu Mar 29 04:56:52 2018 +0900

    Make pg_rewind skip files and directories that are removed during server start.

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

standby-server-setup.sh
Description: Bourne shell script
Re: pg_rewind : feature to rewind promoted standby is broken!
On Wed, Mar 13, 2019 at 1:38 PM Michael Paquier wrote:
> On Tue, Mar 12, 2019 at 06:23:01PM +0900, Michael Paquier wrote:
> > And you are pointing out to the correct commit.  The issue is that
> > process_target_file() has added a call to check_file_excluded(), and
> > this skips all the folders which it thinks can be skipped.  One
> > problem though is that we also filter out pg_internal.init, which is
> > present in each database folder, and remains in the target directory
> > marked for deletion.  Then, when the deletion happens, the failure
> > happens as the directory is not fully empty.
>
> Okay, here is a refined patch with better comments, the addition of a
> test case (creating tables in the new databases in 002_databases.pl is
> enough to trigger the problem).

I have not looked into the patch yet, but a quick test shows that it has fixed the above issue.

[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres" -n
servers diverged at WAL location 0/300 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
Done!
[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres"
servers diverged at WAL location 0/300 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
Done!

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
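To make the failure mode concrete, here is a tiny standalone C sketch (my own illustration, not pg_rewind code; the directory and file names are made up) showing that removing a directory which still contains a skipped file fails with exactly the "Directory not empty" error reported above:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
    /* stand-ins for a target database directory and the filtered-out
     * pg_internal.init that was left behind inside it */
    mkdir("base_16384_demo", 0700);
    close(open("base_16384_demo/pg_internal.init", O_CREAT | O_WRONLY, 0600));

    /* rmdir() refuses to remove a non-empty directory (ENOTEMPTY) */
    if (rmdir("base_16384_demo") != 0)
        printf("could not remove directory \"base_16384_demo\": %s\n",
               strerror(errno));

    /* clean up the demo files */
    unlink("base_16384_demo/pg_internal.init");
    rmdir("base_16384_demo");
    return 0;
}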
Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
On Mon, Feb 25, 2019 at 12:10 AM Mithun Cy wrote:
> Thanks Hans, for a simple reproducible test.
>
> The "worker.bgw_restart_time" is never set for autoprewarm workers, so on
> error they get restarted after some period of time (default behavior). Since
> the database itself was dropped, our attempt to connect to that database
> failed and the worker exited. But it then got restarted by the postmaster,
> and we start seeing the above DSM segment error.
>
> I think every autoprewarm worker should be set with
> "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there shall not be
> repeated prewarm attempts on a dropped database. I will try to think further
> and submit a patch for same.

Here is the patch for same; the autoprewarm worker should not be restarted.

As per the code in apw_start_database_worker(), the master starts a worker per database and waits until it exits by calling WaitForBackgroundWorkerShutdown(). WaitForBackgroundWorkerShutdown() cannot handle the case where the worker was restarted: it gets the status BGWH_STOPPED from GetBackgroundWorkerPid() even if the worker got restarted. So the master will next detach the shared memory, and the restarted worker keeps failing, going into an unending loop.

I think there is no need to restart at all. The following are the normal errors we might encounter:

1. The database being connected to is dropped -- we need to skip to the next database, which the master will do by starting a new worker, so a restart is not needed.

2. A relation is dropped -- try_relation_open(reloid, AccessShareLock) is used, so errors due to dropped relations are handled; it also avoids concurrent truncation.

3. smgrexists() is used before reading from a fork file. Again, the error is handled.

4. Before reading a block we have a check as below, so previously truncated pages will not be read again:

    /* Check whether blocknum is valid and within fork file size. */
    if (blk->blocknum >= nblocks)

I think if any other unexpected error occurs, it should be fatal, and restarting will not correct it. Hence there is no need to restart the per-database worker process.

I tried to dig into why we did not set it earlier. It used to be never-restart, but that changed after fixing the comments in [1]. At that time we did not make an explicit database connection per worker and did not handle as many error cases as now, so it appeared fair. But when the code changed to make a database connection per worker, we should have set every worker with BGW_NEVER_RESTART. I think that was a mistake.

NOTE: On zero exit status we will not restart the bgworker (see CleanupBackgroundWorker() and maybe_start_bgworkers()).

[1] https://www.postgresql.org/message-id/CA%2BTgmoYNF_wfdwQ3z3713zKy2j0Z9C32WJdtKjvRWzeY7JOL4g%40mail.gmail.com

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

never_restart_apw_worker_01.patch
Description: Binary data
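For concreteness, a minimal sketch of registering such a per-database worker with no restart (simplified from the bgworker API; the entry-point name, flags and start time shown here are illustrative assumptions, not the exact autoprewarm code):

#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"

/* Sketch only: launch one prewarm worker for the given database and wait
 * for it; BGW_NEVER_RESTART is the key line the patch adds. */
static void
apw_start_database_worker_sketch(Oid database_oid)
{
    BackgroundWorker worker;
    BackgroundWorkerHandle *handle;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;    /* never auto-restart */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "pg_prewarm");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "autoprewarm_database_main");
    snprintf(worker.bgw_name, BGW_MAXLEN, "autoprewarm worker (sketch)");
    worker.bgw_main_arg = ObjectIdGetDatum(database_oid);
    worker.bgw_notify_pid = MyProcPid;

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(WARNING,
                (errmsg("could not register autoprewarm worker")));
    else
        (void) WaitForBackgroundWorkerShutdown(handle);
}

With bgw_restart_time left at its default, the postmaster would re-launch the worker after the restart interval even though the launcher has already moved on, which is exactly the loop described above.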
Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
Thanks Robert,

On Mon, Mar 18, 2019 at 9:01 PM Robert Haas wrote:
> On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy wrote:
> > The autoprewarm worker should not be restarted. As per the code in
> > apw_start_database_worker(), the master starts a worker per database and waits
> > until it exits by calling WaitForBackgroundWorkerShutdown(). That call cannot
> > handle the case where the worker was restarted: it gets the status BGWH_STOPPED
> > from GetBackgroundWorkerPid() even if the worker got restarted. So the master
> > will next detach the shared memory, and the restarted worker keeps failing,
> > going into an unending loop.
>
> Ugh, that seems like a silly oversight. Does it fix the reported problem?

-- Yes, this fixes the reported issue. Hans Buschmann had given the below steps to reproduce:

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded

-- It was explained earlier [1] that they used an older autoprewarm.blocks which was generated before the database was dropped. So on restart the autoprewarm worker failed to connect to the dropped database, which then led to the retry loop. This patch should fix that.

NOTE: Another kind of error users might see because of the same bug is the restarted worker connecting to the next database in autoprewarm.blocks, because the autoprewarm master has already updated the shared data "apw_state->database = current_db;" to start a new worker for the next database. Both the restarted worker and the newly created worker will then connect to the same database (the next one) and try to load the same pages, ending up with spurious log messages like
"LOG: autoprewarm successfully prewarmed 13 of 11 previously-loaded blocks".

> If I understand correctly, the commit message would be something like this:
>
> ==
> Don't auto-restart per-database autoprewarm workers.
>
> We should try to prewarm each database only once. Otherwise, if
> prewarming fails for some reason, it will just keep retrying in an
> infinite loop. The existing code was intended to implement this
> behavior, but because it neglected to set worker.bgw_restart_time, the
> per-database workers keep restarting, contrary to what was intended.
>
> Mithun Cy, per a report from Hans Buschmann
> ==
>
> Does that sound right?

-- Yes, I agree.

[1] https://www.postgresql.org/message-id/D2B9F2A20670C84685EF7D183F2949E202569F21%40gigant.nidsa.net

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Tue, Dec 19, 2017 at 5:52 AM, Masahiko Sawada wrote:
> On Mon, Dec 18, 2017 at 2:04 PM, Masahiko Sawada wrote:
>> On Sun, Dec 17, 2017 at 12:27 PM, Robert Haas wrote:
>>>
>>> I have to admit that result is surprising to me.
>>
>> I think the environment I used for performance measurement did not
>> have enough resources. I will do the same benchmark on an another
>> environment to see if it was a valid result, and will share it.
>>
> I did the performance measurement on a different environment which has 4
> cores and two physically separated disk volumes. Also I've changed the
> benchmarking so that COPYs load only 300 integer tuples, which do not
> fit within a single page, and changed the tables to unlogged tables to
> observe the overhead of locking/unlocking relext locks.

I ran the same test as asked by Robert; it is just an extension of the tests [1] pointed out by Amit Kapila.

Machine: cthulhu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7-8830 @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
CPU max MHz:           2129.
CPU min MHz:           1064.
BogoMIPS:              4266.59
Virtualization:        VT-x
Hypervisor vendor:     vertical
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0-7,64-71
NUMA node1 CPU(s):     8-15,72-79
NUMA node2 CPU(s):     16-23,80-87
NUMA node3 CPU(s):     24-31,88-95
NUMA node4 CPU(s):     32-39,96-103
NUMA node5 CPU(s):     40-47,104-111
NUMA node6 CPU(s):     48-55,112-119
NUMA node7 CPU(s):     56-63,120-127

It has 2 disks with different filesystems as below:

/dev/mapper/vg_mag-data2  ext4  5.1T  3.6T  1.2T  76%  /mnt/data-mag2
/dev/mapper/vg_mag-data1  xfs   5.1T  1.6T  3.6T  31%  /mnt/data-mag

I have created 2 tables, one on each of the above filesystems.

test_size_copy.sh --> automated script to run the COPY test.
copy_script1, copy_script2 --> COPY pgbench scripts used by test_size_copy.sh to load the 2 different tables.

To run the above copy scripts in parallel, I ran them with equal weights as below:

./pgbench -c $threads -j $threads -f copy_script1@1 -f copy_script2@1 -T 120 postgres >> test_results.txt

Results:
--------

Clients   HEAD-TPS
1         84.460734
2         121.359035
4         175.886335
8         268.764828
16        369.996667
32        439.032756
64        482.185392

Clients   N_RELEXTLOCK_ENTS = 1024   %diff with HEAD
1         87.165777                  3.20272258112273
2         131.094037                 8.02165409439848
4         181.667104                 3.2866504381935
8         267.412856                 -0.503031594595423
16        376.118671                 1.65461058058666
32        460.756357                 4.94805927419228
64        492.723975                 2.18558736428913

Not much of an improvement from HEAD.

Clients   N_RELEXTLOCK_ENTS = 1   %diff with HEAD
1         86.288574               2.16412990206786
2         131.398667              8.27266960387414
4         168.681079              -4.09654109854526
8         245.841999              -8.52895416806549
16        321.972147              -12.9797169226933
32        375.783299              -14.4065462395703
64        360.134531              -25.3120196142317

So in the case of N_RELEXTLOCK_ENTS = 1 we can see regression as high as 25%.

[1] https://www.postgresql.org/message-id/CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA%40mail.gmail.com

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

test_size_copy.sh
Description: Bourne shell script

copy_script1
Description: Binary data

copy_script2
Description: Binary data
Possible performance regression in version 10.1 with pgbench read-write tests.
Hi all,

When I was trying to do read-write pgbench benchmarking of PostgreSQL 9.6.6 vs 10.1, I found that PostgreSQL 10.1 regresses against 9.6.6 in some cases.

Non-default settings and test
=============================
Server:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &

pgbench:
CASE 1: when data fits shared buffers.
./pgbench -i -s 1000 postgres

CASE 2: when data exceeds shared buffers.
./pgbench -i -s 1000 postgres

./pgbench -c $threads -j $threads -T 1800 -M prepared postgres

The attached script "perf_buff_mgmt_write-2.sh" can be used to run the same.

Machine: "cthulhu", an 8-node NUMA machine with 128 hyperthreads.
=================================================================
>numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 37885 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 31215 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 15331 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 36774 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 62 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 9653 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 50209 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 43966 MB

CASE 1: In 9.6.6, peak performance is achieved at 72 concurrent clients with TPS 35554.573858; in 10.1, at 72 clients, TPS dips to 26882.828133, so nearly a 23% decrease in TPS.

CASE 2: In 9.6.6, peak performance is achieved at 72 concurrent clients with TPS 24861.074079; in 10.1, at 72 clients, TPS dips to 18372.565663, so nearly a 26% decrease in TPS.

The attached "Postgresql_benchmarking_9.6vs10.ods" gives more detailed TPS numbers; TPS is the median of 3 runs. I have not run a bisect yet to find what has caused the issue.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Postgresql_benchmarking_9.6vs10.ods
Description: application/vnd.oasis.opendocument.spreadsheet

perf_buff_mgmt_write-2.sh
Description: Bourne shell script
Re: Possible performance regression in version 10.1 with pgbench read-write tests.
On Wed, Jan 24, 2018 at 7:36 AM, Amit Kapila wrote:
> Both the cases look identical, but from the document attached, it
> seems the case-1 is for scale factor 300.

Oops, sorry, that was a typo. CASE 1 is scale factor 300, which will fit in shared_buffers = 8GB.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
DROP OWNED CASCADE vs Temp tables
I have a test where a user creates a temp table and then disconnects; concurrently we try to do DROP OWNED BY ... CASCADE on the same user. It seems this causes a race condition between the temp table deletion during disconnection (RemoveTempRelations(myTempNamespace)) and the DROP OWNED BY ... CASCADE operation, which will try to remove the same temp table when it finds it as part of pg_shdepend. This results in an internal "cache lookup failed" error as below.

DROP OWNED BY test_role CASCADE;
2020-01-07 12:35:06.524 IST [26064] ERROR:  cache lookup failed for relation 41019
2020-01-07 12:35:06.524 IST [26064] STATEMENT:  DROP OWNED BY test_role CASCADE;
reproduce.sql:8: ERROR:  cache lookup failed for relation 41019

TEST
====
create database test_db;
create user test_superuser superuser;
\c test_db test_superuser
CREATE ROLE test_role nosuperuser login password 'test_pwd' ;
\c test_db test_role
CREATE TEMPORARY TABLE tmp_table(col1 int);
\c test_db test_superuser
DROP OWNED BY test_role CASCADE;

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Re: mysql_fdw crash
On Tue, Nov 20, 2018 at 7:59 PM Tomas Vondra wrote:
>
> On 11/20/18 3:06 PM, 066ce...@free.fr wrote:
> > Hi,
> >
> >> When gdb will be active, then use command c, and then run query in
> >> session. gdb should to catch segfault.
> >
> > Thank you very much. It's been helpful.
> >
> > BTW the behaviour is strange. When I'm executing the following, I always get a SEGV:
> >
> > psql (11.1)
> > Type "help" for help.
> >
> > herve=# CREATE OR REPLACE FUNCTION public.test_bug2(text,integer,timestamp with time zone)
> > herve-# RETURNS integer
> > herve-#
> > herve-# AS '
> > herve'#
> > herve'#   select coalesce(max(id),1) from sact_v1.autocalls where label=$1 and machine_id=$2 and created_date=$3;
> > herve'# '
> > herve-# LANGUAGE sql;
> > CREATE FUNCTION
> > herve=# select test_bug2('BSM_CRITICAL_SYSLOG',18843,now());
> >
> > The GDB session:
> >
> > Continuing.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > prepare_query_params (param_types=0x1c86ac8, param_values=0x1c86ac0, param_exprs=0x1c86ab8, param_flinfo=0x1c86ab0, numParams=3, fdw_exprs=0x1c6b5b8, node=0x1c792d8) at mysql_fdw.c:2139
> > 2139    *param_types[i] = exprType(param_expr);
> > (gdb) bt
>
> So which part of that expression triggers the segfault? Try printing the
> different parts, i.e.
>
> p i
> p param_types[i]
>
> It might be helpful to also install the debug package, which would give
> us more readable backtraces.
>
> BTW, considering the failure is in mysql_fdw.c, this very much seems
> like a bug in mysql_fdw - have you tried reporting it through the
> project github repository?
>
> https://github.com/EnterpriseDB/mysql_fdw/issues
>
> That's probably more likely to help, and even if we find a bug here we
> can't really commit that (perhaps some of the mysql_fdw authors are
> watching this list, but I'm not sure about that).

Thanks for reporting. The relevant code is:

           Oid **param_types)
{
    int         i;
    ListCell   *lc;

    Assert(numParams > 0);

    /* Prepare for output conversion of parameters used in remote query. */
    *param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);

    *param_types = (Oid *) palloc0(sizeof(Oid) * numParams);

    i = 0;
    foreach(lc, fdw_exprs)
    {
        Node       *param_expr = (Node *) lfirst(lc);
        Oid         typefnoid;
        bool        isvarlena;

        *param_types[i] = exprType(param_expr);

This seems to be a basic mistake; I think it should be as below:

        (*param_types)[i] = exprType(param_expr);

After this change it works:

postgres=# select test_bug2('BSM_CRITICAL_SYSLOG',18843,now());
 test_bug2
-----------
         1
(1 row)

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
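The root cause is C operator precedence: subscripting binds tighter than dereference, so *param_types[i] means *(param_types[i]). A tiny standalone illustration (my own example with int instead of Oid, not mysql_fdw code):

#include <stdlib.h>

int
main(void)
{
    int   *types = NULL;
    int  **param_types = &types;            /* mimics the Oid ** out-parameter */

    *param_types = malloc(3 * sizeof(int)); /* allocate the output array */

    /* *param_types[1] = 42;   would crash: it reads param_types[1], i.e. a
     * bogus "pointer" one slot past &types, and then dereferences it */
    (*param_types)[1] = 42;                 /* correct: index the allocated array */

    free(*param_types);
    return 0;
}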
Re: zheap: a new storage format for PostgreSQL
> On Thu, Mar 1, 2018 at 7:39 PM Amit Kapila wrote:

I did some testing of COPY command performance for zheap against heap; here are my results.

Machine: cthulhu (an 8-node NUMA machine with 500GB of RAM)
Server non-default settings: shared_buffers = 32GB, max_wal_size = 20GB, min_wal_size = 15GB

Test tables and data:
I have used the pgbench_accounts table of the pgbench tool as the data source, with 3 different scale factors: 100, 1000 and 2000. Both the heap and zheap tables are lookalikes of pgbench_accounts:

CREATE TABLE pgbench_zheap (LIKE pgbench_accounts) WITH (storage_engine='zheap');
CREATE TABLE pgbench_heap (LIKE pgbench_accounts) WITH (storage_engine='heap');

Test commands:
Command to generate the datafile:
COPY pgbench_accounts TO '/mnt/data-mag/mithun.cy/zheapperfbin/bin/pgbench.data';

Commands to load from the datafile:
COPY pgbench_heap FROM '/mnt/data-mag/mithun.cy/zheapperfbin/bin/pgbench.data';  -- heap table
COPY pgbench_zheap FROM '/mnt/data-mag/mithun.cy/zheapperfbin/bin/pgbench.data'; -- zheap table

Results
=======

Scale factor: 100
-----------------
zheap table size: 1028 MB
heap table size: 1281 MB
-- table size reduction: 19%
zheap WAL size: 1007 MB
heap WAL size: 1024 MB
-- WAL size reduction: 1.6%
zheap COPY execution time: 24869.451 ms
heap COPY execution time: 25858.773 ms
-- improvement: 3.8% reduction in execution time for zheap

Scale factor: 1000
------------------
zheap table size: 10 GB
heap table size: 13 GB
-- table size reduction: 23%
zheap WAL size: 10071 MB
heap WAL size: 10243 MB
-- WAL size reduction: 1.67%
zheap COPY execution time: 270790.235 ms
heap COPY execution time: 280325.632 ms
-- improvement: 3.4% reduction in execution time for zheap

Scale factor: 2000
------------------
zheap table size: 20 GB
heap table size: 25 GB
-- table size reduction: 20%
zheap WAL size: 20142 MB
heap WAL size: 20499 MB
-- WAL size reduction: 1.7%
zheap COPY execution time: 523702.904 ms
heap COPY execution time: 537537.720 ms
-- improvement: 2.5% reduction in execution time for zheap

The COPY command seems to have improved very slightly with zheap, both in WAL size and execution time. I also did some tests with INSERT statements, where I could see some regression in zheap compared to heap with respect to execution time. I will reply here after further investigation.

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Re: zheap: a new storage format for PostgreSQL
On Thu, Dec 6, 2018 at 11:13 AM Amit Kapila wrote:
>
> On Thu, Dec 6, 2018 at 10:03 AM Pavel Stehule wrote:
> >
> > On Thu, Dec 6, 2018 at 5:02 AM Mithun Cy wrote:
> >>
> >> The COPY command seems to have improved very slightly with zheap, both in
> >> WAL size and execution time. I also did some tests with INSERT statements,
> >> where I could see some regression in zheap compared to heap with respect
> >> to execution time. I will reply here after further investigation.
> >>
> >
> > 20% of size reduction looks like effect of fill factor.
> >
>
> I think it is because of smaller zheap tuple sizes. Mithun can tell
> more about setup whether he has used different fillfactor or anything
> else which could lead to such a big difference.

Yes, the default fillfactor is unaltered; zheap tuple sizes are smaller, each aligned at 2 bytes.

Length of each item (all items are identical):
==============================================
postgres=# SELECT lp_len FROM zheap_page_items(get_raw_page('pgbench_zheap', 9)) limit 1;
 lp_len
--------
    102
(1 row)

postgres=# SELECT lp_len FROM heap_page_items(get_raw_page('pgbench_heap', 9)) limit 1;
 lp_len
--------
    121
(1 row)

Total tuples per page:
======================
postgres=# SELECT count(*) FROM zheap_page_items(get_raw_page('pgbench_zheap', 9));
 count
-------
    76
(1 row)

postgres=# SELECT count(*) FROM heap_page_items(get_raw_page('pgbench_heap', 9));
 count
-------
    61
(1 row)

Because of this, zheap takes less space, as reported above.

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
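As a rough cross-check of those numbers (my own arithmetic, not from the page dumps themselves): with identical rows, zheap packs 76 items per page versus 61 for heap, so for the same row count zheap needs about 61/76 ≈ 0.80 times as many pages, i.e. roughly a 20% smaller table, which matches the 19-23% size reductions reported for the COPY tests above.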
Re: WIP: Avoid creation of the free space map for small tables
On Sat, Dec 8, 2018 at 6:35 PM Amit Kapila wrote:
>
> On Fri, Dec 7, 2018 at 7:25 PM John Naylor wrote:
> >
> > On 12/6/18, Amit Kapila wrote:
> > > On Thu, Dec 6, 2018 at 10:53 PM John Naylor wrote:
> > >>
> > >> I've added an additional regression test for finding the right block

I ran some performance tests on the latest patch v11, and I see a small regression in the execution time of the COPY statement. The tests I used are the same as provided in [1], except that I ran them for fill factors 20 and 70. Here are my results.

Machine: cthulhu (Intel-based 8-node NUMA machine)
Server settings are default, configured with HEAP_FSM_CREATION_THRESHOLD = 4, and the entire data directory was on HDD.

Results are the execution time (in ms) taken by the COPY statement when the number of records is exactly the number that fits in HEAP_FSM_CREATION_THRESHOLD = 4 pages. For fill factor 20 that is up to tid (3, 43), and for fill factor 70 up to tid (3, 157). Each result is the median of 10 runs.

Fill factor 20

Tables   Base      Patch     % increase in execution time
500      121.97    125.315   2.7424776584
1000     246.592   253.789   2.9185861666

Fill factor 70

Tables   Base      Patch     % increase in execution time
500      211.502   217.128   2.6600221275
1000     420.309   432.606   2.9257046601

So there is a consistent 2-3% regression, and on every run I can see that the execution time with patch v11 is slightly higher than base. I also tried inserting more records, up to 8 pages, and the same regression is observed. So I guess even HEAP_FSM_CREATION_THRESHOLD = 4 is not perfect!

[1] https://www.postgresql.org/message-id/CAJVSVGX%3D2Q52fwijD9cjeq1UdiYGXns2_9WAPFf%3DE8cwbFCDvQ%40mail.gmail.com

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Dec 6, 2018 at 10:53 PM John Naylor wrote:
> On 12/3/18, Amit Kapila wrote:
> > fsm_local_set is being called from RecordAndGetPageWithFreeSpace and
> > GetPageWithFreeSpace whereas the change we have discussed was specific
> > to GetPageWithFreeSpace, so not sure if we need any change in
> > fsm_local_set.

I have some minor comments on the pg_upgrade patch.

1. Now we call stat() on the main fork file in transfer_relfile():

+    sret = stat(old_file, &statbuf);

+    /* Save the size of the first segment of the main fork. */
+    if (type_suffix[0] == '\0' && segno == 0)
+        first_seg_size = statbuf.st_size;

But we do not handle the case where stat() has returned an error!

2. In src/bin/pg_upgrade/pg_upgrade.h:

     char       *relname;
+
+    char        relkind;    /* relation relkind -- see pg_class.h */

I think we can remove the added empty line.

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
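Something along these lines is what I have in mind for point 1 (a rough standalone sketch, not the actual patch; pg_upgrade would use its own error-reporting routine rather than plain stderr, and the function name here is made up):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Sketch: return the size of the first main-fork segment, failing loudly
 * if stat() errors out instead of silently using an uninitialized size. */
static long long
first_main_fork_segment_size(const char *old_file)
{
    struct stat statbuf;

    if (stat(old_file, &statbuf) != 0)
    {
        fprintf(stderr, "could not stat file \"%s\": %s\n",
                old_file, strerror(errno));
        exit(1);
    }
    return (long long) statbuf.st_size;
}

int
main(int argc, char **argv)
{
    /* demo path only; the real code would pass the old cluster's file */
    printf("%lld\n", first_main_fork_segment_size(argc > 1 ? argv[1] : "/etc/hostname"));
    return 0;
}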
Re: WIP: Avoid creation of the free space map for small tables
Hi John Naylor,

On Tue, Jan 8, 2019 at 2:27 AM John Naylor wrote:
> I've attached two patches for testing. Each one applies on top of the
> current patch.

Thanks for the patches. I did a quick test of both, using the same tests as in [1], now for fill factors 20, 70 and 100. (Note: with HEAP_FSM_CREATION_THRESHOLD = 4, the highest tid inserted was (3, 43) for fill factor 20, (3, 157) for fill factor 70 and (3, 225) for fill factor 100, so exactly 4 pages are used.)

Machine: cthulhu, same as before [2], and server settings are default.
Test: COPY command as in [1], for 500 tables.

Fill factor 20

                       execution time in ms   % increase in execution time
Base                   119.238
v11-all-pages          121.974                2.2945705228
v11-Every-other-page   114.455                -4.0113051209
v11-last-page          113.573                -4.7510021973

Fill factor 70

                       execution time in ms   % increase in execution time
Base                   209.991
v11-all-pages          211.076                0.5166888105
v11-Every-other-page   206.476                -1.6738812616
v11-last-page          203.591                -3.0477496655

Fill factor 100

                       execution time in ms   % increase in execution time
Base                   269.691
v11-all-pages          270.078                0.1434975583
v11-Every-other-page   262.691                -2.5955630703
v11-last-page          260.293                -3.4847288193

Observations:
1. Execution time of both base and the v11-all-pages patch has improved compared to my earlier results [2], but v11-all-pages is still slightly behind base.
2. The v11-Every-other-page and v11-last-page patches improve performance over base.
3. IMHO v11-Every-other-page would be ideal to consider: it improves performance and also, to an extent, avoids expansion if space is already available.

[1] https://www.postgresql.org/message-id/CAJVSVGX%3D2Q52fwijD9cjeq1UdiYGXns2_9WAPFf%3DE8cwbFCDvQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD__Ouj%3Dat4hy2wYidK90v92qSRLjU%2BQe4y-PwfjLLeGkhc6ZA%40mail.gmail.com

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
>
> On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila wrote:
> > Thanks, Mithun for performance testing, it really helps us to choose
> > the right strategy here. Once John provides next version, it would be
> > good to see the results of regular pgbench (read-write) runs (say at
> > 50 and 300 scale factor) and the results of large copy. I don't think
> > there will be any problem, but we should just double check that.
>
> Attached is v12 using the alternating-page strategy. I've updated the
> comments and README as needed.

Below are my performance tests and numbers.

Machine: cthulhu

Tests and setups:

Server settings:
max_connections = 200
shared_buffers = 8GB
checkpoint_timeout = 15min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
min_wal_size = 15GB
max_wal_size = 20GB

pgbench settings: read-write settings (TPC-B-like tests)
./pgbench -c $threads -j $threads -T $time_for_reading -M prepared postgres

Scale factor 50 -- median of 3 TPS

Clients   v12-patch      base patch     % diff
1         826.081588     834.328238     -0.9884179421
16        10805.807081   10800.662805   0.0476292621
32        19722.277019   19641.546628   0.4110185034
64        30232.681889   30263.616073   -0.1022157561

Scale factor 300 -- median of 3 TPS

Clients   v12-patch      base patch     % diff
1         813.646062     822.18648      -1.038744641
16        11379.028702   11277.05586    0.9042505709
32        21688.084093   21613.044463   0.3471960192
64        36288.85711    36348.6178     -0.1644098005

COPY command test

Setup:
./psql -d postgres -c "COPY pgbench_accounts TO '/mnt/data-mag/mithun.cy/fsmbin/bin/dump.out' WITH csv"
./psql -d postgres -c "CREATE UNLOGGED TABLE pgbench_accounts_ulg (LIKE pgbench_accounts) WITH (fillfactor = 100);"

Test run:
TRUNCATE TABLE pgbench_accounts_ulg;
\timing
COPY pgbench_accounts_ulg FROM '/mnt/data-mag/mithun.cy/fsmbin/bin/dump.out' WITH csv;
\timing

Execution time in ms (scale factor indicates the size of pgbench_accounts):

Scale factor   v12-patch   base patch   % diff
300            77166.407   77862.041    -0.8934186557
50             13329.233   13284.583    0.3361038882

So the large-table tests do not show any considerable performance variance from the base code!

On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
> On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila wrote:
> > Thanks, Mithun for performance testing, it really helps us to choose
> > the right strategy here. Once John provides next version, it would be
> > good to see the results of regular pgbench (read-write) runs (say at
> > 50 and 300 scale factor) and the results of large copy. I don't think
> > there will be any problem, but we should just double check that.
>
> Attached is v12 using the alternating-page strategy. I've updated the
> comments and README as needed. In addition, I've
>
> -handled a possible stat() call failure during pg_upgrade
> -added one more assertion
> -moved the new README material into a separate paragraph
> -added a comment to FSMClearLocalMap() about transaction abort
> -corrected an outdated comment that erroneously referred to extension
> rather than creation
> -fleshed out the draft commit messages

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Re: Possible performance regression in version 10.1 with pgbench read-write tests.
Hi Andres,

On Fri, Jul 20, 2018 at 1:21 AM, Andres Freund wrote:
> Hi,
>
> On 2018-01-24 00:06:44 +0530, Mithun Cy wrote:
> > Server:
> > ./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
> > max_wal_size=20GB -c checkpoint_timeout=900 -c
> > maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &
>
> Which kernel & glibc version does this server have?

[mithun.cy@cthulhu ~]$ cat /proc/version
Linux version 3.10.0-693.5.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Fri Oct 20 20:32:50 UTC 2017

[mithun.cy@cthulhu ~]$ ldd --version
ldd (GNU libc) 2.17

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: Possible performance regression in version 10.1 with pgbench read-write tests.
On Fri, Jul 20, 2018 at 10:52 AM, Thomas Munro wrote:
> On Fri, Jul 20, 2018 at 7:56 AM, Tom Lane wrote:
> >
> > It's not *that* noticeable, as I failed to demonstrate any performance
> > difference before committing the patch. I think some more investigation
> > is warranted to find out why some other people are getting different
> > results
>
> Maybe false sharing is a factor, since sizeof(sem_t) is 32 bytes on
> Linux/amd64 and we're probably hitting elements clustered at one end
> of the array? Let's see... I tried sticking padding into
> PGSemaphoreData and I got ~8% more TPS (72 client on multi socket
> box, pgbench scale 100, only running for a minute but otherwise the
> same settings that Mithun showed).
>
> --- a/src/backend/port/posix_sema.c
> +++ b/src/backend/port/posix_sema.c
> @@ -45,6 +45,7 @@
>  typedef struct PGSemaphoreData
>  {
>      sem_t pgsem;
> +     char padding[PG_CACHE_LINE_SIZE - sizeof(sem_t)];
>  } PGSemaphoreData;
>
> That's probably not the right idiom and my tests probably weren't long
> enough, but there seems to be some effect here.

I did a quick test applying the patch with the same settings as in the initial mail I reported (on the latest PostgreSQL 10 code), with 72 clients:

CASE 1:
Without patch: TPS 29269.823540
With patch:    TPS 36005.544960  --- a 23% jump
Just disabling the use of unnamed POSIX semaphores: TPS 34481.207959

So it seems that is the issue, as the test is being run on an 8-node NUMA machine. I also came across a presentation [1], slide 20 of which says one aspect of the futex architecture is bad for NUMA machines. I am not sure whether the newer fix for that is included in Linux version 3.10.0-693.5.2.el7.x86_64, which is on my test machine.

[1] https://www.slideshare.net/davidlohr/futex-scaling-for-multicore-systems

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: Pluggable Storage - Andres's take
On Mon, Sep 10, 2018 at 7:33 PM, Amit Kapila wrote:
> On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi wrote:
>>
>> On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi wrote:
>>>
>> pg_stat_get_tuples_hot_updated and others:
>>
>> /*
>>  * Counter tuples_hot_updated stores number of hot updates for heap table
>>  * and the number of inplace updates for zheap table.
>>  */
>> if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
>>     RelationStorageIsZHeap(rel))
>>     result = 0;
>> else
>>     result = (int64) (tabentry->tuples_hot_updated);
>>
>> Is the special condition needed? The values should be 0 because of zheap,
>> right?
>>
>
> I also think so. Beena/Mithun has worked on this part of the code, so
> it is better if they also confirm once.

Yes, pg_stat_get_tuples_hot_updated should return 0 for zheap.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com