pg_rewind : feature to rewind promoted standby is broken!

2019-03-12 Thread Mithun Cy
I think pg_rewind's feature to rewind the promoted standby as a new
standby is broken in version 11.

STEPS:
1. Create a master-standby setup.
Use the attached script (standby-server-setup.sh) for the same.

2. Promote the standby
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_ctl -D standby promote
waiting for server to promote done
server promoted

3. In the promoted standby, create a database and a table in the new database.
[mithuncy@localhost pgrewmasterbin]$ ./bin/psql -p 5433 postgres
postgres=# create database db1;
CREATE DATABASE
postgres=# \c db1
You are now connected to database "db1" as user "mithuncy".
db1=# create table t1 (t int);
CREATE TABLE

4. Try to rewind the newly promoted standby (with the old master as source).
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_ctl -D standby stop
waiting for server to shut down... done
server stopped
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_rewind -D standby
--source-server="host=127.0.0.1 port=5432 user=mithuncy
dbname=postgres"
servers diverged at WAL location 0/360 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
could not remove directory "standby/base/16384": Directory not empty
Failure, exiting

Note: dry run was successful!
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_rewind -D standby
--source-server="host=127.0.0.1 port=5432 user=mithuncy
dbname=postgres" -n
servers diverged at WAL location 0/360 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
Done!

Also, I have tested the same in version 10; it works fine there.

Has the below commit broken this feature? (Thanks to Kuntal for
identifying the same.)
commit 266b6acb312fc440c1c1a2036aa9da94916beac6
Author: Fujii Masao 
Date:   Thu Mar 29 04:56:52 2018 +0900
Make pg_rewind skip files and directories that are removed during server start.

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


standby-server-setup.sh
Description: Bourne shell script


Re: pg_rewind : feature to rewind promoted standby is broken!

2019-03-13 Thread Mithun Cy
On Wed, Mar 13, 2019 at 1:38 PM Michael Paquier  wrote:

> On Tue, Mar 12, 2019 at 06:23:01PM +0900, Michael Paquier wrote:
> > And you are pointing out to the correct commit.  The issue is that
> > process_target_file() has added a call to check_file_excluded(), and
> > this skips all the folders which it thinks can be skipped.  One
> > problem though is that we also filter out pg_internal.init, which is
> > present in each database folder, and remains in the target directory
> > marked for deletion.  Then, when the deletion happens, the failure
> > happens as the directory is not fully empty.
>
> Okay, here is a refined patch with better comments, the addition of a
> test case (creating tables in the new databases in 002_databases.pl is
> enough to trigger the problem).
>
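To make the failure mode concrete, here is a small standalone C sketch (illustrative
only, not pg_rewind code; the pretend filter below is hypothetical): if one file inside
a database directory is filtered out during processing, the final rmdir() of that
directory fails exactly like the error reported above.

/* Illustrative only: "delete everything except an excluded file, then try
 * to remove the directory". Not pg_rewind code. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

static int
is_excluded(const char *name)
{
    /* pretend filter, similar in spirit to check_file_excluded() */
    return strcmp(name, "pg_internal.init") == 0;
}

int
main(void)
{
    DIR            *dir;
    struct dirent  *de;
    char            path[1024];

    mkdir("base_16384", 0700);
    close(creat("base_16384/1234", 0600));              /* a relation file */
    close(creat("base_16384/pg_internal.init", 0600));  /* the excluded file */

    dir = opendir("base_16384");
    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (is_excluded(de->d_name))
            continue;           /* skipped, so it survives the cleanup */
        snprintf(path, sizeof(path), "base_16384/%s", de->d_name);
        unlink(path);
    }
    closedir(dir);

    if (rmdir("base_16384") != 0)   /* fails with ENOTEMPTY */
        printf("could not remove directory \"base_16384\": %s\n",
               strerror(errno));
    return 0;
}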

I have not looked into the patch, but a quick test shows it has fixed the above
issue.

[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby
--source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres" -n
servers diverged at WAL location 0/300 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
Done!
[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby
--source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres"
servers diverged at WAL location 0/300 on timeline 1
rewinding from last common checkpoint at 0/260 on timeline 1
Done!

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use

2019-03-18 Thread Mithun Cy
On Mon, Feb 25, 2019 at 12:10 AM Mithun Cy  wrote:

> Thanks Hans, for a simple reproducible test.
>
> The "worker.bgw_restart_time" is never set for autoprewarm workers, so on
> error they get restarted after some period of time (the default behavior). Since
> the database itself was dropped, our attempt to connect to that database failed
> and the worker exited. But it got restarted again by the postmaster, and then we
> started seeing the above DSM segment error.
>
> I think every autoprewarm worker should be set with
> "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there are no
> repeated prewarm attempts on a dropped database. I will think about it further
> and submit a patch for the same.
>

Here is the patch for the same.

The autoprewarm worker should not be restarted. As per the code in
@apw_start_database_worker@, the master starts a worker per database and waits
until it exits by calling WaitForBackgroundWorkerShutdown(). That call cannot
handle the case where the worker was restarted: WaitForBackgroundWorkerShutdown()
gets the status BGWH_STOPPED from GetBackgroundWorkerPid() even if the worker
got restarted. So the master will then detach the shared memory, and the next
restarted worker keeps failing, going into an unending loop.
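For reference, here is a minimal sketch (not the committed patch; the helper name
start_database_worker_once() and its arguments are hypothetical) of the pattern the
patch aims for: register the per-database worker with bgw_restart_time =
BGW_NEVER_RESTART, so that when the launcher waits in WaitForBackgroundWorkerShutdown()
a stopped worker really is gone for good and never races with a postmaster-initiated
restart.

/* Minimal sketch only; not the actual autoprewarm code. */
#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "storage/dsm.h"

static void
start_database_worker_once(Oid database_oid, dsm_handle seg_handle)
{
    BackgroundWorker worker;
    BackgroundWorkerHandle *handle;
    BgwHandleStatus status;
    pid_t       pid;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;    /* the crucial setting */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "pg_prewarm");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "autoprewarm_database_main");
    snprintf(worker.bgw_name, BGW_MAXLEN,
             "autoprewarm worker for database %u", database_oid);
    worker.bgw_main_arg = UInt32GetDatum(seg_handle);
    worker.bgw_notify_pid = MyProcPid;  /* so the waits below are signalled */

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(ERROR,
                (errmsg("could not register background worker")));

    status = WaitForBackgroundWorkerStartup(handle, &pid);
    if (status == BGWH_STARTED)
        WaitForBackgroundWorkerShutdown(handle);
}

With BGW_NEVER_RESTART, a worker that exits (for whatever reason) stays stopped, and
the master simply moves on to the next database.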

I think there is no need to restart at all. The following are the normal errors
we might encounter:
1. The database being connected to has been dropped -- we need to skip to the next
database, which the master will do by starting a new worker. So a restart is not needed.
2. A relation has been dropped -- try_relation_open(reloid, AccessShareLock) is used,
so the error due to a dropped relation is handled; it also avoids concurrent
truncation.
3. smgrexists() is used before reading from a fork file. Again, the error is
handled.
4. Before reading a block we have the check below, so previously truncated
pages will not be read again (see the sketch that follows).
/* Check whether blocknum is valid and within fork file size. */
if (blk->blocknum >= nblocks)
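A simplified sketch of that guard in context (not the exact autoprewarm loop;
variable names are illustrative):

    /* Simplified sketch; not the exact autoprewarm code. */
    nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);

    /* Check whether blocknum is valid and within fork file size. */
    if (blk->blocknum < nblocks)
    {
        Buffer      buf;

        buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum,
                                 RBM_NORMAL, NULL);
        if (BufferIsValid(buf))
            ReleaseBuffer(buf);
    }
    /* else: the page was truncated away after autoprewarm.blocks was
     * written, so it is silently skipped -- no error, no restart needed. */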

I think any other unexpected error should be fatal, so restarting would not
correct it either. Hence there is no need to restart the per-database worker
process.

I tried to dig into why we did not set it earlier. It used to be never-restart,
but that changed after addressing review comments [1]. At that time we did not make
an explicit database connection per worker and did not handle as many error cases
as we do now, so it appeared fair. But when the code changed to make a database
connection per worker, we should have set every worker to BGW_NEVER_RESTART;
not doing so was, I think, a mistake.

NOTE: On zero exit status we do not restart the bgworker (see
@CleanupBackgroundWorker@
and @maybe_start_bgworkers@).
[1]
https://www.postgresql.org/message-id/CA%2BTgmoYNF_wfdwQ3z3713zKy2j0Z9C32WJdtKjvRWzeY7JOL4g%40mail.gmail.com
-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


never_restart_apw_worker_01.patch
Description: Binary data


Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use

2019-03-18 Thread Mithun Cy
Thanks Robert,
On Mon, Mar 18, 2019 at 9:01 PM Robert Haas  wrote:

> On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy  wrote:
> > The autoprewarm worker should not be restarted. As per the code in
> > @apw_start_database_worker@, the master starts a worker per database and waits
> > until it exits by calling WaitForBackgroundWorkerShutdown(). That call
> > cannot handle the case where the worker was restarted:
> > WaitForBackgroundWorkerShutdown() gets the status BGWH_STOPPED from
> > GetBackgroundWorkerPid() even if the worker got restarted. So the master
> > will then detach the shared memory, and the next restarted worker keeps
> > failing, going into an unending loop.
>
> Ugh, that seems like a silly oversight.  Does it fix the reported problem?
>

-- Yes, this fixes the reported issue. Hans Buschmann has given the below steps
to reproduce.

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded

-- It was explained earlier [1] that they used an older autoprewarm.blocks file,
which was generated before the database was dropped. So on restart the autoprewarm
worker failed to connect to the dropped database, which led to the retry loop. This
patch should fix the same.

NOTE: Another kind of error the user might see because of the same bug is the
restarted worker connecting to the next database in autoprewarm.blocks, because
the autoprewarm master has already updated the shared data "apw_state->database =
current_db;" to start a new worker for the next database. Both the restarted worker
and the newly created worker will then connect to the same database (the next one)
and try to load the same pages, ending up with spurious log messages like "LOG:
autoprewarm successfully prewarmed 13 of 11 previously-loaded blocks".

> If I understand correctly, the commit message would be something like this:
>
> ==
> Don't auto-restart per-database autoprewarm workers.
>
> We should try to prewarm each database only once.  Otherwise, if
> prewarming fails for some reason, it will just keep retrying in an
> infinite loop.  The existing code was intended to implement this
> behavior, but because it neglected to set worker.bgw_restart_time, the
> per-database workers keep restarting, contrary to what was intended.
>
> Mithun Cy, per a report from Hans Buschmann
> ==
>
> Does that sound right?
>

-- Yes, I agree.

[1]
https://www.postgresql.org/message-id/D2B9F2A20670C84685EF7D183F2949E202569F21%40gigant.nidsa.net

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-01-01 Thread Mithun Cy
On Tue, Dec 19, 2017 at 5:52 AM, Masahiko Sawada  wrote:
> On Mon, Dec 18, 2017 at 2:04 PM, Masahiko Sawada  
> wrote:
>> On Sun, Dec 17, 2017 at 12:27 PM, Robert Haas  wrote:
>>>
>>> I have to admit that result is surprising to me.
>>
>> I think the environment I used for performance measurement did not
>> have enough resources. I will do the same benchmark on an another
>> environment to see if it was a valid result, and will share it.
>>
> I did a performance measurement on a different environment which has 4
> cores and two physically separated disk volumes. Also, I've changed the
> benchmarking so that COPYs load only 300 integer tuples, which do not
> fit within a single page, and changed the tables to unlogged tables to
> observe the overhead of locking/unlocking relext locks.

I ran the same test as asked by Robert; it was just an extension of the tests
[1] pointed out by Amit Kapila.

Machine : cthulhu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
CPU max MHz:           2129.
CPU min MHz:           1064.
BogoMIPS:              4266.59
Virtualization:        VT-x
Hypervisor vendor:     vertical
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-127

It has 2 disks with different filesystems, as below:
/dev/mapper/vg_mag-data2  ext4  5.1T  3.6T  1.2T  76% /mnt/data-mag2
/dev/mapper/vg_mag-data1  xfs   5.1T  1.6T  3.6T  31% /mnt/data-mag

I have created 2 tables, one on each of the above filesystems.

test_size_copy.sh --> automated script to run the copy test.
copy_script1, copy_script2 --> pgbench copy scripts used by
test_size_copy.sh to load into the 2 different tables.

To run the above copy scripts in parallel, I ran them with equal weights as below.
./pgbench -c $threads -j $threads -f copy_script1@1 -f copy_script2@1
-T 120 postgres >> test_results.txt


Results :
---

Clients    HEAD (TPS)
1           84.460734
2          121.359035
4          175.886335
8          268.764828
16         369.996667
32         439.032756
64         482.185392


Clients    N_RELEXTLOCK_ENTS = 1024 (TPS)    % diff with HEAD
1           87.165777                          3.20272258112273
2          131.094037                          8.02165409439848
4          181.667104                          3.2866504381935
8          267.412856                         -0.503031594595423
16         376.118671                          1.65461058058666
32         460.756357                          4.94805927419228
64         492.723975                          2.18558736428913

Not much of an improvement from HEAD

Clients    N_RELEXTLOCK_ENTS = 1 (TPS)    % diff with HEAD
1           86.288574                       2.16412990206786
2          131.398667                       8.27266960387414
4          168.681079                      -4.09654109854526
8          245.841999                      -8.52895416806549
16         321.972147                     -12.9797169226933
32         375.783299                     -14.4065462395703
64         360.134531                     -25.3120196142317


So in the case of N_RELEXTLOCK_ENTS = 1 we can see a regression as high as 25%.


[1]https://www.postgresql.org/message-id/CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA%40mail.gmail.com
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


test_size_copy.sh
Description: Bourne shell script


copy_script1
Description: Binary data


copy_script2
Description: Binary data


Possible performance regression in version 10.1 with pgbench read-write tests.

2018-01-23 Thread Mithun Cy
Hi all,

When I was trying to do read-write pgbench bench-marking of PostgreSQL
9.6.6 vs 10.1 I found PostgreSQL 10.1 regresses against 9.6.6 in some
cases.

Non Default settings and test
==
Server:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c
maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &

Pgbench:
CASE 1: when data fits shared buffers.
./pgbench -i -s 1000 postgres

CASE 2: when data exceeds shared buffers.
./pgbench -i -s 1000 postgres

./pgbench -c $threads -j $threads -T 1800 -M prepared postgres

Script "perf_buff_mgmt_write-2.sh" which is added below can be used to run same.


Machine : "cthulhu" 8 node numa machine with 128 hyper threads.
===
>numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 37885 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 31215 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 15331 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 36774 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 62 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 9653 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 50209 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 43966 MB

CASE 1:
In 9.6.6 peak performance is achieved at 72 concurrent clients with a TPS of
35554.573858, while in 10.1 at 72 clients TPS dips to 26882.828133, nearly a
23% decrease in TPS.

CASE 2:
In 9.6.6 peak performance is achieved at 72 concurrent clients with a TPS of
24861.074079, while in 10.1 at 72 clients TPS dips to 18372.565663, nearly a
26% decrease in TPS.

Added "Postgresql_benchmarking_9.6vs10.ods" which gives more detailed
TPS numbers. And, TPS is median of 3 runs result.

I have not run bisect yet to find what has caused the issue.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Postgresql_benchmarking_9.6vs10.ods
Description: application/vnd.oasis.opendocument.spreadsheet


perf_buff_mgmt_write-2.sh
Description: Bourne shell script


Re: Possible performance regression in version 10.1 with pgbench read-write tests.

2018-01-23 Thread Mithun Cy
On Wed, Jan 24, 2018 at 7:36 AM, Amit Kapila  wrote:

> Both the cases look identical, but from the document attached, it
> seems the case-1 is for scale factor 300.

Oops, sorry, it was a typo. CASE 1 is scale factor 300, which will fit in
shared_buffers = 8GB.



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



DROP OWNED CASCADE vs Temp tables

2020-01-06 Thread Mithun Cy
I have a test where a user creates a temp table and then disconnects;
concurrently, we try to do DROP OWNED BY ... CASCADE on the same user. This
seems to cause a race condition between temp table deletion during disconnection
(@RemoveTempRelations(myTempNamespace)) and the DROP OWNED BY ... CASCADE
operation, which will try to remove the same temp table when it finds it in
pg_shdepend. This results in an internal "cache lookup failed" error, as below.

DROP OWNED BY test_role CASCADE;
2020-01-07 12:35:06.524 IST [26064] ERROR:  cache lookup failed for
relation 41019
2020-01-07 12:35:06.524 IST [26064] STATEMENT:  DROP OWNED BY test_role
CASCADE;
reproduce.sql:8: ERROR:  cache lookup failed for relation 41019

TEST
=
create database test_db;
create user test_superuser superuser;
\c test_db test_superuser
CREATE ROLE test_role nosuperuser login password 'test_pwd' ;
\c test_db test_role
CREATE TEMPORARY TABLE tmp_table(col1 int);
\c test_db test_superuser
DROP OWNED BY test_role CASCADE;


-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Re: mysql_fdw crash

2018-11-20 Thread Mithun Cy
On Tue, Nov 20, 2018 at 7:59 PM Tomas Vondra
 wrote:
>
> On 11/20/18 3:06 PM, 066ce...@free.fr wrote:
> > Hi,
> >
> >> When gdb will be active, then use command c, and then run query in 
> >> session. gdb should to catch segfault.
> >
> > Thank you very much. It's been helpfull.
> >
> > BTW behaviour is strange. When I'm executing following, I do have always a 
> > SEGV :
> >
> > psql (11.1)
> > Type "help" for help.
> >
> > herve=# CREATE OR REPLACE FUNCTION public.test_bug2(text,integer,timestamp 
> > with time zone)
> > herve-#  RETURNS integer
> > herve-#
> > herve-# AS '
> > herve'#
> > herve'# select coalesce(max(id),1) from sact_v1.autocalls where  
> > label=$1 and machine_id=$2 and created_date=$3;
> > herve'# '
> > herve-#  LANGUAGE sql;
> > CREATE FUNCTION
> > herve=# select test_bug2('BSM_CRITICAL_SYSLOG',18843,now());
> >
> > The GDB session :
> >
> > Continuing.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > prepare_query_params (param_types=0x1c86ac8, param_values=0x1c86ac0, 
> > param_exprs=0x1c86ab8, param_flinfo=0x1c86ab0, numParams=3, 
> > fdw_exprs=0x1c6b5b8, node=0x1c792d8) at mysql_fdw.c:2139
> > 2139*param_types[i] = exprType(param_expr);
> > (gdb) bt
>
> So which part of that expression triggers the segfault? Try printing the
> different parts, i.e.
>
>  p i
>  p param_types[i]
>
> It might be helpful to also install the debug package, which would give
> us more readable backtraces.
>
> BTW, considering the failure is in mysql_fdw.c, this very much seems
> like a bug in mysql_fdw - have you tried reporting it through the
> project github repository?
>
>  https://github.com/EnterpriseDB/mysql_fdw/issues
>
> That's probably more likely to help, and even if we find a bug here we
> can't really commit that (perhaps some of the mysql_fdw authors are
> watching this list, but I'm not sure about that).

Thanks for reporting,

 Oid **param_types)
{
    int         i;
    ListCell   *lc;

    Assert(numParams > 0);

    /* Prepare for output conversion of parameters used in remote query. */
    *param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);

    *param_types = (Oid *) palloc0(sizeof(Oid) * numParams);

    i = 0;
    foreach(lc, fdw_exprs)
    {
        Node       *param_expr = (Node *) lfirst(lc);
        Oid         typefnoid;
        bool        isvarlena;

        *param_types[i] = exprType(param_expr);

Seems like a basic mistake; I think it should be as below:
(*param_types)[i] = exprType(param_expr);
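The difference is just C operator precedence: [] binds tighter than unary *, so
*param_types[i] means *(param_types[i]), which for i > 0 dereferences whatever
happens to sit after the pointer argument, while (*param_types)[i] indexes the
freshly allocated array. A tiny standalone sketch (not mysql_fdw code; fill() and
the values are purely illustrative):

#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Oid;   /* stand-in for the real typedef */

static void
fill(int n, Oid **param_types)
{
    *param_types = calloc(n, sizeof(Oid));

    for (int i = 0; i < n; i++)
        (*param_types)[i] = 100 + i;    /* correct: index the allocated array */

    /*
     * *param_types[i] = 100 + i; would instead dereference param_types[i],
     * i.e. read memory past the single pointer argument for i > 0 -- the
     * reported segfault.
     */
}

int
main(void)
{
    Oid        *types;

    fill(3, &types);
    printf("%u %u %u\n", types[0], types[1], types[2]);     /* 100 101 102 */
    free(types);
    return 0;
}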

After this change it works:
postgres=#  select test_bug2('BSM_CRITICAL_SYSLOG',18843,now());
 test_bug2
---
 1
(1 row)


-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com



Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Mithun Cy
> On Thu, Mar 1, 2018 at 7:39 PM Amit Kapila  wrote:

I did some performance testing of the COPY command for zheap against heap;
here are my results.
Machine : cthulhu (an 8-node NUMA machine with 500GB of RAM)
server non default settings: shared buffers 32GB, max_wal_size = 20GB,
min_wal_size = 15GB

Test tables and data:

I have used the pgbench_accounts table of the pgbench tool as the data source with 3
different scale factors: 100, 1000, 2000. Both the heap and zheap tables are
lookalikes of pgbench_accounts:

CREATE TABLE pgbench_zheap (LIKE pgbench_accounts) WITH
(storage_engine='zheap');
CREATE TABLE pgbench_heap (LIKE pgbench_accounts) WITH
(storage_engine='heap');

Test Commands:
Command to generate datafile:
COPY pgbench_accounts TO '/mnt/data-mag/mithun.cy/zheapperfbin/bin/pgbench.data';

Command to load from datafile:
COPY pgbench_heap FROM '/mnt/data-mag/mithun.cy/zheapperfbin/bin/pgbench.data';  -- heap table
COPY pgbench_zheap FROM '/mnt/data-mag/mithun.cy/zheapperfbin/bin/pgbench.data'; -- zheap table

Results
==

Scale factor : 100

zheap table size : 1028 MB
heap table size: 1281 MB
-- table size reduction: 19% size reduction.
zheap wal size: 1007 MB
heap wal size: 1024 MB
-- wal size difference: 1.6% size reduction.
zheap COPY  execution time: 24869.451 ms
heap COPY  execution time: 25858.773 ms
-- % of improvement -- 3.8% reduction in execution time for zheap

Scale factor : 1000
-
zheap table size : 10 GB
heap table size: 13 GB
-- table size reduction: 23% size reduction.
zheap wal size: 10071 MB
heap wal size: 10243 MB
-- wal size difference: 1.67% size reduction.
zheap COPY  execution time: 270790.235 ms
heap COPY  execution time:  280325.632 ms
-- % of improvement -- 3.4% reduction in execution time for zheap

Scale factor : 2000
-
zheap table size : 20GB
heap table size: 25GB
-- table size reduction: 20% size reduction.
zheap wal size: 20142 MB
heap wal size: 20499 MB
-- wal size difference: 1.7% size reduction.
zheap COPY  execution time: 523702.904 ms
heap COPY  execution time: 537537.720 ms
-- % of improvement -- 2.5 % reduction in execution time for zheap


The COPY command seems to have improved very slightly with zheap, both in WAL
size and execution time. I also did some tests with INSERT statements, where I
could see some regression in zheap compared to heap with respect to execution
time. I will reply here after further investigation.

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Mithun Cy
On Thu, Dec 6, 2018 at 11:13 AM Amit Kapila  wrote:
>
> On Thu, Dec 6, 2018 at 10:03 AM Pavel Stehule  wrote:
> >
> > čt 6. 12. 2018 v 5:02 odesílatel Mithun Cy  
> > napsal:
> >>
> >> COPY command seems to have improved very slightly with zheap in both with 
> >> size of wal and execution time. I also did some tests with insert 
> >> statement where I could see some regression in zheap when compared to heap 
> >> with respect to execution time. With further more investigation I will 
> >> reply here.
> >>
> >
> > 20% of size reduction looks like effect of fill factor.
> >
>
> I think it is because of smaller zheap tuple sizes.  Mithun can tell
> more about setup whether he has used different fillfactor or anything
> else which could lead to such a big difference.

Yes, the default fillfactor is unaltered; zheap tuple sizes are smaller and each
is aligned at 2 bytes.

Length of each item. (all Items are identical)
=
postgres=# SELECT lp_len FROM
zheap_page_items(get_raw_page('pgbench_zheap', 9)) limit 1;
 lp_len
--------
    102
(1 row)

postgres=# SELECT lp_len FROM
heap_page_items(get_raw_page('pgbench_heap', 9)) limit 1;
 lp_len
--------
    121
(1 row)

Total tuples per page
=
postgres=# SELECT count(*) FROM
zheap_page_items(get_raw_page('pgbench_zheap', 9));
 count
-------
    76
(1 row)

postgres=# SELECT count(*) FROM
heap_page_items(get_raw_page('pgbench_heap', 9));
 count
-------
    61
(1 row)

Because of this, zheap takes less space, as reported above.


-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2018-12-29 Thread Mithun Cy
On Sat, Dec 8, 2018 at 6:35 PM Amit Kapila  wrote:
>
> On Fri, Dec 7, 2018 at 7:25 PM John Naylor  wrote:
> >
> > On 12/6/18, Amit Kapila  wrote:
> > > On Thu, Dec 6, 2018 at 10:53 PM John Naylor 
wrote:
> > >>
> > >> I've added an additional regression test for finding the right block

I ran some performance tests on the latest patch v11, and I see a small
regression in the execution time of the COPY statement. The tests I used are the
same as provided in [1], except that I ran them for fill factors 20 and 70. Here
are my results.

Machine : cthulhu (Intel-based 8-node NUMA machine)
Server settings are default, built with HEAP_FSM_CREATION_THRESHOLD = 4.
The entire data directory was on HDD.

Results are the execution time (in ms) taken by the COPY statement when the number
of records is exactly the number that fits in HEAP_FSM_CREATION_THRESHOLD = 4
pages. For fill factor 20 that is up to tid (3, 43), and for fill factor 70 up to
tid (3, 157). Each result is the median of 10 runs.

Fill factor 20
Tables    Base       Patch      % increase in execution time
500       121.97     125.315    2.7424776584
1000      246.592    253.789    2.9185861666

Fill factor 70
Tables    Base       Patch      % increase in execution time
500       211.502    217.128    2.6600221275
1000      420.309    432.606    2.9257046601

So there is a consistent 2-3% regression, and on every run I can see that the
patch v11 execution time is slightly more than base. I also tried inserting more
records, up to 8 pages, and the same regression is observed. So I guess even
HEAP_FSM_CREATION_THRESHOLD = 4 is not perfect!

[1]
https://www.postgresql.org/message-id/CAJVSVGX%3D2Q52fwijD9cjeq1UdiYGXns2_9WAPFf%3DE8cwbFCDvQ%40mail.gmail.com

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Re: WIP: Avoid creation of the free space map for small tables

2018-12-30 Thread Mithun Cy
On Thu, Dec 6, 2018 at 10:53 PM John Naylor  wrote:
> On 12/3/18, Amit Kapila  wrote:
> > fsm_local_set is being called from RecordAndGetPageWithFreeSpace and
> > GetPageWithFreeSpace whereas the change we have discussed was specific
> > to GetPageWithFreeSpace, so not sure if we need any change in
> > fsm_local_set.

I have some minor comments on the pg_upgrade patch.
1. Now we stat() the main fork file in transfer_relfile():
+       sret = stat(old_file, &statbuf);

+       /* Save the size of the first segment of the main fork. */
+       if (type_suffix[0] == '\0' && segno == 0)
+               first_seg_size = statbuf.st_size;

But we do not handle the case where stat() returns an error!
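One possible way to handle it (a sketch only, not the committed code; the exact
error message and the use of pg_fatal() are my assumption), in the context of the
hunk quoted above:

        sret = stat(old_file, &statbuf);
        if (sret != 0)
                pg_fatal("could not stat file \"%s\": %s\n",
                                 old_file, strerror(errno));

        /* Save the size of the first segment of the main fork. */
        if (type_suffix[0] == '\0' && segno == 0)
                first_seg_size = statbuf.st_size;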

2. src/bin/pg_upgrade/pg_upgrade.h

        char       *relname;
+
+       char            relkind;        /* relation relkind -- see pg_class.h */

I think we can remove the added empty line.

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-09 Thread Mithun Cy
Hi John Naylor,
On Tue, Jan 8, 2019 at 2:27 AM John Naylor  wrote:
> I've attached two patches for testing. Each one applies on top of the
> current patch.

Thanks for the patches. I did a quick test of both, with the same tests as in
[1], now for fillfactors 20, 70, and 100. (Note: with
HEAP_FSM_CREATION_THRESHOLD = 4 the highest tid inserted is (3, 43) for
fillfactor 20, (3, 157) for fillfactor 70, and (3, 225) for fillfactor 100,
so exactly 4 pages are used.)

Machine : cthulhu, same as before [2], with default server settings.
Test: COPY command as in [1], for 500 tables.

Fill factor 20
                        execution time in ms    % increase in execution time
Base                    119.238
v11-all-pages           121.974                  2.2945705228
v11-Every-other-page    114.455                 -4.0113051209
v11-last-page           113.573                 -4.7510021973

Fill factor 70
                        execution time in ms    % increase in execution time
Base                    209.991
v11-all-pages           211.076                  0.5166888105
v11-Every-other-page    206.476                 -1.6738812616
v11-last-page           203.591                 -3.0477496655

Fill factor 100
                        execution time in ms    % increase in execution time
Base                    269.691
v11-all-pages           270.078                  0.1434975583
v11-Every-other-page    262.691                 -2.5955630703
v11-last-page           260.293                 -3.4847288193

Observations
1. The execution time of both base and the v11-all-pages patch has improved
compared to my earlier results [2]. But v11-all-pages is still slightly
behind base.
2. The v11-Every-other-page and v11-last-page patches improve performance
over base.
3. IMHO v11-Every-other-page would be the ideal one to consider: it improves
performance and also, to an extent, avoids expansion if space is already
available.

[1] 
https://www.postgresql.org/message-id/CAJVSVGX%3D2Q52fwijD9cjeq1UdiYGXns2_9WAPFf%3DE8cwbFCDvQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAD__Ouj%3Dat4hy2wYidK90v92qSRLjU%2BQe4y-PwfjLLeGkhc6ZA%40mail.gmail.com

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-15 Thread Mithun Cy
On Fri, Jan 11, 2019 at 3:54 AM John Naylor 
wrote:
>
> On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila 
wrote:
> > Thanks, Mithun for performance testing, it really helps us to choose
> > the right strategy here.  Once John provides next version, it would be
> > good to see the results of regular pgbench (read-write) runs (say at
> > 50 and 300 scale factor) and the results of large copy.  I don't think
> > there will be any problem, but we should just double check that.
>
> Attached is v12 using the alternating-page strategy. I've updated the
> comments and README as needed. In addition, I've

Below are my performance tests and numbers
Machine : cthulhu

Tests and setups
Server settings:
max_connections = 200
shared_buffers=8GB
checkpoint_timeout =15min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
min_wal_size=15GB and max_wal_size=20GB.

pgbench settings:
---
read-write settings (TPCB like tests)
./pgbench -c $threads -j $threads -T $time_for_reading -M prepared postgres

scale factor 50 -- median of 3 runs (TPS)
clients    v12-patch       base patch      % diff
1          826.081588      834.328238      -0.9884179421
16         10805.807081    10800.662805     0.0476292621
32         19722.277019    19641.546628     0.4110185034
64         30232.681889    30263.616073    -0.1022157561

scale factor 300 -- median of 3 runs (TPS)
clients    v12-patch       base patch      % diff
1          813.646062      822.18648       -1.038744641
16         11379.028702    11277.05586      0.9042505709
32         21688.084093    21613.044463     0.3471960192
64         36288.85711     36348.6178      -0.1644098005


Copy command
Test: setup
 ./psql -d postgres -c "COPY pgbench_accounts TO '/mnt/data-mag/mithun.cy/fsmbin/bin/dump.out' WITH csv"
 ./psql -d postgres -c "CREATE UNLOGGED TABLE pgbench_accounts_ulg (LIKE pgbench_accounts) WITH (fillfactor = 100);"
Test run:
TRUNCATE TABLE pgbench_accounts_ulg;
\timing
COPY pgbench_accounts_ulg FROM '/mnt/data-mag/mithun.cy/fsmbin/bin/dump.out'
WITH csv;
\timing

Execution time in ms (scale factor indicates size of pgbench_accounts):
scale factor    v12-patch    base patch    % diff
300             77166.407    77862.041     -0.8934186557
50              13329.233    13284.583      0.3361038882

So the large table tests do not show any considerable performance variance
from the base code!



On Fri, Jan 11, 2019 at 3:54 AM John Naylor 
wrote:

> On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila 
> wrote:
> > Thanks, Mithun for performance testing, it really helps us to choose
> > the right strategy here.  Once John provides next version, it would be
> > good to see the results of regular pgbench (read-write) runs (say at
> > 50 and 300 scale factor) and the results of large copy.  I don't think
> > there will be any problem, but we should just double check that.
>
> Attached is v12 using the alternating-page strategy. I've updated the
> comments and README as needed. In addition, I've
>
> -handled a possible stat() call failure during pg_upgrade
> -added one more assertion
> -moved the new README material into a separate paragraph
> -added a comment to FSMClearLocalMap() about transaction abort
> -corrected an outdated comment that erroneously referred to extension
> rather than creation
> -fleshed out the draft commit messages
>


-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Re: Possible performance regression in version 10.1 with pgbench read-write tests.

2018-07-19 Thread Mithun Cy
Hi Andres,

On Fri, Jul 20, 2018 at 1:21 AM, Andres Freund  wrote:

> Hi,
>
> On 2018-01-24 00:06:44 +0530, Mithun Cy wrote:
> > Server:
> > ./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
> > max_wal_size=20GB -c checkpoint_timeout=900 -c
> > maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &
>
> Which kernel & glibc version does this server have?
>

[mithun.cy@cthulhu ~]$ cat /proc/version
Linux version 3.10.0-693.5.2.el7.x86_64 (buil...@kbuilder.dev.centos.org)
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Fri Oct 20
20:32:50 UTC 2017

[mithun.cy@cthulhu ~]$ ldd --version
ldd (GNU libc) 2.17


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: Possible performance regression in version 10.1 with pgbench read-write tests.

2018-07-20 Thread Mithun Cy
On Fri, Jul 20, 2018 at 10:52 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Fri, Jul 20, 2018 at 7:56 AM, Tom Lane  wrote:
> >
> > It's not *that* noticeable, as I failed to demonstrate any performance
> > difference before committing the patch.  I think some more investigation
> > is warranted to find out why some other people are getting different
> > results
> Maybe false sharing is a factor, since sizeof(sem_t) is 32 bytes on
> Linux/amd64 and we're probably hitting elements clustered at one end
> of the array?  Let's see... I tried sticking padding into
> PGSemaphoreData and I got ~8% more TPS (72 client on multi socket
> box, pgbench scale 100, only running for a minute but otherwise the
> same settings that Mithun showed).
>
> --- a/src/backend/port/posix_sema.c
> +++ b/src/backend/port/posix_sema.c
> @@ -45,6 +45,7 @@
>  typedef struct PGSemaphoreData
>  {
> sem_t   pgsem;
> +   charpadding[PG_CACHE_LINE_SIZE - sizeof(sem_t)];
>  } PGSemaphoreData;
>
> That's probably not the right idiom and my tests probably weren't long
> enough, but there seems to be some effect here.
>

I did a quick test applying the patch with the same settings as in the initial
mail I reported (on the latest PostgreSQL 10 code), with 72 clients.

CASE 1:
Without patch: TPS 29269.823540

With patch: TPS 36005.544960 --- a 23% jump

Just disabling the use of unnamed POSIX semaphores: TPS 34481.207959

So it seems that is the issue, as the test is being run on an 8-node NUMA
machine. I also came across a presentation [1] (slide 20) which says one of
those futex designs is bad for NUMA machines. I am not sure whether the new fix
for the same is included in Linux version 3.10.0-693.5.2.el7.x86_64, which
is on my test machine.


[1] https://www.slideshare.net/davidlohr/futex-scaling-for-multicore-systems


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: Pluggable Storage - Andres's take

2018-09-10 Thread Mithun Cy
On Mon, Sep 10, 2018 at 7:33 PM, Amit Kapila  wrote:
> On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi  
> wrote:
>>
>> On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi  
>> wrote:
>>>
>> pg_stat_get_tuples_hot_updated and others:
>> /*
>> * Counter tuples_hot_updated stores number of hot updates for heap table
>> * and the number of inplace updates for zheap table.
>> */
>> if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
>> RelationStorageIsZHeap(rel))
>> result = 0;
>> else
>> result = (int64) (tabentry->tuples_hot_updated);
>>
>>
>> Is the special condition is needed? The values should be 0 because of zheap 
>> right?
>>
>
> I also think so.  Beena/Mithun has worked on this part of the code, so
> it is better if they also confirm once.

Yes pg_stat_get_tuples_hot_updated should return 0 for zheap.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com