Segmentation fault with core dump

2018-01-10 Thread Glauco Torres
Hi group,

I'm using PG 9.6.6 and I have a problem with segmentation faults occurring
anywhere from every few days to once every two weeks.
This server is a replica; the other servers (the master and the other slaves)
do not have this problem.

I could not identify the cause, so I do not know what triggers the
problem; however, I have the PostgreSQL log and the core dump generated by
the crash.

The server has 60 GB of RAM; PG is configured with:

shared_buffers = 14GB
work_mem = 192MB


Below are the relevant details.

$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)


postgres=# select version();

version
--
 PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)


# cat postgresql-Mon.log | grep 'was terminated by signal 11: Segmentation
fault'
2018-01-08 01:51:27.909 -03 [85039]: [102-1] user=,db=,app=,client= LOG:
server process (PID 40286) was terminated by signal 11: Segmentation fault
2018-01-08 05:09:51.929 -03 [85039]: [107-1] user=,db=,app=,client= LOG:
server process (PID 62427) was terminated by signal 11: Segmentation fault
2018-01-08 06:33:46.840 -03 [85039]: [112-1] user=,db=,app=,client= LOG:
server process (PID 72156) was terminated by signal 11: Segmentation fault
2018-01-08 13:59:37.422 -03 [119484]: [4-1] user=,db=,app=,client= LOG:
server process (PID 124190) was terminated by signal 11: Segmentation fault
2018-01-08 14:09:41.590 -03 [119484]: [9-1] user=,db=,app=,client= LOG:
checkpointer process (PID 124528) was terminated by signal 11: Segmentation
fault
2018-01-08 15:18:06.379 -03 [119484]: [13-1] user=,db=,app=,client= LOG:
server process (PID 129026) was terminated by signal 11: Segmentation fault
2018-01-08 15:23:15.586 -03 [119484]: [18-1] user=,db=,app=,client= LOG:
server process (PID 6528) was terminated by signal 11: Segmentation fault
2018-01-08 15:55:32.029 -03 [119484]: [23-1] user=,db=,app=,client= LOG:
server process (PID 8762) was terminated by signal 11: Segmentation fault
2018-01-08 20:52:16.344 -03 [14804]: [5-1] user=,db=,app=,client= LOG:
checkpointer process (PID 14828) was terminated by signal 11: Segmentation
fault

(gdb) bt
#0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
pb=pb@entry=0x1be06d2d06644)
at bufmgr.c:4137
#1  0x00801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
b=0x1be06d2d06644 ,
c=0x2fc9dfbb1815c , cmp=0x6a4d20
)
at qsort.c:107
#2  0x00801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
n=, es=es@entry=20, cmp=cmp@entry=0x6a4d20
) at qsort.c:157
#3  0x008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=,
n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
) at qsort.c:203
#4  0x006a81cf in BufferSync (flags=flags@entry=128) at
bufmgr.c:1863
#5  0x006a8477 in CheckPointBuffers (flags=flags@entry=128) at
bufmgr.c:2578
#6  0x004dd781 in CheckPointGuts (checkPointRedo=,
flags=) at xlog.c:8698
#7  0x004e9faf in CreateRestartPoint (flags=) at
xlog.c:8856
#8  0x0066977c in CheckpointerMain () at checkpointer.c:490
#9  0x004f2820 in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffd8bac2b80) at bootstrap.c:429
#10 0x00673330 in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:5252
#11 0x00674b1f in sigusr1_handler (postgres_signal_arg=) at postmaster.c:4949
#12 
#13 0x7f6fc75a0b83 in __select_nocancel () from /lib64/libc.so.6
#14 0x0046ef32 in ServerLoop () at postmaster.c:1683
#15 0x00675b69 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x27ca210)
at postmaster.c:1327
#16 0x0047053e in main (argc=3, argv=0x27ca210) at main.c:228



#0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
pb=pb@entry=0x1be06d2d06644)
at bufmgr.c:4137
a = 0x7f6fa9ef4b2c
b = 0x1be06d2d06644
#1  0x00801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
b=0x1be06d2d06644 ,
c=0x2fc9dfbb1815c , cmp=0x6a4d20
)
at qsort.c:107
No locals.
#2  0x00801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
n=, es=es@entry=20, cmp=cmp@entry=0x6a4d20
) at qsort.c:157
d = 350293923535640
pa = 
pb = 
pc = 
pd = 
pl = 0x7f6fa9ef4b2c "\177\006"
pm = 0x7f6fa9f06174 "\177\006"
pn = 0xa7428f0f82428 
d1 = 
d2 = 
r = 
swaptype = 2
presorted = 0
#3  0x008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=,
n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
) at qsort.c:203
pa = 
pb = 0x7f6fa9f3a49c "\177\006"
pc = 
pd = 0x7f6faa0c8840 "\177\006"
pl = 
pm = 
pn = 0x7f6faa0c8854 "\177\006"
d1 = 
d2 = 1631160
r = 
swaptype = 2
presorted = 0
#4  0x006a81cf in BufferSync (flags=flags@entry=128) at
bufmgr.c:1863
buf_state = 
buf_id = 1835008
num_to_scan = 

String comparison problem in select - too many results

2018-01-10 Thread Durumdara
Dear Members!


Today one of our clients reported an error.

She made a report filtered by a string range, and she got the wrong documents.

I checked it.
She filtered by a document nr (a varchar field -> character varying(15)).

The range was '18/0113' to '18/0212'.

Q.Close;
Q.SQL.Text := 'select * from idoc where nr >=:pfrom and nr  <=:pto';
Q.paramByName('pfrom').Asstring := PFrom;
Q.paramByName('pto').Asstring := PTo;
Q.Open;

This Devart TPgQuery uses parameters; the query is prepared before the call.

She said that she got the document with nr = '180/2010' too (and many more).

First I tried to check this in LibreOffice Calc, because it seemed to be
impossible.



  Value A      Value B       Lesser   Greater
  '18/0113     '180/2010     True     False
  '18/0212     '180/2010     True     False

It's OK.

Then I tried to use the select in the database.
I also got 180/2010 in the list!!!

Then I tried a simple check with SQL:

select
'18/0113' > '180/2010',
'18/0113' < '180/2010',
'18/0212' > '180/2010',
'18/0212' < '180/2010'
---
f;t;t;f

What?

It's impossible.
At first I assumed the "/" is a special character, like the escape "\". But
the prefix "18/0" is the same in every input string!

Next I supposed that PGSQL converts this expression and that the numeric
value could be different.
To rule that out, I changed the select to:

select
cast('18/0113' as text) > cast('180/2010' as text),
cast('18/0113' as text) < cast('180/2010' as text),
cast('18/0212' as text) > cast('180/2010' as text),
cast('18/0212' as text) < cast('180/2010' as text),
replace(cast('18/0113' as text), '/', '_') > replace(cast('180/2010' as
text), '/', '_'),
replace(cast('18/0212' as text), '/', '_') > replace(cast('180/2010' as
text), '/', '_')

This is to be sure the strings are treated as strings. I used replace at the
end to filter out the "/" sign.
The result is also wrong:
f;t;t;f;f;t

Please help me with this!

What can cause such differences between similar strings?
How can I force the correct range?
What is the base of the problem?

PG is 9.4 on Linux; the database encoding is:
   ENCODING = 'UTF8'
   LC_COLLATE = 'hu_HU.UTF-8'
   LC_CTYPE = 'hu_HU.UTF-8'


Thank you for your help!

Best regards
   dd


Re: help with generation_series in pg10

2018-01-10 Thread Merlin Moncure
On Mon, Jan 8, 2018 at 11:19 PM, Adrian Klaver
 wrote:
> On 01/08/2018 05:25 PM, Márcio A. Sepp wrote:
>>
>>
>> Hi,
>>
>>
> >> In pg10 generate_series doesn't work like in 9.5.
>> Ex. in 9.5:
>> z=# select generate_series(1, 10), generate_series(1, 5);
>>   generate_series | generate_series
>> -+-
>> 1 |   1
>> 2 |   2
>> 3 |   3
>> 4 |   4
>> 5 |   5
>> 6 |   1
>> 7 |   2
>> 8 |   3
>> 9 |   4
>>10 |   5
>> (10 registros)
>>
>>
> >> so, in version 10 the same SQL shows a different result set.
>
>
> The reason why:
>
> https://www.postgresql.org/docs/10/static/release-10.html#idm46428657945600
>
> Change the implementation of set-returning functions appearing in a query's
> SELECT list (Andres Freund)

An interesting (and good) side effect of this change is that this query:
select generate_series(1, nextval('s')), generate_series(1, nextval('s'));

...now terminates.
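
For reference, a minimal sketch (not part of the original message) of what
the changed behavior looks like for the query from the first post:

select generate_series(1, 10), generate_series(1, 5);
-- In version 10 the two SRFs advance in lockstep, so 10 rows come back and
-- rows 6..10 have NULL in the second column, instead of the 9.5 behavior of
-- cycling the shorter series.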

merlin



Re: help with generation_series in pg10

2018-01-10 Thread Merlin Moncure
On Tue, Jan 9, 2018 at 10:14 AM, Márcio A. Sepp
 wrote:
>
>> > how can i have the same in pg10 as i have had in pg 9.x?
>>
>> Move the function call to the FROM clause:
>>
>> select g, (g - 1) % 5 + 1 from generate_series(1, 10) g;
>
> thank you. That is exact what i need.
>
>
>> > I need it to date type to...  if possible.
>>
>> There is a generate_series() variant that can return dates (more
>> precisely, timestamp with time zone).  But what exactly would you like
>> returned?
>
> In the past i use querys like this to generate some timestamp field:
> select generate_series (1, 10), generate_series('2018-01-01
> 10:00'::timestamp, '2018-01-02 10:00'::timestamp, '10 hours');

With the old behavior you're lucky this ever worked at all.  Doing
this kind of stuff, you want to write it with a single generate_series
driver if you can or as a cross product:

select
  a, b,c
from generate_series(1,3) a
cross join generate_series(1,4) b
cross join generate_series(1,2) c;

The old behavior ran the series out to their 'least common multiple', which
was very surprising in that it sometimes worked like a cross product but
sometimes didn't, depending on the specific numbers chosen.
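
A minimal sketch (not part of the original message) of the single-driver
approach for the timestamp case asked about above:

select row_number() over () as n, ts
from generate_series('2018-01-01 10:00'::timestamp,
                     '2018-01-02 10:00'::timestamp,
                     '10 hours') as ts;
-- one generate_series drives the row count; the numbering is derived from it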

merlin



RES: help with generation_series in pg10

2018-01-10 Thread Márcio A . Sepp
> >> > how can i have the same in pg10 as i have had in pg 9.x?
> >>
> >> Move the function call to the FROM clause:
> >>
> >> select g, (g - 1) % 5 + 1 from generate_series(1, 10) g;
> >
> > thank you. That is exact what i need.
> >
> >
> >> > I need it to date type to...  if possible.
> >>
> >> There is a generate_series() variant that can return dates (more
> >> precisely, timestamp with time zone).  But what exactly would you
> >> like returned?
> >
> > In the past i use querys like this to generate some timestamp field:
> > select generate_series (1, 10), generate_series('2018-01-01
> > 10:00'::timestamp, '2018-01-02 10:00'::timestamp, '10 hours');
> 
> With the old behavior you're lucky this ever worked at all.  Doing this
> kind of stuff, you want to write it with a single generate_series
> driver if you can or as a cross product:
> 
> select
>   a, b,c
> from generate_series(1,3) a
> cross join generate_series(1,4) b
> cross join generate_series(1,2) c;

Exactly what I need. Thank you so much!!!





Re: String comparison problem in select - too many results

2018-01-10 Thread David G. Johnston
On Wednesday, January 10, 2018, Durumdara  wrote:

>
> The PG is 9.4 on Linux, the DataBase encoding is:
>ENCODING = 'UTF8'
>LC_COLLATE = 'hu_HU.UTF-8'
>LC_CTYPE = 'hu_HU.UTF-8'
>
>
The collation rules for hu_HU.UTF-8 probably pretend symbols don't exist;
this is not uncommon.  You probably need to use the 'C' collation.
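
A minimal sketch (not part of the original message) using the literals from
the report:

select '18/0113' < '180/2010' collate "C",  -- true: '/' (0x2F) sorts before '0' (0x30)
       '18/0212' < '180/2010' collate "C";  -- also true, so '180/2010' is above the range
-- and for the range filter itself:
-- select * from idoc where nr >= '18/0113' collate "C" and nr <= '18/0212' collate "C";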

David J.


Re: How Many Partitions are Good Performing

2018-01-10 Thread pinker
I once ran a test on my laptop because I was curious as well. From my
results (laptop: 16 GB RAM, 4 cores) the upper limit was 12k partitions. Above
that, planning time was unbearably high - much higher than execution time. It
was tested on 9.5.






Re: Segmentation fault with core dump

2018-01-10 Thread Tom Lane
Glauco Torres  writes:
> (gdb) bt
> #0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
> pb=pb@entry=0x1be06d2d06644)
> at bufmgr.c:4137
> #1  0x00801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
> b=0x1be06d2d06644 ,
> c=0x2fc9dfbb1815c , cmp=0x6a4d20
> )
> at qsort.c:107
> #2  0x00801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
> n=, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> ) at qsort.c:157
> #3  0x008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=,
> n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> ) at qsort.c:203
> #4  0x006a81cf in BufferSync (flags=flags@entry=128) at
> bufmgr.c:1863

Hm.  I'm not normally one to jump to the conclusion that something is a
compiler bug, but it's hard to explain this stack trace any other way.
The value of "n" passed to the inner invocation of pg_qsort should not
have been more than 29914, but working from either the value of d or the
value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
looks a lot more like an address in the array than a proper value of n.

I suppose this might be due to a corrupted copy of the postgres executable
rather than an actual compiler bug.  Did you build it yourself?

BTW, I notice that ckpt_buforder_comparator assumes it can't possibly
see the same block ID twice in the array, which I think is an
unsupportable assumption.  But I cannot see a way that that could lead
to a crash in pg_qsort --- at worst it might cause a little inefficiency.

regards, tom lane



Re: Segmentation fault with core dump

2018-01-10 Thread Merlin Moncure
On Wed, Jan 10, 2018 at 11:08 AM, Tom Lane  wrote:
> Glauco Torres  writes:
>> (gdb) bt
>> #0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
>> pb=pb@entry=0x1be06d2d06644)
>> at bufmgr.c:4137
>> #1  0x00801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
>> b=0x1be06d2d06644 ,
>> c=0x2fc9dfbb1815c , cmp=0x6a4d20
>> )
>> at qsort.c:107
>> #2  0x00801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
>> n=, es=es@entry=20, cmp=cmp@entry=0x6a4d20
>> ) at qsort.c:157
>> #3  0x008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=,
>> n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
>> ) at qsort.c:203
>> #4  0x006a81cf in BufferSync (flags=flags@entry=128) at
>> bufmgr.c:1863
>
> Hm.  I'm not normally one to jump to the conclusion that something is a
> compiler bug, but it's hard to explain this stack trace any other way.
> The value of "n" passed to the inner invocation of pg_qsort should not
> have been more than 29914, but working from either the value of d or the
> value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
> looks a lot more like an address in the array than a proper value of n.
>
> I suppose this might be due to a corrupted copy of the postgres executable
> rather than an actual compiler bug.  Did you build it yourself?
>
> BTW, I notice that ckpt_buforder_comparator assumes it can't possibly
> see the same block ID twice in the array, which I think is an
> unsupportable assumption.  But I cannot see a way that that could lead
> to a crash in pg_qsort --- at worst it might cause a little inefficiency.

simple
SELECT version();
...can give a lot of hints on who/what compiled the database if you don't know.

merlin



Re: Segmentation fault with core dump

2018-01-10 Thread Alvaro Herrera
Tom Lane wrote:
> Glauco Torres  writes:
> > (gdb) bt
> > #0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
> > pb=pb@entry=0x1be06d2d06644)
> > at bufmgr.c:4137
> > #1  0x00801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
> > b=0x1be06d2d06644 ,
> > c=0x2fc9dfbb1815c , cmp=0x6a4d20
> > )
> > at qsort.c:107
> > #2  0x00801621 in pg_qsort (a=0x7f6fa9ef4b2c, 
> > a@entry=0x7f6fa9ea8380,
> > n=, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> > ) at qsort.c:157
> > #3  0x008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=,
> > n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> > ) at qsort.c:203
> > #4  0x006a81cf in BufferSync (flags=flags@entry=128) at
> > bufmgr.c:1863
> 
> Hm.  I'm not normally one to jump to the conclusion that something is a
> compiler bug, but it's hard to explain this stack trace any other way.
> The value of "n" passed to the inner invocation of pg_qsort should not
> have been more than 29914, but working from either the value of d or the
> value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
> looks a lot more like an address in the array than a proper value of n.
> 
> I suppose this might be due to a corrupted copy of the postgres executable
> rather than an actual compiler bug.  Did you build it yourself?

Hmm, is this something that can be explained by using a different
postgres executable in GDB than the one that produced the core file?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Segmentation fault with core dump

2018-01-10 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Hm.  I'm not normally one to jump to the conclusion that something is a
>> compiler bug, but it's hard to explain this stack trace any other way.
>> The value of "n" passed to the inner invocation of pg_qsort should not
>> have been more than 29914, but working from either the value of d or the
>> value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
>> looks a lot more like an address in the array than a proper value of n.

> Hmm, is this something that can be explained by using a different
> postgres executable in GDB than the one that produced the core file?

That would result in nonsensical gdb output, most likely; but Glauco's
trace is internally consistent enough that I doubt gdb is lying to us.
In any case, the crash is an observable fact :-(

regards, tom lane



Re: Segmentation fault with core dump

2018-01-10 Thread Alvaro Herrera
Merlin Moncure wrote:

> simple
> SELECT version();
> ...can give a lot of hints on who/what compiled the database if you don't 
> know.

Probably, this is why Glauco included the output in his opening letter.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Segmentation fault with core dump

2018-01-10 Thread Glauco Torres
>
> That would result in nonsensical gdb output, most likely; but Glauco's
> trace is internally consistent enough that I doubt gdb is lying to us.
> In any case, the crash is an observable fact :-(
>
>
> The system is a CentOS 7, and PG was installed using PGDG's YUM repository.

We are pretty sure that the binary that crashed is the same one used in the
`gdb` command. More specifically, the path used was
`/usr/pgsql-9.6/bin/postmaster`, and we have been running 9.6.6 (the most
recent 9.6 minor release as of today) for a few weeks, so there should not
have been any upgrade of the binaries since the server came up, especially
because we restarted the service in order to allow core dump creation. This
is not the first crash (although it is the only one with a core dump
generated so far); we can send a new gdb stack trace if it happens again.
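
A minimal sketch (not part of the original message; the drop-in path and
service name are assumptions) of how core dumps are typically enabled for
the PGDG systemd service on CentOS 7:

# /etc/systemd/system/postgresql-9.6.service.d/coredump.conf
[Service]
LimitCORE=infinity

# then:
systemctl daemon-reload
systemctl restart postgresql-9.6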

More information:
$ uname -a
Linux pg-iii.br 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC
2017 x86_64 x86_64 x86_64 GNU/Linux

$ /usr/pgsql-9.6/bin/pg_config
BINDIR = /usr/pgsql-9.6/bin
DOCDIR = /usr/pgsql-9.6/doc
HTMLDIR = /usr/pgsql-9.6/doc/html
INCLUDEDIR = /usr/pgsql-9.6/include
PKGINCLUDEDIR = /usr/pgsql-9.6/include
INCLUDEDIR-SERVER = /usr/pgsql-9.6/include/server
LIBDIR = /usr/pgsql-9.6/lib
PKGLIBDIR = /usr/pgsql-9.6/lib
LOCALEDIR = /usr/pgsql-9.6/share/locale
MANDIR = /usr/pgsql-9.6/share/man
SHAREDIR = /usr/pgsql-9.6/share
SYSCONFDIR = /etc/sysconfig/pgsql
PGXS = /usr/pgsql-9.6/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-rpath' '--prefix=/usr/pgsql-9.6'
'--includedir=/usr/pgsql-9.6/include' '--mandir=/usr/pgsql-9.6/share/man'
'--datadir=/usr/pgsql-9.6/share' '--libdir=/usr/pgsql-9.6/lib'
'--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64'
'--with-openssl' '--with-pam' '--with-gssapi'
'--with-includes=/usr/include' '--with-libraries=/usr/lib64' '--enable-nls'
'--enable-dtrace' '--with-uuid=e2fs' '--with-libxml' '--with-libxslt'
'--with-ldap' '--with-selinux' '--with-systemd'
'--with-system-tzdata=/usr/share/zoneinfo'
'--sysconfdir=/etc/sysconfig/pgsql' '--docdir=/usr/pgsql-9.6/doc'
'--htmldir=/usr/pgsql-9.6/doc/html' 'CFLAGS=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
'LDFLAGS=-Wl,--as-needed'
CC = gcc
CPPFLAGS = -DFRONTEND -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic
CFLAGS_SL = -fPIC
LDFLAGS = -L../../src/common -Wl,--as-needed -L/usr/lib64 -Wl,--as-needed
-Wl,-rpath,'/usr/pgsql-9.6/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lselinux -lxslt -lxml2 -lpam -lssl -lcrypto
-lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.6.6

Regards,
Glauco


Re: Segmentation fault with core dump

2018-01-10 Thread Tom Lane
Glauco Torres  writes:
>> The system is a CentOS 7, and PG was installed using PGDG's YUM repository.

Might be worth comparing sha1sum's of the postgres executable between
this server and one that's not having the problem, just to eliminate
the corrupted-binary theory.

regards, tom lane



Re: Segmentation fault with core dump

2018-01-10 Thread Glauco Torres
>
> Might be worth comparing sha1sum's of the postgres executable between
> this server and one that's not having the problem, just to eliminate
> the corrupted-binary theory.
>
>
>

The result is the same for both servers:

$ sha1sum /usr/pgsql-9.6/bin/postmaster
56bcb4d644a8b00f07e9bd42f9a3f02be7ff2523  /usr/pgsql-9.6/bin/postmaster


Today I got another core dump; the backtrace follows:

(gdb) bt
#0  tbm_comparator (left=left@entry=0x1d5ca08, right=right@entry=0x3acdb70)
at tidbitmap.c:1031
#1  0x00801268 in med3 (a=0x1d5ca08 "\350>\337\001", b=0x3acdb70
, c=0x583ecd8 , cmp=0x603ca0 ) at qsort.c:107
#2  0x00801621 in pg_qsort (a=0x1d5ca08, n=,
n@entry=10477,
es=es@entry=8, cmp=cmp@entry=0x603ca0 ) at qsort.c:157
#3  0x00604a7b in tbm_begin_iterate (tbm=tbm@entry=0x1dd8a00) at
tidbitmap.c:635
#4  0x005d3a89 in BitmapHeapNext (node=node@entry=0x1dc2ef0) at
nodeBitmapHeapscan.c:110
#5  0x005caf1a in ExecScanFetch (recheckMtd=0x5d35b0
, accessMtd=0x5d35f0 , node=0x1dc2ef0)
at execScan.c:95
#6  ExecScan (node=node@entry=0x1dc2ef0, accessMtd=accessMtd@entry=0x5d35f0
, recheckMtd=recheckMtd@entry=0x5d35b0 )
at execScan.c:180
#7  0x005d3cff in ExecBitmapHeapScan (node=node@entry=0x1dc2ef0) at
nodeBitmapHeapscan.c:440
#8  0x005c3fb8 in ExecProcNode (node=node@entry=0x1dc2ef0) at
execProcnode.c:437
#9  0x005de877 in ExecNestLoop (node=node@entry=0x1dc0148) at
nodeNestloop.c:174
#10 0x005c3f28 in ExecProcNode (node=node@entry=0x1dc0148) at
execProcnode.c:476
#11 0x005de7d7 in ExecNestLoop (node=node@entry=0x1dbfdd8) at
nodeNestloop.c:123
#12 0x005c3f28 in ExecProcNode (node=node@entry=0x1dbfdd8) at
execProcnode.c:476
#13 0x005d624d in MultiExecHash (node=node@entry=0x1dbf9b8) at
nodeHash.c:104
#14 0x005c40c0 in MultiExecProcNode (node=node@entry=0x1dbf9b8) at
execProcnode.c:577
#15 0x005d6cb9 in ExecHashJoin (node=node@entry=0x1dbe688) at
nodeHashjoin.c:178
#16 0x005c3f08 in ExecProcNode (node=node@entry=0x1dbe688) at
execProcnode.c:484
#17 0x005de7d7 in ExecNestLoop (node=node@entry=0x1dbc6e0) at
nodeNestloop.c:123
#18 0x005c3f28 in ExecProcNode (node=node@entry=0x1dbc6e0) at
execProcnode.c:476
#19 0x005de7d7 in ExecNestLoop (node=node@entry=0x1dbc520) at
nodeNestloop.c:123
#20 0x005c3f28 in ExecProcNode (node=0x1dbc520) at
execProcnode.c:476
#21 0x005cf619 in fetch_input_tuple (aggstate=aggstate@entry=0x1dbbc48)
at nodeAgg.c:598
#22 0x005d10ff in agg_retrieve_direct (aggstate=0x1dbbc48) at
nodeAgg.c:2067
#23 ExecAgg (node=node@entry=0x1dbbc48) at nodeAgg.c:1892
#24 0x005c3ec8 in ExecProcNode (node=node@entry=0x1dbbc48) at
execProcnode.c:503
#25 0x005c03a7 in ExecutePlan (dest=0x1b607c8, direction=, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT,
use_parallel_mode=, planstate=0x1dbbc48, estate=0x1dbba58)
at execMain.c:1566
#26 standard_ExecutorRun (queryDesc=0x1c98de0, direction=,
count=0) at execMain.c:338
#27 0x7f016577e0a5 in pgss_ExecutorRun (queryDesc=0x1c98de0,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:877
#28 0x006d3a97 in PortalRunSelect (portal=portal@entry=0x1ad9278,
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807,
dest=dest@entry=0x1b607c8) at pquery.c:948
#29 0x006d4eab in PortalRun (portal=0x1ad9278,
count=9223372036854775807, isTopLevel=, dest=0x1b607c8,
altdest=0x1b607c8, completionTag=0x7ffdfe32a700 "") at pquery.c:789
#30 0x006d2371 in PostgresMain (argc=,
argv=, dbname=, username=) at
postgres.c:1969
#31 0x0046f8d4 in BackendRun (port=0x1add6d0) at postmaster.c:4294
#32 BackendStartup (port=0x1add6d0) at postmaster.c:3968
#33 ServerLoop () at postmaster.c:1719
#34 0x00675b69 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1ab1210)
at postmaster.c:1327
#35 0x0047053e in main (argc=3, argv=0x1ab1210) at main.c:228

Regards,
Glauco


Re: Segmentation fault with core dump

2018-01-10 Thread Tom Lane
Glauco Torres  writes:
> Today I left to generate more core-dump, follow the return,

> (gdb) bt
> #0  tbm_comparator (left=left@entry=0x1d5ca08, right=right@entry=0x3acdb70)
> at tidbitmap.c:1031
> #1  0x00801268 in med3 (a=0x1d5ca08 "\350>\337\001", b=0x3acdb70
> , c=0x583ecd8  bounds>, cmp=0x603ca0 ) at qsort.c:107
> #2  0x00801621 in pg_qsort (a=0x1d5ca08, n=,
> n@entry=10477,
> es=es@entry=8, cmp=cmp@entry=0x603ca0 ) at qsort.c:157
> #3  0x00604a7b in tbm_begin_iterate (tbm=tbm@entry=0x1dd8a00) at
> tidbitmap.c:635

Oh ho!  I was wondering to myself "if pg_qsort is broken, why isn't
his system falling over everywhere?".  The answer evidently is
"yes, it is falling over everywhere".  This symptom looks pretty much
like what you had before, ie far-out-of-range addresses getting passed
to med3(), but the qsort call site is completely different.

Since you've eliminated the idea that the executable file per se
is different from your working servers, I think we're now down to
the conclusion that there's something flaky about the hardware
on this server.  Maybe it's misexecuting integer divide every so
often --- though it's hard to guess why only pg_qsort would be
affected.

regards, tom lane



Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-10 Thread Andres Freund
On 2018-01-09 20:51:17 -0500, Stephen Frost wrote:
> Greetings,
> 
> * Andreas Joseph Krogh (andr...@visena.com) wrote:
> > Aha, so enabling CRC causes hint-bits to be written causing extra 
> > WAL-logging, 
> > which wouldn't be the case without CRC enabled?
> > Thanks for pointing that out.
> 
> Yes, having checksums enabled forces logging of hint bits.  You can
> enable wal_log_hints independently too, without having checksums, to see
> what kind of an impact it'll have on your environment.
> 
> A useful documentation update might be:
> 
> ---
> With checksums enabled, wal_log_hints 
> will be enabled and each page read or write will involve calculating the
> checksum for the page.
> ---
> 
> I'd probably just replace the "Enabling checksums may incur a noticeable
> performance penalty" with the above, as it should be clear that doing
> more work implies an impact on performance and that avoids the whole
> question of trying to characterize in a general way something that can't
> be generalized (as it's workload dependent).

-1. I think this is underplaying the cost.

Greetings,

Andres Freund
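
For reference, a minimal sketch (not part of the original messages; the data
directory path is an assumption) of the two settings being discussed:

# page checksums can only be chosen at cluster creation time, and they force
# hint-bit WAL-logging:
initdb --data-checksums -D /var/lib/pgsql/data

# to measure just the hint-bit overhead on an existing cluster, without
# checksums, set in postgresql.conf:
wal_log_hints = on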



Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-10 Thread Thomas Poty
Hello,
A question that seems important to me:
How is a corruption, detected thanks to data-checksums, fixed?

Thank you,
Thomas

On 10 Jan 2018 at 20:39, "Andres Freund"  wrote:

> On 2018-01-09 20:51:17 -0500, Stephen Frost wrote:
> > Greetings,
> >
> > * Andreas Joseph Krogh (andr...@visena.com) wrote:
> > > Aha, so enabling CRC causes hint-bits to be written causing extra
> WAL-logging,
> > > which wouldn't be the case without CRC enabled?
> > > Thanks for pointing that out.
> >
> > Yes, having checksums enabled forces logging of hint bits.  You can
> > enable wal_log_hints independently too, without having checksums, to see
> > what kind of an impact it'll have on your environment.
> >
> > A useful documentation update might be:
> >
> > ---
> > With checksums enabled, wal_log_hints 
> > will be enabled and each page read or write will involve calculating the
> > checksum for the page.
> > ---
> >
> > I'd probably just replace the "Enabling checksums may incur a noticeable
> > performance penalty" with the above, as it should be clear that doing
> > more work implies an impact on performance and that avoids the whole
> > question of trying to characterize in a general way something that can't
> > be generalized (as it's workload dependent).
>
> -1. I think this is underplaying the cost.
>
> Greetings,
>
> Andres Freund
>
>


Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-10 Thread Jeff Janes
On Wed, Jan 10, 2018 at 12:23 PM, Thomas Poty  wrote:

> Hello,
> A question that seems important to me:
> How is a corruption, detected thanks to data-checksums, fixed?
>

Take two full cold backups of the current mess you have, including the
executables, and lock one of them away where you can't accidentally do
something to make it worse.

Replace the hardware (or fix the software bug) which led to this, so it
doesn't eat more of your data than it already has.

If you have a wal archive, then restore from the most recent backup and
recover it forward with the WAL, hoping your hardware problem hasn't
polluted that as well.  Make sure it rolls forward as far as you think it
should.  If you think it rolled forward all the way, then you are probably
done.  I'd take a full cold backup as well as a full pg_dump(all) at this
point and lock it away for future forensics, just in case, and also to see
if any more errors are found by the pg_dump.

I'd also extract the damaged block and inspect it.  If the value says "MQs.
Johnson" but the obvious (due to context) correction to "Mrs. Johnson"
matches what the post-recovery block also says, then I'd say you are pretty
good.  If the damage was to a floating point number or a header in which
you don't have any useful context to guide you, you might want to engage a
professional at this kind of thing if the data is very important to you.

Cheers,

Jeff


Sv: Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-10 Thread Andreas Joseph Krogh
On Wednesday, 10 January 2018 at 22:36:42, Jeff Janes wrote:
> On Wed, Jan 10, 2018 at 12:23 PM, Thomas Poty wrote:
>> Hello,
>> A question that seems important to me:
>> How is a corruption, detected thanks to data-checksums, fixed?
>
> Take two full cold backups of the current mess you have, including the
> executables, and lock one of them away where you can't accidentally do
> something to make it worse.
>
> Replace the hardware (or fix the software bug) which led to this, so it
> doesn't eat more of your data than it already has.
>
> If you have a wal archive, then restore from the most recent backup and
> recover it forward with the WAL, hoping your hardware problem hasn't polluted
> that as well.  Make sure it rolls forward as far as you think it should.  If
> you think it rolled forward all the way, then you are probably done.  I'd take
> a full cold backup as well as a full pg_dump(all) at this point and lock it
> away for future forensics, just in case, and also to see if any more errors
> are found by the pg_dump.
>
> I'd also extract the damaged block and inspect it.  If the value says "MQs.
> Johnson" but the obvious (due to context) correction to "Mrs. Johnson" matches
> what the post-recovery block also says, then I'd say you are pretty good.  If
> the damage was to a floating point number or a header in which you don't have
> any useful context to guide you, you might want to engage a professional at
> this kind of thing if the data is very important to you.

Btrfs actually fixes (some) corruption once it detects it, so Thomas' question
is not all that far-fetched.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com


RE: How Many Partitions are Good Performing

2018-01-10 Thread Kumar, Virendra
For a test I created two tables with 7,800 partitions each, and joining them
hits a performance bottleneck: planning time alone is 5 seconds. Please see the
attached plan.
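
A minimal sketch (not part of the original message; table name and bounds are
made up) of how such a partition set can be generated in psql on PG 10 for a
planning-time test:

create table part_test (id int, val text) partition by range (id);
select format('create table part_test_%s partition of part_test for values from (%s) to (%s)',
              g, g * 100, (g + 1) * 100)
from generate_series(0, 7799) as g
\gexec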


Regards,
Virendra

-Original Message-
From: pinker [mailto:pin...@onet.eu]
Sent: Wednesday, January 10, 2018 12:07 PM
To: pgsql-gene...@postgresql.org
Subject: Re: How Many Partitions are Good Performing

I once ran a test on my laptop because I was curious as well. From my results
(laptop: 16 GB RAM, 4 cores) the upper limit was 12k partitions. Above that,
planning time was unbearably high - much higher than execution time. It was
tested on 9.5.








 Aggregate  (cost=337270.14..337270.15 rows=1 width=232) (actual 
time=102.123..102.124 rows=1 loops=1)
   Buffers: shared hit=22747
   CTE t1
 ->  HashAggregate  (cost=2773.81..2795.37 rows=2156 width=24) (actual 
time=12.051..13.626 rows=2792 loops=1)
   Group Key: se_1.site_id, se_1.peril_id, se_1.account_id, 
se_1.portfolio_id
   Buffers: shared hit=385
   ->  Nested Loop  (cost=10.05..2752.25 rows=2156 width=24) (actual 
time=0.359..9.940 rows=2792 loops=1)
 Buffers: shared hit=385
 ->  Seq Scan on catevent_earthquake_poly_merged  
(cost=0.00..17.75 rows=3 width=32) (actual time=0.027..0.030 rows=1 loops=1)
   Filter: (event_id = 'us20009yvq'::text)
   Rows Removed by Filter: 3
   Buffers: shared hit=2
 ->  Append  (cost=10.05..910.78 rows=72 width=56) (actual 
time=0.328..7.250 rows=2792 loops=1)
   Buffers: shared hit=383
   ->  Bitmap Heap Scan on site_exposure_13397 se_1  
(cost=10.05..910.78 rows=72 width=56) (actual time=0.325..4.726 rows=2792 
loops=1)
 Recheck Cond: (shape && 
catevent_earthquake_poly_merged.shape)
 Filter: ((portfolio_id = 13397) AND 
_st_intersects(shape, catevent_earthquake_poly_merged.shape))
 Heap Blocks: exact=360
 Buffers: shared hit=383
 ->  Bitmap Index Scan on 
site_exposure_13397_shape_idx  (cost=0.00..10.03 rows=216 width=0) (actual 
time=0.263..0.263 rows=2792 loops=1)
   Index Cond: (shape && 
catevent_earthquake_poly_merged.shape)
   Buffers: shared hit=23
   ->  Hash Join  (cost=271937.09..334470.48 rows=59 width=200) (actual 
time=88.676..98.896 rows=2792 loops=1)
 Hash Cond: ((se.peril_id = p.pe_peril_id) AND (se.account_id = 
p.pe_account_id))
 Buffers: shared hit=22747
 ->  GroupAggregate  (cost=271794.39..330553.67 rows=215630 width=152) 
(actual time=30.641..37.303 rows=2792 loops=1)
   Group Key: se.peril_id, se.portfolio_id, se.account_id
   Buffers: shared hit=11178
   ->  Sort  (cost=271794.39..274489.77 rows=1078152 width=148) 
(actual time=30.629..32.018 rows=2792 loops=1)
 Sort Key: se.peril_id, se.account_id
 Sort Method: quicksort  Memory: 838kB
 Buffers: shared hit=11178
 ->  Nested Loop  (cost=48.94..1614.87 rows=1078152 
width=148) (actual time=3.305..27.195 rows=2792 loops=1)
   Buffers: shared hit=11178
   ->  HashAggregate  (cost=48.51..50.51 rows=200 
width=8) (actual time=3.274..4.957 rows=2792 loops=1)
 Group Key: t1.site_id
 ->  CTE Scan on t1  (cost=0.00..43.12 
rows=2156 width=8) (actual time=0.002..1.378 rows=2792 loops=1)
   ->  Append  (cost=0.43..7.81 rows=1 width=156) 
(actual time=0.004..0.006 rows=1 loops=2792)
 Buffers: shared hit=11178
 ->  Index Scan using site_exposure_13397_pkey 
on site_exposure_13397 se  (cost=0.43..7.81 rows=1 width=156) (actual 
time=0.003..0.004 rows=1 loops=2792)
   Index Cond: (site_id = t1.site_id)
   Filter: (portfolio_id = 13397)
   Buffers: shared hit=11178
 ->  Hash  (cost=142.53..142.53 rows=11 width=80) (actual 
time=58.016..58.016 rows=2792 loops=1)
   Buckets: 4096 (originally 1024)  Batches: 1 (originally 1)  
Memory Usage: 

creating a table in plpython?

2018-01-10 Thread Celia McInnis
Hi - Is it possible to create a table inside a plpython stored procedure?
If so, can you give an example of how to do so?

Many thanks,
Celia McInnis


Re: creating a table in plpython?

2018-01-10 Thread David G. Johnston
On Wed, Jan 10, 2018 at 4:27 PM, Celia McInnis 
wrote:

> Hi - Is it possible to create a table inside a plpython stored procedure?
> If so, can you give an example of how to do so?
>
>
Haven't used pl/python myself but...

https://www.postgresql.org/docs/10/static/plpython-database.html

so... plpy.execute("CREATE TABLE ...") ?
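
A minimal sketch (not part of the original message; names are made up, and it
assumes the plpythonu language is installed - use plpython3u for Python 3):

CREATE OR REPLACE FUNCTION make_demo_table() RETURNS void AS $$
    # plpy is available implicitly inside a PL/Python function body
    plpy.execute("CREATE TABLE IF NOT EXISTS demo_tbl (id int, note text)")
    plpy.execute("INSERT INTO demo_tbl VALUES (1, 'created from plpython')")
$$ LANGUAGE plpythonu;

SELECT make_demo_table();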

David J.


Re: creating a table in plpython?

2018-01-10 Thread Celia McInnis
Thanks - works like a charm. Hmmm. I tried that earlier and got errors. I
just retried and it worked fine this time. I guess I had some other problem
- It seems that sometimes I have to exit and reenter my psql session to use
a newly interpreted version of my stored procedure, so maybe that was part
of the problem...

On Wed, Jan 10, 2018 at 6:34 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Jan 10, 2018 at 4:27 PM, Celia McInnis 
> wrote:
>
>> Hi - Is it possible to create a table inside a plpython stored procedure?
>> If so, can you give an example of how to do so?
>>
>>
> Haven't used pl/python myself but...
>
> https://www.postgresql.org/docs/10/static/plpython-database.html
>
> so... plpy.execute("CREATE TABLE ...") ?
>
> David J.
> ​
>


FOSDEM booth volunteer

2018-01-10 Thread Andreas 'ads' Scherbaum


Hello,


if you are attending FOSDEM 2018 and would like to help with the PostgreSQL
booth or devroom, please respond to me directly.


The devroom volunteers have a guaranteed seat in the usually packed room.

We also require that every volunteer has Google Hangouts, for quick and
unobtrusive communication. Please include your Hangouts handle in your
email (and once again, reply only to me; please don't share your handle
with a public mailing list).



Thank you,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Parallel Query

2018-01-10 Thread Krithika Venkatesh
Hi All,

I was reading about parallel queries in postgresql 10.

I came across the property max_parallel_workers_per_gather. It sets the
maximum number of workers that can be started by a single Gather or Gather
Merge node.

I am not able to understand the Gather and Gather Merge nodes.

What is the difference between max_parallel_workers_per_gather and
max_parallel_processes? What is meant by a Gather node?

Please let me know.

Thanks,
Krithika
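
A minimal sketch (not part of the original message; the table name is made
up) of where a Gather node appears in a parallel plan:

SET max_parallel_workers_per_gather = 4;
EXPLAIN SELECT count(*) FROM big_table;
-- a typical PostgreSQL 10 plan shape:
--   Finalize Aggregate
--     ->  Gather              (starts up to 4 parallel workers)
--           Workers Planned: 4
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on big_table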