Out of memory with "create extension postgis"

2020-07-28 Thread Daniel Westermann (DWE)
Hi,

we have a very strange behavior on PostgreSQL 12.3 when we try to create the 
extension postgis. Postgres and postgis have both been installed from packages:

postgresql12-server-12.3-5PGDG
postgis30_12-3.0.1-5.rhel7

The OS is RedHat 7.7.

Creating other extensions works fine, we only see this for postgis. A gdb 
session attached to the psql process while creating the extension gives:

Program received signal SIGINT, Interrupt.
0x7f19c0de9740 in __read_nocancel () from /lib64/libpthread.so.0
Detaching from program: /usr/pgsql-12/bin/psql, process 44202

The process eats all the available memory and finally dies:
# create extension postgis;
ERROR:  out of memory
DETAIL:  Failed on request of size 8265691 in memory context "PortalContext".
Time: 773569.877 ms (12:53.570)


Stats file from /proc:

# cat status
Name:   psql
Umask:  0022
State:  S (sleeping)
Tgid:   45958
Ngid:   0
Pid:    45958
PPid:   44075
TracerPid:      0
Uid:    280     280     280     280
Gid:    280     280     280     280
FDSize: 256
Groups: 280
VmPeak:   184604 kB
VmSize:   184600 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4304 kB
VmRSS:      4304 kB
RssAnon:             980 kB
RssFile:            3324 kB
RssShmem:              0 kB
VmData:      672 kB
VmStk:       132 kB
VmExe:       604 kB
VmLib:     11288 kB
VmPTE:       180 kB
VmSwap:        0 kB
Threads:        1
SigQ:   0/15635
SigPnd: 
ShdPnd: 
SigBlk: 
SigIgn: 
SigCgt: 00018002
CapInh: 
CapPrm: 
CapEff: 
CapBnd: 001f
CapAmb: 
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   ,,,
Cpus_allowed_list:      0-127
Mems_allowed:   
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0001
Mems_allowed_list:      0
voluntary_ctxt_switches:        47

stack file from /proc:
# cat stack
[] poll_schedule_timeout+0x55/0xc0
[] do_sys_poll+0x48d/0x590
[] SyS_poll+0x74/0x110
[] system_call_fastpath+0x25/0x2a
[] 0x

limits from /proc:
# cat limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    0                    bytes
Max resident set          unlimited            unlimited            bytes
Max processes             4096                 15635                processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15635                15635                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Stracing the psql process shows no activity. Looking at the used swap shows 
that the postmaster uses a huge amount:

find /proc -maxdepth 2 -path "/proc/[0-9]*/status" -readable -exec awk -v FS=":" \
  '{process[$1]=$2; sub(/^[ \t]+/,"",process[$1]);}
   END {if (process["VmSwap"] && process["VmSwap"] != "0 kB")
          printf "%10s %-30s %20s\n", process["Pid"], process["Name"], process["VmSwap"]}' '{}' \;
...
     48043 psql                                       1004 kB
     48044 postmaster                             21948064 kB
     49059 postmaster                                 1008 kB
     52550 rhsmcertd                                   144 kB
...
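For readers who find the awk one-liner hard to follow, the same scan can be sketched in Python. This is an illustrative helper (the function names are ours, not part of any tool); it parses the /proc/<pid>/status format shown above and reports processes with nonzero VmSwap:

```python
import glob

def parse_status(text):
    """Parse a /proc/<pid>/status blob into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(':')
        fields[key] = value.strip()
    return fields

def swap_users():
    """Yield (pid, name, vmswap) for processes currently using swap."""
    for path in glob.glob('/proc/[0-9]*/status'):
        try:
            with open(path) as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # process exited or status unreadable
        swap = fields.get('VmSwap')
        if swap and swap != '0 kB':
            yield fields.get('Pid'), fields.get('Name'), swap
```

Running `for pid, name, swap in swap_users(): print(pid, name, swap)` on the affected host should single out the bloated postmaster backend the same way the find/awk pipeline does.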
 
Any hints what could be the issue? Can I provide any other information that 
would help troubleshooting this issue?

Thanks in advance
Daniel






Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Shaozhong SHI
It has been found that issues occur when Big Data is being handled with
PostGIS.  Typically, queries are killed by the system or run out of memory.
Often, queries can be very slow; sometimes they take days or weeks to
complete.

What are the best approaches and means for improving the performance of
queries and processes in PostgreSQL/PostGIS?

Can anyone shed light on this?

Regards,

Shao


Re: is JIT available

2020-07-28 Thread David Rowley
On Tue, 28 Jul 2020 at 15:55, David Rowley  wrote:
>
> On Tue, 28 Jul 2020 at 15:33, Tom Lane  wrote:
> >
> > David Rowley  writes:
> > > Maybe this would be better?
> >
> > > "returns true if jit is enabled and JIT compilation is available in
> > > this session (see Chapter 31)."
> >
> > The general, non-hacker meaning of "jit is enabled" would seem to
> > be pretty much what this function is already doing; and for that
> > matter, the same can be said for "JIT compilation is available".
> > We need something that's less tautological-looking.  Maybe along
> > the lines of
> >
> > "returns true if a JIT compiler extension is available and the
> > jit parameter is set to on;
>
> That's probably better.  FWIW, the "jit" is already a link to the GUC
> docs, so I had in mind that users would have known we meant "jit" the
> GUC rather than "jit" the feature.  Your wording will help for anyone
> who thinks we're talking about the feature.
>
> > when this is true, JIT compilation will be performed."
>
> I'd probably drop this part since it's not really true. The query has
> to exceed the cost thresholds before that'll happen.

I pushed a doc change for this with slightly revised wording from what
you mentioned.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d7c8576ebe3949a644c700a9f54d88e7e373a647

David




Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Gavin Flower

On 28/07/2020 22:54, Shaozhong SHI wrote:
It has been found that issues occur when Big Data is being handled 
with PostGIS. Typically, queries can be killed by the system or memory 
is out.  Often, queries can be very slow.  Sometimes, it will take 
days or weeks to complete.


What are the best approaches and means for improving the performance 
of queries and processes in PostgreSQL/PostGIS?


Can anyone shed light on this?

Regards,

Shao


Probably helps if you can give us more details!

Such as O/S, versions of PostgreSQL/PostGIS, and hardware used. Plus 
anything else you think might be relevant.



Cheers,
Gavin





Re: is JIT available

2020-07-28 Thread Scott Ribe
> On Jul 27, 2020, at 6:04 PM, David Rowley  wrote:
> 
> "returns true if jit is enabled and JIT compilation is available in
> this session (see Chapter 31)."

That is clearer. I didn't submit a suggestion myself because I'm not clear on 
the actual circumstances. I know it won't be available if:

- jit is not on in config
- PG was not compiled with JIT support

But does compiling with JIT enabled and the LLVM dev tools mean that all the LLVM 
compilation/optimization is built into the PG binaries, or does it require LLVM 
presence on the machine where deployed? And if so, does the function take that 
into account as well?

I would guess the function is telling the truth under all circumstances, but I 
don't know for sure.

Perhaps: "returns true if JIT (see Chapter 31) is available in this session. 
Availability of JIT requires that PG was compiled with JIT support, JIT is 
enabled in config, .



Re: is JIT available

2020-07-28 Thread Scott Ribe
> On Jul 27, 2020, at 9:33 PM, Tom Lane  wrote:
> 
> The general, non-hacker meaning of "jit is enabled" would seem to
> be pretty much what this function is already doing; and for that
> matter, the same can be said for "JIT compilation is available".
> We need something that's less tautological-looking.  Maybe along
> the lines of

My problem was that it says "is enabled", then calls out just one of the 
conditions for it to be available, but not the other one. Either calling out no 
conditions, or all of them, would be more clear.



Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Shaozhong SHI
Hi, Gavin,

PostgreSQL Version - 10.11. PostGIS Version - 2.4.7. Operating System - Red
Hat Enterprise Linux 7.7 .

That is all I know at the moment.

As I understand, our IT staff is building another one to sit on Azure.

Regards,

Shao

On Tue, 28 Jul 2020 at 12:31, Gavin Flower 
wrote:

> On 28/07/2020 22:54, Shaozhong SHI wrote:
> > It has been found that issues occur when Big Data is being handled
> > with PostGIS. Typically, queries can be killed by the system or memory
> > is out.  Often, queries can be very slow.  Sometimes, it will take
> > days or weeks to complete.
> >
> > What are the best approaches and means for improving the performance
> > of queries and processes in PostgreSQL/PostGIS?
> >
> > Can anyone shed light on this?
> >
> > Regards,
> >
> > Shao
>
> Probably helps if you can give us more details!
>
> Such as O/S, versions of PostgreSQL/PostGIS, and hardware used. Plus
> anything else you think might be relevant.
>
>
> Cheers,
> Gavin
>
>


Re: Out of memory with "create extension postgis"

2020-07-28 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
> we have a very strange behavior on PostgreSQL 12.3 when we try to create the 
> extension postgis. Postgres and postgis have both been installed from 
> packages:
> ...
> The process eats all the available memory and finally dies:
> # create extension postgis;
> ERROR:  out of memory
> DETAIL:  Failed on request of size 8265691 in memory context "PortalContext".
> Time: 773569.877 ms (12:53.570)

Quite odd.  There should have been a memory context dump written to the
postmaster's stderr, can you show that?  Also possibly useful would be
a backtrace (set a breakpoint at errfinish):

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane




Re: Out of memory with "create extension postgis"

2020-07-28 Thread Daniel Westermann (DWE)
>"Daniel Westermann (DWE)"  writes:
>> we have a very strange behavior on PostgreSQL 12.3 when we try to create the 
>> extension postgis. Postgres and postgis have both been installed from 
>> packages:
>> ...
>> The process eats all the available memory and finally dies:
>> # create extension postgis;
>> ERROR:  out of memory
>> DETAIL:  Failed on request of size 8265691 in memory context "PortalContext".
>> Time: 773569.877 ms (12:53.570)

>Quite odd.  There should have been a memory context dump written to the
>postmaster's stderr, can you show that?  Also possibly useful would be
>a backtrace (set a breakpoint at errfinish):

>https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

There is nothing in the log file except these (not from today, but the messages 
are always the same):
2020-07-03 16:52:16 CEST 53617 LOG:  server process (PID 54070) was terminated 
by signal 9: Killed
2020-07-03 16:52:16 CEST 53617 DETAIL:  Failed process was running: create 
extension postgis;
2020-07-03 16:52:16 CEST 53617 LOG:  terminating any other active server 
processes

I will take care of the backtrace

Regards
Daniel



Re: Out of memory with "create extension postgis"

2020-07-28 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
>> "Daniel Westermann (DWE)"  writes:
>>> The process eats all the available memory and finally dies:
>>> # create extension postgis;
>>> ERROR:  out of memory
>>> DETAIL:  Failed on request of size 8265691 in memory context 
>>> "PortalContext".
>>> Time: 773569.877 ms (12:53.570)

> There is nothing in the log file except these (not from today, but the 
> messages are always the same):
> 2020-07-03 16:52:16 CEST 53617 LOG:  server process (PID 54070) was 
> terminated by signal 9: Killed

A process that was killed by the OOM killer would not have managed to
produce an "out of memory" ERROR report, so these two are different
symptoms.  You need to reproduce the first case, or you won't have
any luck setting an error breakpoint either.

If you can't manage to get back to that state, you might get somewhere
by waiting for the process to grow large and then attaching with gdb
and getting a stack trace.  That's a bit less reliable than the
errfinish approach, though.

regards, tom lane




Re: Out of memory with "create extension postgis"

2020-07-28 Thread Daniel Westermann (DWE)
>>> "Daniel Westermann (DWE)"  writes:
>>>> The process eats all the available memory and finally dies:
>>>> # create extension postgis;
>>>> ERROR:  out of memory
>>>> DETAIL:  Failed on request of size 8265691 in memory context
>>>> "PortalContext".
>>>> Time: 773569.877 ms (12:53.570)

>> There is nothing in the log file except these (not from today, but the 
>> messages are always the same):
>> 2020-07-03 16:52:16 CEST 53617 LOG:  server process (PID 54070) was 
>> terminated by signal 9: Killed

>A process that was killed by the OOM killer would not have managed to
>produce an "out of memory" ERROR report, so these two are different
>symptoms.  You need to reproduce the first case, or you won't have
>any luck setting an error breakpoint either.

You're right, that was before we adjusted the oom behavior. Will get back once 
I have more information.

Regards
Daniel


Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Michael Lewis
On Tue, Jul 28, 2020 at 7:59 AM Shaozhong SHI 
wrote:

> Hi, Gavin,
>
> PostgreSQL Version - 10.11. PostGIS Version - 2.4.7. Operating System -
> Red Hat Enterprise Linux 7.7 .
>
> That is all I know at the moment.
>
> As I understand, our IT staff is building another one to sit on Azure.
>

Please don't top-post in these mailing lists.

Can you share the results of the below?

select name, setting, source from pg_settings where source <> 'default';


Re: determine what column(s) form the primary key, in C extension

2020-07-28 Thread alex maslakov

I'm emailing it to the 'general' list.


(1)

This:


    int i = -1;
    while ((i = bms_next_member(pkattnos, i)) >= 0)
    {
        /* do stuff with i */
        /* you'll need to use i - FirstLowInvalidHeapAttributeNumber to
           get the pg_attribute.attnum */
        elog(INFO, "bms_next_member i: %d", i);
    }


prints 10 and then 9

Then:

  10 - FirstLowInvalidHeapAttributeNumber (-8) ==> 2

  9 - FirstLowInvalidHeapAttributeNumber (-8) ==> 1


These are indexes of the columns, right?

Do they start from 1, not from 0?



(2)

I'll use this C code as an example to build an extension in Rust. The 
PostgreSQL bindings for Rust I have don't contain a definition of 
`FirstLowInvalidHeapAttributeNumber` for some reason. I can define it 
myself, since it's simply a single-digit constant.


However, why is it defined as (-7) in some source files and as (-8) in 
others? Which should I use?





On 28/07/2020 at 03:20, David Rowley wrote:

Hi Alex,

On Tue, 28 Jul 2020 at 05:47, alex maslakov  wrote:

I was suggested to use `get_primary_key_attnos` from
`src/include/catalog/pg_constraint.h`

extern Bitmapset *get_primary_key_attnos(Oid relid, bool deferrableOk)


It returns a Bitmapset pointer, and it's got "nwords" and "words[]". But those
contain just big numbers, not something that looks like an index of
the primary key column.


And therefore I haven't had any luck thus far.

How to do it?

You'll need to loop over the return value of that function with
bms_next_member()

e.g.
pkattnos = get_primary_key_attnos(oid, false);
i = -1;
while ((i = bms_next_member(pkattnos, i)) >= 0)
{
    /* do stuff with i */
    /* you'll need to use i - FirstLowInvalidHeapAttributeNumber to get
       the pg_attribute.attnum */
}

For the future, for questions, you should use the general list.  If
the question is very source code related then you might have more luck
in pgsql-hackers.  This is not the right list. Please post any
followup questions on one of those lists.

Note the description for this list, per
https://www.postgresql.org/list/ is "Notification of git commits are
sent to this list. Do not post here!"

David





Need free PG odbc driver for Windows 10

2020-07-28 Thread David Gauthier
Hi:

I need a free odbc driver for PG to be installed on Windows 10 that my user
community can easily install.  By "easily install" I mean no binaries, no
zipped file, etc... just point-click-install (with the usual confirmations
and accepting default destinations for the code and such).

Devart used to give this away and I have a copy on my laptop.  But
apparently it's not free anymore.

If it matters, the PG DB they will be accessing is on linux, psql (9.3.2,
server 11.3)

Thanks !


Re: Need free PG odbc driver for Windows 10

2020-07-28 Thread Igor Korot
Hi,

On Tue, Jul 28, 2020 at 11:40 AM David Gauthier
 wrote:
>
> Hi:
>
> I need a free odbc driver for PG to be installed on Windows 10 that my user 
> community can easily install.  By "easily install" I mean no binaries, no 
> zipped file, etc... just point-click-install (with the usual confirmations 
> and accepting default destinations for the code and such).
>
> Devart used to give this away and I have a copy on my laptop.  But apparently 
> it's not free anymore.
>
> If it matters, the PG DB they will be accessing is on linux, psql (9.3.2, 
> server 11.3)

If I understand correctly, there is a free ODBC driver on the
PostgreSQL download page.

Thank you.

>
> Thanks !




Re: bad JIT decision

2020-07-28 Thread Andres Freund
Hi,

On 2020-07-27 19:02:56 -0400, Alvaro Herrera wrote:
> On 2020-Jul-27, Scott Ribe wrote:
> 
> > > On Jul 27, 2020, at 4:00 PM, Alvaro Herrera  
> > > wrote:
> > > 
> > > I don't quite understand why is it that a table with 1000 partitions
> > > means that JIT compiles the thing 1000 times.  Sure, it is possible that
> > > some partitions have a different column layout, but it seems an easy bet
> > > that most cases are going to have identical column layout, and so tuple
> > > deforming can be shared.  (I'm less sure about sharing a compile of an
> > > expression, since the varno would vary. But presumably there's a way to
> > > take the varno as an input value for the compiled expr too?)  Now I
> > > don't actually know how this works so please correct if I misunderstand
> > > it.
> > 
> > I'm guessing it's because of inlining. You could optimize a function
> > that takes parameters, no problem. But what's happening is inlining,
> > with parameters, then optimizing.

No, that's not what happens. The issue rather is that at execution time
there's simply nothing tying the partitioned parts of the query together
from the executor POV. Each table scan gets its own expressions to
evaluate quals etc. That's not a JIT specific thing, it's general.

Which then means a partitioned query with a projection and a where
clause applying on the partition level has > 2 expressions for each
partition. And they get a separate ExprState and get emitted separately.

One issue is that we don't take that into account for costing. The other
is the overhead, of course. Even when not JITed, that's a lot of work
that we don't actually need, except we don't know which partitions look
enough like others that we could reuse another expression.

One partial way to address this is to simply add a
LLVMAddMergeFunctionsPass() at the beginning of the optimization
pipeline. In my testing that can quite drastically cut down on
optimization time. But obviously solves the problem only to some degree,
since that's not free.


> Are you saying that if you crank jit_inline_above_cost beyond this
> query's total cost, the problem goes away?

FWIW, you can set the cost to -1 and it'll never inline.

Greetings,

Andres Freund




Re: Need free PG odbc driver for Windows 10

2020-07-28 Thread Thomas Kellerer

David Gauthier wrote on 28.07.2020 at 18:40:

Hi:

I need a free odbc driver for PG to be installed on Windows 10 that
my user community can easily install.  By "easily install" I mean no
binaries, no zipped file, etc... just point-click-install (with the
usual confirmations and accepting default destinations for the code
and such).

Devart used to give this away and I have a copy on my laptop.  But
apparently it's not free anymore.


https://odbc.postgresql.org/

And the downloads are here: https://www.postgresql.org/ftp/odbc/versions/




Re: Need free PG odbc driver for Windows 10

2020-07-28 Thread Adrian Klaver

On 7/28/20 9:40 AM, David Gauthier wrote:

Hi:

I need a free odbc driver for PG to be installed on Windows 10 that my 
user community can easily install.  By "easily install" I mean no 
binaries, no zipped file, etc... just point-click-install (with the 
usual confirmations and accepting default destinations for the code and 
such).


Devart used to give this away and I have a copy on my laptop.  But 
apparently it's not free anymore.


If it matters, the PG DB they will be accessing is on linux, psql 
(9.3.2, server 11.3)


See Thomas Kellerer's post for link to MSI downloads. Are your 
applications that use this 32 bit or 64 bit or both?  If there is 32 bit 
in the mix then you might want to take a look at:


https://odbc.postgresql.org/faq.html

6.8) Installing psqlODBC on 64bit Windows



Thanks !



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Gavin Flower

On 29/07/2020 03:51, Shaozhong SHI wrote:

Hi,

Please find the result of select name, setting, source from 
pg_settings where source <> 'default';


Regards,

Shao



On Tue, 28 Jul 2020 at 16:42, Michael Lewis wrote:


On Tue, Jul 28, 2020 at 7:59 AM Shaozhong SHI wrote:

Hi, Gavin,

PostgreSQL Version - 10.11. PostGIS Version - 2.4.7. Operating
System - Red Hat Enterprise Linux 7.7 .

That is all I know at the moment.

As I understand, our IT staff is building another one to sit
on Azure.


Please don't top-post in these mailing lists.


Shaozhong, you're top posting again!

Although, sometimes it is appropriate to intersperse your comments in 
the previous email, like I am doing here.





Can you share the results of the below?

select name, setting, source from pg_settings where source <>
'default';

Basically you're being asked to put your reply after the rest of the 
email, like I am doing here!




Cheers,
Gavin





Re: bad JIT decision

2020-07-28 Thread Tom Lane
Andres Freund  writes:
> On 2020-07-27 19:02:56 -0400, Alvaro Herrera wrote:
>>> I don't quite understand why is it that a table with 1000 partitions
>>> means that JIT compiles the thing 1000 times.  Sure, it is possible that
>>> some partitions have a different column layout, but it seems an easy bet
>>> that most cases are going to have identical column layout, and so tuple
>>> deforming can be shared.

> No, that's not what happens. The issue rather is that at execution time
> there's simply nothing tying the partitioned parts of the query together
> from the executor POV. Each table scan gets its own expressions to
> evaluate quals etc. That's not a JIT specific thing, it's general.

I think what Alvaro is imagining is caching the results of compiling
tuple-deforming.  You could hash on the basis of all the parts of the
tupdesc that the deforming compiler cares about, and then share the
compiled code across different relations with similar tupdescs.
That could win for lots-o-partitions cases, and it could win across
successive queries on the same relation, too.

Maybe the same principle could be applied to compiled expressions,
but it's less obvious that you'd get enough matches to win.
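The caching idea Tom describes can be pictured with a small sketch. This is not PostgreSQL code; the field list in the cache key and all helper names are hypothetical, chosen just to show the shape of "hash the layout-relevant parts of the tupdesc, compile once per distinct layout":

```python
def tupdesc_cache_key(tupdesc):
    # Key on only the attribute properties tuple deforming depends on
    # (illustrative field list, not PostgreSQL's actual one).
    return tuple((att['atttypid'], att['attlen'], att['attalign'],
                  att['attnotnull']) for att in tupdesc)

_deform_cache = {}

def get_compiled_deformer(tupdesc, compile_fn):
    """Compile a deformer once per distinct layout; relations sharing a
    layout (e.g. identically shaped partitions) reuse the compiled code."""
    key = tupdesc_cache_key(tupdesc)
    if key not in _deform_cache:
        _deform_cache[key] = compile_fn(tupdesc)
    return _deform_cache[key]
```

With 1000 partitions of identical layout, `compile_fn` would run once instead of 1000 times, which is the win being discussed for the lots-o-partitions case.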

regards, tom lane




Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Shaozhong SHI
On Tue, 28 Jul 2020 at 20:36, Gavin Flower 
wrote:

> On 29/07/2020 03:51, Shaozhong SHI wrote:
> > Hi,
> >
> > Please find the result of select name, setting, source from
> > pg_settings where source <> 'default';
> >
> > Regards,
> >
> > Shao
> >
> >
> >
> > On Tue, 28 Jul 2020 at 16:42, Michael Lewis wrote:
> >
> > On Tue, Jul 28, 2020 at 7:59 AM Shaozhong SHI wrote:
> >
> > Hi, Gavin,
> >
> > PostgreSQL Version - 10.11. PostGIS Version - 2.4.7. Operating
> > System - Red Hat Enterprise Linux 7.7 .
> >
> > That is all I know at the moment.
> >
> > As I understand, our IT staff is building another one to sit
> > on Azure.
> >
> >
> > Please don't top-post in these mailing lists.
> >
> Shaozhong, you're top posting again!
>
> Although, sometimes it is appropriate to intersperse your comments in
> the previos email, like I this
>
>
> >
> > Can you share the results of the below?
> >
> > select name, setting, source from pg_settings where source <>
> > 'default';
> >
> Basically you're being asked to put your reply after the rest of the
> email, like I am doing here!
>
>
>
> Cheers,
> Gavin
>

Hi, Gavin,

Many thanks.  Is this way correct?  I am learning what is meant by 'top
posting'.

Regards,

Shao


Re: bad JIT decision

2020-07-28 Thread Andres Freund
Hi,

On 2020-07-28 11:54:53 +1200, David Rowley wrote:
> Is there some reason that we can't consider jitting on a more granular
> basis?

There's a substantial "constant" overhead of doing JIT. And that it's
nontrival to determine which parts of the query should be JITed in one
part, and which not.


> To me, it seems wrong to have a jit cost per expression and
> demand that the plan cost > #nexprs * jit_expr_cost before we do jit
> on anything.  It'll make it pretty hard to predict when jit will occur
> and doing things like adding new partitions could suddenly cause jit
> to not enable for some query any more.

I think that's the right answer though:

> ISTM a more granular approach would be better. For example, for the
> expression we expect to evaluate once, there's likely little point in
> jitting it, but for the one on some other relation that has more rows,
> where we expect to evaluate it 1 billion times, there's likely good
> reason to jit that.  Wouldn't it be better to consider it at the
> RangeTblEntry level?

Because this'd still JIT if a query has 10k unconditional partition
accesses with the corresponding expressions, even if they're all just one
row?

(I'm rebasing my tree that tries to reduce the overhead / allow caching
/ increase efficiency to current PG, but it's a fair bit of work)
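For concreteness, the per-RangeTblEntry policy being debated could be caricatured as a cost test like the following. This is a loose sketch, not planner code; every name and number is hypothetical:

```python
def should_jit_node(est_rows, per_tuple_saving, jit_setup_cost):
    """JIT a node's expressions only when the expected time saved over
    all evaluations exceeds the one-off compile overhead for that node."""
    return est_rows * per_tuple_saving > jit_setup_cost
```

Under such a rule a one-row partition scan stays interpreted while a billion-row scan gets compiled, which is David's point; Andres's counterpoint is that with 10k partitions the per-node bookkeeping itself is overhead, and the setup cost is paid per node rather than amortized.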

Greetings,

Andres Freund




Is upper_inc ever true for dateranges?

2020-07-28 Thread Ken Tanzer
Hi.  Regardless of how I specify a daterange, it is converted to inclusive
lower bound, exclusive upper bound ('[)'):

SELECT daterange('2019-01-01','2020-01-01','(]') AS range;
          range
-------------------------
 [2019-01-02,2020-01-02)

So here's my question.  Will the upper_inc function always return false for
a non-null daterange?  And if so, what's the point of the function?  And/or
is it different for other kinds of ranges?

lower_inc at least seems to return true if lower bound is not null, but
false if it is null.


WITH r AS (SELECT daterange('2020-01-01','2020-01-31','(]') AS range)
SELECT range,upper_inc(range),lower_inc(range) FROM r;

          range          | upper_inc | lower_inc
-------------------------+-----------+-----------
 [2020-01-02,2020-02-01) | f         | t

     range     | upper_inc | lower_inc
---------------+-----------+-----------
 [2020-01-02,) | f         | t

     range     | upper_inc | lower_inc
---------------+-----------+-----------
 (,2020-01-02) | f         | f

Thanks in advance!

Ken



-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://demo.agency-software.org/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the discussion.


Re: Is upper_inc ever true for dateranges?

2020-07-28 Thread David G. Johnston
On Tue, Jul 28, 2020 at 2:19 PM Ken Tanzer  wrote:

> So here's my question.  Will the upper_inc function always return false
> for a non-null daterange?  And if so, what's the point of the function?
> And/or is it different for other kinds of ranges?
>

Ranges over discrete types are always canonicalized, while ranges over
non-discrete types (e.g., float) cannot be.

David J.


Re: Is upper_inc ever true for dateranges?

2020-07-28 Thread Ken Tanzer
On Tue, Jul 28, 2020 at 2:24 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Jul 28, 2020 at 2:19 PM Ken Tanzer  wrote:
>
>> So here's my question.  Will the upper_inc function always return false
>> for a non-null daterange?  And if so, what's the point of the function?
>> And/or is it different for other kinds of ranges?
>>
>
> Ranges over discrete types are always canonicalized while ranges over
> non-discrete types (i.e., float) cannot.
>
>
Oh that makes perfect sense.  Thank you!

Ken



> David J.
>
>
>



Re: bad JIT decision

2020-07-28 Thread Andres Freund
Hi,

On 2020-07-28 14:07:48 -0700, Andres Freund wrote:
> (I'm rebasing my tree that tries to reduce the overhead / allow caching
> / increase efficiency to current PG, but it's a fair bit of work)

FWIW, I created a demo workload for this, and repro'ed the issue with
it. Those improvements do make a very significant difference:

CREATE FUNCTION exec(text) returns text language plpgsql volatile
AS $f$
BEGIN
  EXECUTE $1;
  RETURN $1;
END;
$f$;
CREATE TABLE manypa(category text not null, data text not null)
  PARTITION BY LIST(category);
SELECT exec('CREATE TABLE manypa_'||g.i||' PARTITION OF manypa FOR VALUES IN('||g.i||')')
  FROM generate_series(1, 1000) g(i);
INSERT INTO manypa(category, data) VALUES('1', '1');

EXPLAIN ANALYZE SELECT * FROM manypa
 WHERE data <> '17' AND data <> '15' AND data <> '13' AND data <> '11'
   AND data <> '9' AND data <> '7' AND data <> '5' AND data <> '3' AND data <> '1';

Before:
Timing: Generation 335.345 ms, Inlining 51.025 ms, Optimization 11967.776 
ms, Emission 9201.499 ms, Total 21555.645 ms
IR size: unoptimized: 9022868 bytes, optimized: 6206368 bytes

After:
Timing: Generation 261.283 ms, Inlining 30.875 ms, Optimization 1671.969 
ms, Emission 18.557 ms, Total 1982.683 ms
IR size: unoptimized 8776100 bytes, optimized 115868 bytes

That obviously needs to be improved further, but it's already a lot
better. In particular after these changes the generated code could be
cached.


One thing that could make a huge difference here is to be able to
determine whether two expressions and/or tlists are equivalent
cheaply... I know that David has some need for that too.

Greetings,

Andres Freund




Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Michael Lewis
>
> Many thanks.  Is this way correct?  I am learning what is meant by 'top
> posting'.
>

Yes.

On the subject of your settings, I don't see anything for work_mem,
random_page_cost and other commonly tuned parameters. That would be a good
start. What sort of machine specs are there for Postgres instance? Cpu,
ram, disk type (spinning disk, ssd, etc)? Are any other applications
running on this machine?

Do you have an example query and explain analyze output? Are you aware of
https://explain.depesz.com/ ?


Re: Issues of slow running queries when dealing with Big Data

2020-07-28 Thread Shaozhong SHI
On Tue, 28 Jul 2020 at 22:43, Michael Lewis  wrote:

> Many thanks.  Is this way correct?  I am learning what is meant by 'top
>> posting'.
>>
>
> Yes.
>
> On the subject of your settings, I don't see anything for work_mem,
> random_page_cost and other commonly tuned parameters. That would be a good
> start. What sort of machine specs are there for Postgres instance? Cpu,
> ram, disk type (spinning disk, ssd, etc)? Are any other applications
> running on this machine?
>
> Do you have an example query and explain analyze output? Are you aware of
> https://explain.depesz.com/ ?
>


Hi, Michael,

Many thanks for pointing me in the right direction.   I do not know the
machine specs myself.  We have an IT person who helps my team.

As I understand, he is building a new PostGIS on Azure for us.  I will ask
him.

I would like to understand the matter much better for various reasons.  If
you are aware of technical papers or articles on the subject, it will be
useful to me.

I have been discussing the matter with another manager.  I am also thinking
about how to frame this up as a formal project for the IT person.  If
you have ideas, please let me know.  Basically, approaches and methods,
areas to be investigated for tuning up performance.  So that, I can pass
the information to him.  He is technically excellent, but has not got much
experience in tuning PostGIS for improving performance.

I need to do such type of work, in order to not only solve the problem, but
provide justifications to secure his job.

I hope that you understand.

Regards,

Shao


Re: determine what column(s) form the primary key, in C extension

2020-07-28 Thread David Rowley
On Wed, 29 Jul 2020 at 03:45, alex maslakov  wrote:
>  int i = -1;
>  while ((i = bms_next_member(pkattnos , i)) >= 0) {
>  /* do stuff with i */
>  /* you'll need to use i - FirstLowInvalidHeapAttributeNumber to
> get the pg_attribute.attnum */
>
>
>  elog(INFO, "bms_next_member i: %d", i);
>  }
>
> prints 10 and then 9
>
> Then:
>
>10 - FirstLowInvalidHeapAttributeNumber (-8) ==> 2
>
>9 - FirstLowInvalidHeapAttributeNumber (-8) ==> 1
>
> These are indexes of the columns, right?
>
> Do they start from 1, not from 0?

User attributes start at 1. Have a look at the pg_attribute system
catalogue table. The number you get will be the attnum column from
that table.

> (2)
>
> I'll use this C code as an example to build an extention in Rust. The
> Postgresql bindings for Rust I have don't contain a definition of
> `FirstLowInvalidHeapAttributeNumber` for some reason. I can define it
> since it's simply single digit constant.
>
> However what does in some source files it's defined as (-7) and in some
> as (-8)? Which should I use?

It did recently change from -8 to -7 when we removed Oid as a system
column in pg12. The number will never change on a major version, so
you'll always know what it is for versions that have already been
released.  There's always a chance it'll change from -7 in some future
PostgreSQL version though.
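Putting David's two answers together, the arithmetic can be sketched like this. The constant values follow his description of the released versions; the helper names are ours, and the recovery step is written as adding the negative constant, which is the same as subtracting its absolute value as done earlier in the thread:

```python
def first_low_invalid_heap_attribute_number(pg_major):
    """FirstLowInvalidHeapAttributeNumber per major version: -8 before
    PG 12, -7 from PG 12 onward (when Oid stopped being a system column)."""
    return -7 if pg_major >= 12 else -8

def bms_member_to_attnum(member, pg_major):
    """The bitmapset stores attnum - FirstLowInvalidHeapAttributeNumber,
    shifting members up so system columns (negative attnums) fit in the
    set; adding the constant back recovers pg_attribute.attnum."""
    return member + first_low_invalid_heap_attribute_number(pg_major)
```

On a server where the constant is -8, the members 9 and 10 observed earlier map back to attnums 1 and 2, i.e. the first and second user columns, matching Alex's result.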

David




Re: is JIT available

2020-07-28 Thread David Rowley
On Wed, 29 Jul 2020 at 00:26, Scott Ribe  wrote:
> But does compilation with JIT enable and LLVM dev tools mean that all the 
> LLVM compilation/optimization is built into the PG binaries, or does it 
> require LLVM presence on the machine where deployed? And if so, does the 
> function take that into account as well?

It's not enough for the server to have been built with jit
enabled. The jit extension must also be present on the machine. I
think the new wording in
https://www.postgresql.org/docs/devel/functions-info.html conveys
that:

"Returns true if a JIT compiler extension is available (see Chapter
31) and the jit configuration parameter is set to on."

David