Re: Optimizing the same PREPAREd static query (without parameters)

2019-01-07 Thread David Rowley
On Mon, 7 Jan 2019 at 18:54, Mitar  wrote:
> If I have a PREPAREd query without parameters (static) and I EXECUTE
> it repeatedly in the same session, does PostgreSQL learn/optimize
> anything across those runs?

Yes, it will generate the query plan on the first invocation of
EXECUTE and use that plan for all subsequent EXECUTEs for the session
until you DEALLOCATE the prepared query or DISCARD PLANS/ALL;
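
For example (the table name here is made up):

    PREPARE top_users AS SELECT * FROM users WHERE score > 100;
    EXECUTE top_users;    -- the plan is built on this first EXECUTE
    EXECUTE top_users;    -- later EXECUTEs reuse that cached plan
    DEALLOCATE top_users; -- or: DISCARD PLANS; / DISCARD ALL;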

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Optimizing the same PREPAREd static query (without parameters)

2019-01-07 Thread Mitar
Hi!

On Mon, Jan 7, 2019 at 12:09 AM David Rowley
 wrote:
> On Mon, 7 Jan 2019 at 18:54, Mitar  wrote:
> > If I have a PREPAREd query without parameters (static) and I EXECUTE
> > it repeatedly in the same session, does PostgreSQL learn/optimize
> > anything across those runs?
>
> Yes, it will generate the query plan on the first invocation of
> EXECUTE and use that plan for all subsequent EXECUTEs for the session
> until you DEALLOCATE the prepared query or DISCARD PLANS/ALL;

So this sounds more like a no? The same plan is reused, so PostgreSQL
is not further optimizing the plan based on any statistics or anything
like that?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Optimizing the same PREPAREd static query (without parameters)

2019-01-07 Thread David Rowley
On Mon, 7 Jan 2019 at 21:40, Mitar  wrote:
>
> On Mon, Jan 7, 2019 at 12:09 AM David Rowley
>  wrote:
> > On Mon, 7 Jan 2019 at 18:54, Mitar  wrote:
> > > If I have a PREPAREd query without parameters (static) and I EXECUTE
> > > it repeatedly in the same session, does PostgreSQL learn/optimize
> > > anything across those runs?
> >
> > Yes, it will generate the query plan on the first invocation of
> > EXECUTE and use that plan for all subsequent EXECUTEs for the session
> > until you DEALLOCATE the prepared query or DISCARD PLANS/ALL;
>
> So this sounds more like a no? The same plan is reused, so PostgreSQL
> is not further optimizing the plan based on any statistics or anything
> like that?

You asked if it learned anything. I mentioned that it learns the query
plan, so I'd have said "Yes".

If you're asking if it caches the result and foregoes scanning the
underlying tables, then that's a  "No". Else what further optimising
did you have in mind?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Use bytearray for blobs or not?

2019-01-07 Thread Thomas Güttler

I came across this article, and now I think storing the blobs in S3 and
the S3 object IDs in PostgreSQL is the best solution for me:

https://wiki.postgresql.org/wiki/BinaryFilesInDB

I hope the wiki page is still valid.
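
For reference, a minimal sketch of the PostgreSQL side of that approach
(table and column names here are only illustrative):

    CREATE TABLE documents (
        id         bigserial PRIMARY KEY,
        s3_bucket  text NOT NULL,
        s3_key     text NOT NULL,
        mime_type  text,
        uploaded   timestamptz NOT NULL DEFAULT now()
    );

The blob itself lives in S3; the row only carries enough to locate it.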

Regards,
  Thomas


On 04.01.19 at 12:41, Thomas Güttler wrote:

Some months ago I wrote a little application with Python+Django which stores
blob data in bytearrays.

It works.

In the future there will be a lot more traffic, and I am unsure
if this is really a good solution. I know this is more FUD than a
concrete PG issue.

What do you think?

Which alternatives could be useful?

Regards,
   Thomas






--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines



Re: Is there something wrong with my test case?

2019-01-07 Thread Thiemo Kellner



Hi HP

Thanks for your reply.

Quoting "Peter J. Holzer" :


On 2018-12-25 11:54:11 +, Thiemo Kellner wrote:
[three different but functionally equivalent queries]


Explain analyze verbose showed for:
A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 rows=0
loops=1)
C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 rows=0
loops=1)


626.97 doesn't seem "much higher" to me than 611.19. I would call that
"about the same".



So would I, but the cost is given as a range. Taking the average, it is
roughly 400 compared to roughly 300. I do not know whether averaging
is appropriate here.



This is weird. C takes almost exactly twice as long as A, and while
humans aren't very good at estimating times, one second should feel
faster than two, not slower, and certainly not slower by far. Is it
possible that your subjective impression wasn't based on the executions
you posted but on others? Caching and load spikes can cause quite large
variations in run time, so running the same query again may not take the
same time (usually the second time is faster - sometimes much faster).


I am pretty sure I did not mix up which perceived time belongs to which
query. I did run the script several times, but since the script creates
everything anew, that might not have any impact caching-wise. However, if
caching had an impact, it would only add to the discrepancy, because the
first statement felt much faster than the supposedly optimal statement.
Be that as it may, there is still the fact that the cost for A was
estimated to be about the same as C, or even much higher, while A
executed in half the time of C.


--
Signal: +49 1578 7723737
Handys: +41 78 947 36 21 | +49 1578 772 37 37
Tox-Id:  
B7282A9E31D944DDAD1ECC5B33C36AAE80B32D119AB7391D602C937A65E8CA0625C495C5322B





Re: Is there something wrong with my test case?

2019-01-07 Thread Thiemo Kellner

Hi David

Thanks for your revision.

Quoting David Rowley :


On Wed, 26 Dec 2018 at 00:54, Thiemo Kellner
 wrote:

Explain analyze verbose showed for:
A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117
rows=0 loops=1)
B (cost=264.72..974.25 rows=31 width=58) (actual time=1.508..1.508
rows=0 loops=1)
C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217
rows=0 loops=1)

I am very surprised that the cost of A is (much) higher than that of C,
which I suspected to be the most inefficient. I was so fixated on the
costs that I initially ignored the actual time, which is where my
assumptions on efficiency are reflected. Funny though: the subjective
impression while waiting for the update queries to complete was that C
was fastest by far, followed by B, and only at the end came update A.


While the times mentioned in "actual time" are for execution only and
don't account for the time taken to plan the query, the results you
put in [1] disagree entirely with your claim that 'C' was faster. 'A'
comes out fastest with the explain analyzes you've listed.

A:
Planning TIME: 0.423 ms
Execution TIME: 1.170 ms

C:
Planning TIME: 0.631 ms
Execution TIME: 2.281 ms

Have you perhaps confused the results because they're in a different
order than your cases above?


I am pretty sure I did not confuse them. I am not worried about the
planning times, as I assume that PostgreSQL has a time limit restricting
the time used to find the best execution path to the order of seconds,
such that for a heavy query it becomes negligible.



I'd certainly expect 'A' to be the fastest of the bunch since it's
both less effort for the planner and also the executor.  I didn't look
at why the cost is estimated to be slightly higher, but the planner
wouldn't consider rewriting the queries to one of the other cases
anyway, so it's likely not that critical that the costings are
slightly out from reality.


I am glad that my feeling about which query should be best matches
reality. However, I am left a bit concerned about the usefulness of the
costs in the execution plan. The costs rather contradict the actual
execution times in my test case. To me this would render the cost
useless for comparing queries.



where U.KEY_U in ({correlated subquery 3})


This is not correlated in [1].

[1]  https://pastebin.com/W2HsTBwi


Right you are, my fault. Thanks for your attention there as well. :-)

--
Signal: +49 1578 7723737
Handys: +41 78 947 36 21 | +49 1578 772 37 37
Tox-Id:  
B7282A9E31D944DDAD1ECC5B33C36AAE80B32D119AB7391D602C937A65E8CA0625C495C5322B





Re: Is there something wrong with my test case?

2019-01-07 Thread Fabio Pardi
Hi Thiemo,


On 07/01/2019 11:30, Thiemo Kellner wrote:
>
> Hi HP
>
> Thanks for your reply.
>
> Quoting "Peter J. Holzer" :
>
>> On 2018-12-25 11:54:11 +, Thiemo Kellner wrote:
>> [three different but functionally equivalent queries]
>>
>>> Explain analyze verbose showed for:
>>> A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 rows=0
>>> loops=1)
>>> C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 rows=0
>>> loops=1)
>>
>> 626.97 doesn't seem "much higher" to me than 611.19. I would call that
>> "about the same".
>>
>
> So would I, but the cost is given as a range. Taking the average, it is 
> roughly 400 compared to roughly 300. I do not know whether averaging is 
> appropriate here.


The cost is not a range. The 2 numbers you see are:

  * Estimated start-up cost. This is the time expended before the output
    phase can begin, e.g., time to do the sorting in a sort node.

  * Estimated total cost. This is stated on the assumption that the plan
    node is run to completion, i.e., all available rows are retrieved. In
    practice a node's parent node might stop short of reading all available
    rows (see the LIMIT example below).

As you can read here:

https://www.postgresql.org/docs/current/using-explain.html
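
For instance, in this node from your own output, 0.42 is the estimated
start-up cost and 611.19 the estimated total cost:

    (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 rows=0 loops=1)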


regards,

fabio pardi




Re: Is there something wrong with my test case?

2019-01-07 Thread Thiemo Kellner

Hi Fabio

Quoting Fabio Pardi :


The cost is not a range. The 2 numbers you see are:


  * Estimated start-up cost. This is the time expended before the
    output phase can begin, e.g., time to do the sorting in a sort node.

  * Estimated total cost. This is stated on the assumption that
    the plan node is run to completion, i.e., all available rows are
    retrieved. In practice a node's parent node might stop short of
    reading all available rows (see the LIMIT example below).


  As you can read here:

  https://www.postgresql.org/docs/current/using-explain.html


Thanks for pointing this out. I was misled, then, by the way in which
those two different values are displayed.

--
Signal: +49 1578 7723737
Handys: +41 78 947 36 21 | +49 1578 772 37 37
Tox-Id:  
B7282A9E31D944DDAD1ECC5B33C36AAE80B32D119AB7391D602C937A65E8CA0625C495C5322B





Which queries have run query trace form .exe.

2019-01-07 Thread Vimal Kanzariya
Hello,

I need to find out which database operations are executed by one particular
.exe. I have installed that .exe on my system, and when I run it, it
executes some operations against PostgreSQL 9.5 (version 1.22.1). I want to
know which operations the .exe performs and on which tables. Is there any
tool or other way to find out which database operations are executed by the
.exe?

-- 
Thanks
Vimal Kanzariya


Re: Which queries have run query trace form .exe.

2019-01-07 Thread Ron

On 1/7/19 6:52 AM, Vimal Kanzariya wrote:

Hello,

I need to find out which database operations are executed by one particular 
.exe. I have installed that .exe on my system, and when I run it, it 
executes some operations against PostgreSQL 9.5 (version 1.22.1). I want to 
know which operations the .exe performs and on which tables. Is there any 
tool or other way to find out which database operations are executed by the 
.exe?


Look at the log_* options in postgresql.conf. 
https://www.postgresql.org/docs/9.5/runtime-config-logging.html


Here's an example:
log_destination = 'stderr'
logging_collector = on
log_directory = '/Database/9.6/pg_log'
log_filename = 'postgresql-%F_%H.log'
log_truncate_on_rotation = off
log_rotation_age = 1h
log_rotation_size = 0
client_min_messages =  log
log_min_messages = info
log_min_error_statement = info
log_line_prefix = '%m %r %u %d %p %i %a %e '
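
To capture every statement the .exe sends, the setting you will most likely
also want on top of the above is:

log_statement = 'all'

which makes the server log every SQL statement it receives.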


--
Angular momentum makes the world go 'round.


Not sure which part of the query needs optimization

2019-01-07 Thread Alexander Farber
Good afternoon,

for each visitor of my website I generate a JSON list of 30 top players (
https://slova.de/words/top.php ), who played in the past week, with their
average scores and average time between moves.

At around 5 seconds, this query is taking quite a bit of time:
https://explain.depesz.com/s/wMMV

I have noticed the

Seq Scan on words_moves (cost=0.00..81,448.79 rows=1,747 width=4)
(actual time=0.443..161.673 rows=15,009 loops=30)
Filter: (uid = u.uid)
Rows Removed by Filter: 1494728

and added a:

CREATE INDEX ON words_moves(uid);

which has improved the query time to 800ms:
https://explain.depesz.com/s/xgv1

However, now I am not sure what else could be improved in my query:

WITH last_week_moves AS (
    SELECT
        m.gid,
        m.uid,
        m.played - LAG(m.played) OVER (PARTITION BY m.gid ORDER BY played) AS diff
    FROM words_moves m
    JOIN words_games g ON (m.gid = g.gid AND m.uid in (g.player1, g.player2))
    -- only show players who were active in the last week
    WHERE m.played > CURRENT_TIMESTAMP - interval '1 week'
)
SELECT
    u.uid,
    u.elo,
    (SELECT TO_CHAR(AVG(diff), 'HH24:MI')
       FROM last_week_moves WHERE uid = u.uid) AS avg_time,
    (SELECT ROUND(AVG(score), 1)
       FROM words_moves WHERE uid = u.uid) AS avg_score,
    s.given,
    s.photo
FROM words_users u
JOIN words_social s USING (uid)
WHERE u.elo > 1500
-- take the most recent record from words_social
AND NOT EXISTS (SELECT 1
                FROM words_social x
                WHERE s.uid = x.uid
                AND x.stamp > s.stamp)
-- only show players who were active in the last week
AND EXISTS (SELECT 1
            FROM last_week_moves m
            WHERE u.uid = m.uid
            AND m.diff IS NOT NULL)
ORDER BY u.elo DESC
LIMIT 30;

I have tried changing to a LEFT JOIN LATERAL, as in:

WITH last_week_moves AS (
    SELECT
        m.gid,
        m.uid,
        m.played - LAG(m.played) OVER (PARTITION BY m.gid ORDER BY played) AS diff
    FROM words_moves m
    JOIN words_games g ON (m.gid = g.gid AND m.uid in (g.player1, g.player2))
    -- only show players who were active in the last week
    WHERE m.played > CURRENT_TIMESTAMP - interval '1 week'
)
SELECT
    u.uid,
    u.elo,
    (SELECT TO_CHAR(AVG(diff), 'HH24:MI')
       FROM last_week_moves WHERE uid = u.uid) AS avg_time,
    (SELECT ROUND(AVG(score), 1)
       FROM words_moves WHERE uid = u.uid) AS avg_score,
    s.given,
    s.photo
FROM words_users u
-- take the most recent record from words_social
LEFT JOIN LATERAL (SELECT * FROM words_social AS s
                   WHERE s.uid = u.uid
                   ORDER BY s.stamp DESC LIMIT 1) AS s ON TRUE
WHERE u.elo > 1500
-- only show players who were active in the last week
AND EXISTS (SELECT 1
            FROM last_week_moves m
            WHERE u.uid = m.uid
            AND m.diff IS NOT NULL)
ORDER BY u.elo DESC
LIMIT 30;

But this has not helped much (please see https://explain.depesz.com/s/PrF
or the same output below):


 QUERY PLAN
-
 Limit  (cost=89502.34..272686.22 rows=30 width=160) (actual
time=215.796..842.446 rows=30 loops=1)
   CTE last_week_moves
 ->  WindowAgg  (cost=87308.08..87316.34 rows=367 width=32) (actual
time=183.075..193.613 rows=33221 loops=1)
   ->  Sort  (cost=87308.08..87309.00 rows=367 width=16) (actual
time=183.069..184.359 rows=33221 loops=1)
 Sort Key: m_1.gid, m_1.played
 Sort Method: quicksort  Memory: 4132kB
 ->  Gather  (cost=13632.94..87292.45 rows=367 width=16)
(actual time=37.204..172.827 rows=33221 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=12632.94..86255.75 rows=153
width=16) (actual time=36.183..168.676 rows=11074 loops=3

Re: Adding LEFT JOIN to a query has increased execution time 10 times

2019-01-07 Thread Alexander Farber
Hi Andrew -

On Mon, Jan 7, 2019 at 12:00 AM Andrew Gierth 
wrote:

> > "Alexander" == Alexander Farber  writes:
>  Alexander> With the further help of the IRC folks the query has been
>  Alexander> optimized (7-10 seconds -> 0.3 second)
>
> 0.3 MILLIseconds, actually.
>
> (You chanced not to catch me around on IRC, but I see that didn't
> matter.)
>

it is amazing to see how the query I had gotten used to being so slow over
the past months is suddenly returning so quickly.

Yes, I figured I had missed you on IRC yesterday, but the advice to go
there (I hadn't used it in 20 years) has been very helpful anyway.


>  Alexander> CREATE INDEX ON words_moves (gid, played DESC);
>  Alexander> CREATE INDEX ON words_social (uid, stamp DESC);
>
> I'm not a big fan of using DESC on indexes; it's almost never needed,
> because any btree index can be scanned in reverse. (You only actually
> need it if you're mixing ASC/DESC orderings in an ORDER BY and want an
> index that matches it.)
>

Ok good to know, I will remove the DESC from the indices.

Regards
Alex


Re: About SSL connection

2019-01-07 Thread Adrian Klaver

On 1/7/19 2:51 AM, John Mikel wrote:

Please reply to list also
Ccing list


Thanks,
after editing the postgresql.conf file nothing changes (ssl = off);
is this a bug or what?


Did you restart the server?

Per Ron's post what are the error messages in the Postgres and/or system 
logs?




how do I enable SSL? What is needed for that?


https://www.postgresql.org/docs/10/ssl-tcp.html
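
In short, assuming the certificate and key files you created use the
default names (adjust the names to your setup), postgresql.conf needs at
least:

ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'

followed by a restart of the server.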




On Sun, Jan 6, 2019 at 21:01, Adrian Klaver wrote:


On 1/6/19 6:10 AM, John Mikel wrote:
 >
 > Hi,
 > I was trying to enable SSL on PostgreSQL 10 and 11 servers. After
 > searching the PostgreSQL documentation I found this:
 >
 > ALTER SYSTEM SET configuration_parameter { TO | = } { value | 'value' | DEFAULT }
 >
 > on https://www.postgresql.org/docs/10/sql-altersystem.html
 > so I used
 >
 > ALTER SYSTEM SET SSL = ON, and the query executed successfully.
 >
 > After that I ran SELECT * FROM pg_settings and saw that the ssl value
 > is off.
 >
 > After stopping the PostgreSQL service, it won't start again.

In addition to what Ron said about checking the logs, you could change
the ssl setting back to 'off' in the postgresql.conf file.

 >
 > I tried this on both servers, 11 and 10, with the same problem (the
 > server won't start again).
 >
 > I'm using Windows 7 x64.
 >
 > My question is: how do I enable SSL?
 >
 > PS: I created all the required cert files.
 >
 > Sorry for my bad English.
 >


-- 
Adrian Klaver

adrian.kla...@aklaver.com 




--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Optimizing the same PREPAREd static query (without parameters)

2019-01-07 Thread Mitar
Hi!

On Mon, Jan 7, 2019 at 12:44 AM David Rowley
 wrote:
> If you're asking if it caches the result and foregoes scanning the
> underlying tables, then that's a  "No". Else what further optimising
> did you have in mind?

For example, it could learn better statistics. In documentation [1] it
is written:

> A generic plan assumes that each value supplied to EXECUTE is one of the 
> column's distinct values and that column values are uniformly distributed. 
> For example, if statistics record three distinct column values, a generic 
> plan assumes a column equality comparison will match 33% of processed rows. 
> Column statistics also allow generic plans to accurately compute the 
> selectivity of unique columns.

So it could learn that the values used are not in fact distinct, or that
column values are not uniformly distributed, and maybe decide to change
the plan? That is: it makes a plan, runs it, and determines that the plan
was not as good as expected; when I run the query again, it tries another
plan; if that one turns out better, it switches to it and keeps it.

[1] https://www.postgresql.org/docs/devel/sql-prepare.html


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Not sure which part of the query needs optimization

2019-01-07 Thread Andrew Gierth
> "Alexander" == Alexander Farber  writes:

 Alexander> Good afternoon,

 Alexander> for each visitor of my website I generate a JSON list of 30
 Alexander> top players ( https://slova.de/words/top.php ), who played
 Alexander> in the past week, with their average scores and average time
 Alexander> between moves.

 -> Parallel Seq Scan on words_moves m_1
 (cost=0.00..73600.05 rows=8666 width=16)
 (actual time=0.761..130.844 rows=11074 loops=3)
  Filter: (played > (CURRENT_TIMESTAMP - '7 days'::interval))
  Rows Removed by Filter: 492241

This is telling you that an index on words_moves(played) would likely
help.
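
That is (the exact definition is of course up to you):

    CREATE INDEX ON words_moves (played);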

But the real hot point in the query is here (reformatted for clarity):

 -> Aggregate  (cost=6097.83..6097.84 rows=1 width=32)
   (actual time=19.401..19.401 rows=1 loops=30)
-> Bitmap Heap Scan on words_moves
(cost=33.97..6093.45 rows=1748 width=4)
(actual time=1.680..18.153 rows=15011 loops=30)
Recheck Cond: (uid = u.uid)
Heap Blocks: exact=216312
   -> Bitmap Index Scan on words_moves_uid_idx
   (cost=0.00..33.54 rows=1748 width=0)
   (actual time=0.979..0.979 rows=15011 loops=30)
   Index Cond: (uid = u.uid)

(the explain.depesz.com view points this out with an orange highlight)

This corresponds to this subquery:

  (SELECT ROUND(AVG(score), 1)
 FROM words_moves
WHERE uid = u.uid) AS avg_score,

The basic issue here is that you're calculating an average over around
15k rows per user, for each user in the query result (so 30 times,
making 450k rows). You can see from the "Heap Blocks" stat that this is
having to scan a lot of data; it's taking on average 19.4ms per user,
but multiply that by 30 users and you get ~580ms total, or about 70% of
the total execution time.

The obvious thing to do is to keep a computed average score for each
user - either in a separate table which you update based on changes to
words_moves, which you could do with a trigger, or using a materialized
view which you refresh at suitable intervals (this has the drawback that
the data will not be immediately up-to-date).
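
As a sketch of the materialized-view variant (assuming only the
words_moves(uid, score) columns already discussed; how often you refresh
it is up to you):

    CREATE MATERIALIZED VIEW words_avg_score AS
        SELECT uid, ROUND(AVG(score), 1) AS avg_score
        FROM words_moves
        GROUP BY uid;

    CREATE UNIQUE INDEX ON words_avg_score (uid);

    -- run periodically, e.g. from cron:
    REFRESH MATERIALIZED VIEW CONCURRENTLY words_avg_score;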

Combining these two changes should get you to under 100ms, maybe.

-- 
Andrew (irc:RhodiumToad)



Re: Memory exhaustion due to temporary tables?

2019-01-07 Thread Thomas Carroll
 > In the meantime, you might think about switching over to some process
> that doesn't create and drop the same table constantly.  Perhaps
> along the lines of

> create temp table if not exists tt_preTally (...) on commit delete rows;

> if (tt_preTally contains no rows) then
>  insert into tt_preTally select ...
> end if;

> This'd have the advantage of reducing catalog churn in other catalogs
> besides pg_statistic, too.
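
Fleshed out inside a PL/pgSQL function, that pattern looks roughly like
this (the table, column, and source names below are only illustrative):

    CREATE OR REPLACE FUNCTION run_tally() RETURNS void AS $$
    BEGIN
        -- created once per session, emptied automatically at commit
        CREATE TEMP TABLE IF NOT EXISTS tt_preTally (
            item_id  int,
            item_cnt bigint
        ) ON COMMIT DELETE ROWS;

        -- repopulate only if the table is currently empty
        IF NOT EXISTS (SELECT 1 FROM tt_preTally) THEN
            INSERT INTO tt_preTally
            SELECT item_id, count(*)
            FROM source_rows          -- illustrative source
            GROUP BY item_id;
        END IF;

        -- ... work with tt_preTally here ...
    END;
    $$ LANGUAGE plpgsql;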
Just to close the loop, doing the CREATE TEMP TABLE IF NOT EXISTS etc. helped a 
lot.  Still a tiny increase in memory over time, but this is much much better.
Thanks for the help, and amazed at the responsiveness of Mr. Lane and this 
community!

Tom Carroll

 On Monday, December 10, 2018, 4:59:47 PM EST, Laurenz Albe wrote:
 
 Thomas Carroll wrote:
> We have a situation where a long-persistent Postgres connection consumes more
> and more memory.  If ignored, we eventually get “Cannot allocate memory”
> errors in the Postgres log.  If still ignored, the box will eventually crash.
> This takes about 3 weeks to happen.  It issues a call to a single function
> about once every 15 seconds.  What can I do to prevent this outcome?

There should be a memory context dump in the log file.
What does it look like?

Yours,
Laurenz Albe
-- 
Cybertec, The PostgreSQL Company



Thanks for your reply.  There are many memory context dumps