Locks analysis after-the-fact

2018-04-27 Thread Olivier Macchioni
Hello all,

PostgreSQL version 9.4.17

We have a number of queries running on the same DB from many systems. Among 
other things, we need to INSERT / UPDATE on a table based on external events - 
this is done via a pool of 5 SQL connections which process an average of 500 
events / second. The table is used by other systems as well...

We have more and more locks on this table, which prevent it from being updated 
as it should - and I'm trying to see how to improve things without setting up an 
external queue system just to manage those events.

I've enabled "log_lock_waits = on" in the configuration, which generates the 
following kind of log entries:

2018-04-27 07:48:07 CEST [1419.269] "[unknown]" xx@yy LOG:  process 1419 still 
waiting for ExclusiveLock on advisory lock [16417,1653339453,2672182422,2] 
after 1000.166 ms
2018-04-27 07:48:07 CEST [1419.270] "[unknown]" xx@yy DETAIL:  Process holding 
the lock: 16418. Wait queue: 1419.
2018-04-27 07:48:07 CEST [1419.271] "[unknown]" xx@yy STATEMENT:  SELECT 
pg_advisory_xact_lock(1653339453,  -1622784874);
2018-04-27 07:48:07 CEST [1419.272] "[unknown]" xx@yy LOG:  process 1419 
acquired ExclusiveLock on advisory lock [16417,1653339453,2672182422,2] after 
1318.721 ms
2018-04-27 07:48:07 CEST [1419.273] "[unknown]" xx@yy STATEMENT:  SELECT 
pg_advisory_xact_lock(1653339453,  -1622784874);

I can easily identify the session owner of the different processes, but the 
process holding the lock belongs to an ERP which generates many different 
kinds of queries - I'd like to identify precisely which one is causing this 
long-lasting lock.

It doesn't look like this is possible via the PostgreSQL logging features, and 
doing the usual lock troubleshooting "on the spot" using pg_locks is not 
practical due to the short duration and odd timing of those locks.
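
For reference, the usual "on the spot" check looks roughly like this (a sketch
only - it shows what is happening at the moment it runs, which is exactly why
it doesn't help with such short-lived locks):

  -- list advisory locks together with the sessions holding or waiting on them
  SELECT l.pid, l.granted, l.classid, l.objid, a.application_name, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.locktype = 'advisory';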

Does anyone have an idea on how to proceed in such a case?

Thank you,

Olivier




Re: Locks analysis after-the-fact

2018-04-27 Thread Olleg Samoylov
On 2018-04-27 10:55, Olivier Macchioni wrote:
> Does anyone have an idea on how to proceed in such a case?
>
Log statements too. :)
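
Roughly, as a sketch (note that 'all' can generate a lot of log volume on a
busy server, so a quiet window or a narrower setting may be wiser):

  -- log every statement; the PID in the existing log lines lets you match
  -- statements against the lock messages
  ALTER SYSTEM SET log_statement = 'all';
  SELECT pg_reload_conf();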



Re: Locks analysis after-the-fact

2018-04-27 Thread Olivier Macchioni

> On 27 Apr 2018, at 11:58, Olleg Samoylov  wrote:
> 
> On 2018-04-27 10:55, Olivier Macchioni wrote:
>> Does anyone have an idea on how to proceed in such a case?
>> 
> Log statements too. :)
> 

Thank you - I was afraid of such an answer, and of the load it may generate...

I'll give it a try during a quiet hour of the day.

Olivier




Re: Asynchronous Trigger?

2018-04-27 Thread Olleg Samoylov
Try to look at PGQ from SkyTools.

On 2018-03-30 01:29, Cory Tucker wrote:
> Is it possible to have the execution of a trigger (or any function) not
> block the completion of the statement they are associated with?
>
> A pattern I had hoped to implement was to do a quick update of rows that
> signaled they needed attention, and then an async per-row trigger would
> come and do the maintenance (in this case, make an expensive materialized
> view).
>
> Any suggestions welcome.
>
> thanks!
> --Cory




Re: pg_stat_statements : how to catch non successfully finished statements ?

2018-04-27 Thread Arthur Zakirov
Hello,

On Thu, Apr 26, 2018 at 01:24:25PM -0700, legrand legrand wrote:
> Hello all,
> 
> I was wondering if there is a hook to collect non successfully finished SQL
> statements in pg_stat_statements (timed-out, cancelled, killed, or simply
> errored) ?

Some time ago I looked for such a hook. My use case was to log failed access
permission checks. I didn't find a way to do it, except adding a
narrow-focused hook within ExecCheckRTPerms().

There is ErrorContextCallback. Context callbacks are called when an
error is raised. But I don't think that it is a good approach for the
pg_stat_statements case.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



invalid byte sequence for encoding "UTF8": 0xff

2018-04-27 Thread Kris Olson
I just did this on all my import dump files:

perl -p -i -e 's/\xff//g' *.dump

Kris



Re: Parameter passing in trigger function write in C

2018-04-27 Thread Adrian Klaver

On 04/26/2018 06:48 PM, a wrote:
Hey, thanks mate, I have read that. But the actual data retrieval seems 
to be a combination of that and the SPI_get* functions. I'm still testing 
it. By the way, can I pass parameters that are outside the scope of the 
SQL statement?



Can you be more specific about what parameters you are looking to pass in?




-- Original --
*From:* "Adrian Klaver";
*Date:* Friday, Apr 27, 2018 9:36 AM
*To:* "a"<372660...@qq.com>; "pgsql-general";
*Subject:* Re: Parameter passing in trigger function write in C

On 04/26/2018 06:13 PM, a wrote:
 > Hey everyone:
 >
 > I was asked to write a trigger that performs the following functions:
 >
 > 1. The trigger will be set to examine all statements that will result in
 > changes of data.
 >
 > 2. The trigger will examine three pieces of information besides the data
 > updated or inserted: user name, password and reason for the change.
 >
 > 3. It will record, in some format, the time and content of the change.
 >
 > Therefore, I would need some way of passing parameters into the trigger
 > function. Can anyone give me some instructions on how to do it, or whether
 > it is possible?

https://www.postgresql.org/docs/10/static/trigger-interface.html 




--
Adrian Klaver
adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Thu, 26 Apr 2018 19:13:17 +0300
Vitaliy Garnashevich  wrote:

> We're trying to populate a table with aggregated data from other
> tables. For that we're running a huge INSERT+SELECT query which joins
> several tables, aggregates values, and then inserts the results into
> another table. The problem we're facing is that while the query is
> running , some records in tables referenced by the results table may
> be deleted, which causes the following error:
> 
> ERROR:  insert or update on table "..." violates foreign key
> constraint "..."
> DETAIL:  Key (...)=(...) is not present in table "...".
> 
> How do we make sure that such an aggregating query would not fail?

Create a temporary table with a useful subset of the data.

You can select the minimum number of columns joined and release
the locks. This can also help large queries by giving you a stable
snapshot of the data for repeated queries. 

I usually find that pre-joining the tables is easier because 
temp tables have restrictions on re-use w/in the query, and also
usually don't have indexes to speed up the joins.

If you are going to run this, say, daily it's easy enough to create
a view and just "create temporary table foo as select * from bar"
for some collection of views and go from there. This makes it easier
to tune the queries on the back end without having to hack the front
end code.
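
A rough sketch of that pattern (the view and temp table names are invented,
and the column list is only illustrative, borrowing table and column names
from the query posted in this thread):

  -- a view does the joins and filters once...
  CREATE VIEW usage_snapshot_v AS
  SELECT u.used_from, u."user", u.keystrokes, u.minutes_in_use, u.usage_date
  FROM cmdb_program_daily_usage u
  WHERE u.minutes_in_use > 0;

  -- ...and each run freezes a private snapshot to aggregate from
  CREATE TEMPORARY TABLE usage_snapshot AS
  SELECT * FROM usage_snapshot_v;

  SELECT used_from, "user", sum(keystrokes) AS keystrokes
  FROM usage_snapshot
  GROUP BY used_from, "user";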

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com    +1 888 359 3508



Re: Parameter passing in trigger function write in C

2018-04-27 Thread a
Yep, I wanna pass the brief reason for the manipulation (update, insert or delete), 
and the name and password of the manipulator (this might be done by the database itself). 



-- Original --
From: "Adrian Klaver"; 
Date: Friday, Apr 27, 2018 10:16 PM
To: "a"<372660...@qq.com>; "pgsql-general"; 
Subject: Re: Parameter passing in trigger function write in C



On 04/26/2018 06:48 PM, a wrote:
> hey thanks mate, I have red that. But the actual data retirement seems 
> to be combination of that and the SPI_get* functions. I'm still testing 
> on it. By the way, if I can pass parameters that is out of the scope of 
> sql statement??


Can you be more specific about what parameters you are looking to pass in?

> 
> 
> -- Original --
> *From:* "Adrian Klaver";
> *Date:* Friday, Apr 27, 2018 9:36 AM
> *To:* "a"<372660...@qq.com>; "pgsql-general";
> *Subject:* Re: Parameter passing in trigger function write in C
> 
> On 04/26/2018 06:13 PM, a wrote:
>  > Hey everyone:
>  >
>  > I was request to write a trigger that perform the following function:
>  >
>  > 1, The trigger will be set to exam all statement that will result change
>  > of datas
>  >
>  > 2, The trigger will exam three information other than the data updated
>  > or inserted: user name, password and reasons of change
>  >
>  > 3, It will record down, in some format, the time and content of the 
> change.
>  >
>  > Therefore, I would need some way of passing parameters into the trigger
>  > function. Can anyone give me some instructions on how to do it or, if it
>  > is possible??
> 
> https://www.postgresql.org/docs/10/static/trigger-interface.html 
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com 


-- 
Adrian Klaver
adrian.kla...@aklaver.com

Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Thu, 26 Apr 2018 21:08:01 +0300
Vitaliy Garnashevich  wrote:

>      INSERT INTO cmdb_sp_usage_history
>    (created_by, updated_by, created_on, updated_on, mod_count,
>    summary_on, quarter, product, used_from, "user",
>    keystrokes, minutes_in_use, times_started, avg_keystrokes, 
> max_keystrokes, spkg_operational)
>      SELECT
>     2, 2, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 1,
>     CURRENT_TIMESTAMP, quarter.id, "spv"."product", 
> "usage"."used_from", "usage"."user",
>     coalesce(sum("usage"."keystrokes"), 0),
>     coalesce(sum("usage"."minutes_in_use"), 0),
>     coalesce(sum("usage"."times_started"), 0),
>     coalesce(avg("usage"."keystrokes"), 0),
>     coalesce(max("usage"."keystrokes"), 0),
>     bool_or("cmdb_ci"."operational")
>      FROM
>    "cmdb_program_daily_usage" "usage"
>      LEFT OUTER JOIN
>    "cmdb_program_instance" "p" ON "p"."id" =
> "usage"."program_instance" LEFT OUTER JOIN
>    "cmdb_ci_spkg" "s" ON "s"."id" = "p"."spkg"
>      LEFT OUTER JOIN
>    "cmdb_ci" "cmdb_ci" ON "s"."id" = "cmdb_ci"."id"
>      LEFT OUTER JOIN
>    "cmdb_software_product_version" "spv" ON "spv"."id" =
> "s"."software" WHERE ("usage"."minutes_in_use" > 0)
>    AND ((NOT ("s"."software" IS NULL))
>     AND ((NOT ("s"."os" = TRUE))
>      OR ("s"."os" IS NULL)))
>    AND ("usage"."usage_date" >= quarter.start_date)
>    AND ("usage"."usage_date" < quarter.end_date)
>      GROUP BY "spv"."product", "usage"."used_from", "usage"."user"
>      HAVING (coalesce(sum("usage"."keystrokes"), 0) > 0) OR 
> (coalesce(sum("usage"."minutes_in_use"), 0) > 0) OR 
> (coalesce(sum("usage"."times_started"), 0) > 0)
>      ORDER BY "spv"."product", "usage"."used_from", "usage"."user";

create temporary table foobar as
select
    -- only the columns the later aggregation actually needs
from
    -- the same joins and WHERE clauses as the query above,
    -- without the GROUP BY or aggregates
;

This isolates the lock time to performing the bare select, after
which you can coalesce and sum to your heart's content without 
locking any of it.

The point is performing the absolute minimum of processing to 
generate the temp table so as to release any locks quickly and
avoid "group by" in the main join.

Yes, this might end up creating a large-ish temp table :-)

One other approach would be selecting only incremental data
(e.g., daily), which locks a much smaller subset of the rows,
and aggregating the daily totals into quarterly ones (or whatever).
Call it daily usage: select where usage_date = today's date, or
where the timestamp overlaps (&&) a tstzrange covering the day. That might
also simplify your query logic: all the coalesce op's end up
in your daily/weekly/monthly/whatever summary, and the quarterly
values in the reporting are just sum X group by.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com    +1 888 359 3508



Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Thu, 26 Apr 2018 23:32:33 +0300
Vitaliy Garnashevich  wrote:

> The aggregated table has hundreds of millions of rows, and the query 
> runs for many hours (which is one of the reasons why it's better not
> to fail). I really doubt that row level locking would work. That
> would be a lot of RAM just to hold all the locks.

All the more reason to use temp tables.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com    +1 888 359 3508



Re: Parameter passing in trigger function write in C

2018-04-27 Thread Adrian Klaver

On 04/27/2018 07:25 AM, a wrote:
> Yep, I wanna pass the brief reason of manipulating (update, insert or


https://www.postgresql.org/docs/10/static/trigger-interface.html
"
tg_event

Describes the event for which the function is called. You can use 
the following macros to examine tg_event


...

TRIGGER_FIRED_BY_UPDATE(tg_event)

Returns true if the trigger was fired by an UPDATE command.
"


> delete), name and password of the manipulator (This might be done by
> database itself).


As for user:

https://www.postgresql.org/docs/10/static/functions-info.html

"current_user  nameuser name of current execution context"

Password?:

https://www.postgresql.org/docs/10/static/view-pg-shadow.html

Though if you are using password encryption it will be encrypted.
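
For the change reason, which the server cannot know by itself, one commonly
used approach is a custom session setting that the client sets and the
trigger reads back. A minimal sketch in SQL (the setting name, audit table
and trigger are invented; a C trigger can run the same current_setting()
query via SPI; the two-argument current_setting() needs 9.6+):

  -- application side, in the same transaction as the DML:
  --   SELECT set_config('myapp.change_reason', 'monthly correction', true);

  CREATE FUNCTION audit_change() RETURNS trigger AS $$
  BEGIN
    INSERT INTO audit_log (changed_at, changed_by, reason, row_data)
    VALUES (now(), current_user,
            current_setting('myapp.change_reason', true),  -- NULL if not set
            to_jsonb(NEW));
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER audit_change_trg
  AFTER INSERT OR UPDATE ON some_table
  FOR EACH ROW EXECUTE PROCEDURE audit_change();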




-- Original --
*From:* "Adrian Klaver";
*Date:* Friday, Apr 27, 2018 10:16 PM
*To:* "a"<372660...@qq.com>; "pgsql-general";
*Subject:* Re: Parameter passing in trigger function write in C

On 04/26/2018 06:48 PM, a wrote:
 > hey thanks mate, I have red that. But the actual data retirement seems
 > to be combination of that and the SPI_get* functions. I'm still testing
 > on it. By the way, if I can pass parameters that is out of the scope of
 > sql statement??


Can you be more specific about what parameters you are looking to pass in?

 >
 >
 > -- Original --
 > *From:* "Adrian Klaver";
 > *Date:* Friday, Apr 27, 2018 9:36 AM
 > *To:* "a"<372660...@qq.com >; "pgsql-general";
 > *Subject:* Re: Parameter passing in trigger function write in C
 >
 > On 04/26/2018 06:13 PM, a wrote:
 >  > Hey everyone:
 >  >
 >  > I was request to write a trigger that perform the following function:
 >  >
 >  > 1, The trigger will be set to exam all statement that will result 
change

 >  > of datas
 >  >
 >  > 2, The trigger will exam three information other than the data updated
 >  > or inserted: user name, password and reasons of change
 >  >
 >  > 3, It will record down, in some format, the time and content of the
 > change.
 >  >
 >  > Therefore, I would need some way of passing parameters into the 
trigger
 >  > function. Can anyone give me some instructions on how to do it or, 
if it

 >  > is possible??
 >
 > https://www.postgresql.org/docs/10/static/trigger-interface.html 


 >
 >
 > --
 > Adrian Klaver
 > adrian.kla...@aklaver.com  
>



--
Adrian Klaver
adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Locks analysis after-the-fact

2018-04-27 Thread Tom Lane
Olleg Samoylov  writes:
> On 2018-04-27 10:55, Olivier Macchioni wrote:
>> Does anyone have an idea on how to proceed in such a case?

> Log statements too. :)

Another idea is to get the app to set application_name differently for
each session (and be sure to include %a in log_line_prefix).  It might
not be practical in this particular case, but it's a good tool to keep
in mind.
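
A minimal sketch of that approach (the label can also be set in the
connection string):

  -- each session labels itself...
  SET application_name = 'erp-batch-worker-3';

  -- ...and %a in the prefix makes the label show up next to the lock messages
  ALTER SYSTEM SET log_line_prefix = '%m [%p] %a %u@%d ';
  SELECT pg_reload_conf();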

regards, tom lane



Re: Rationale for aversion to the central database?

2018-04-27 Thread Steven Lembark
On Sun, 8 Apr 2018 14:39:49 -0700
Guyren Howe  wrote:

> I am a Rails developer at a medium-large size company. I’ve mostly
> worked at smaller companies. I’ve some exposure to other web
> development communities.
> 
> When it comes to databases, I have universally encountered the
> attitude that one should treat the database as a dumb data bucket.
> There is a *very* strong aversion to putting much of any business
> logic in the database. I encounter substantial aversion to have
> multiple applications access one database, or even the reverse: all
> abstraction should be at the application layer.
> 
> My best theory is that these communities developed at a time when
> Windows was more dominant, and just generally it was *significantly*
> easier to use MySQL than Postgres for many, particularly new,
> developers. And it is pretty reasonable to adopt an aversion to
> sophisticated use of the database in that case.

The biggest single problem in most cases is a combination
of communication and attitude.

Recall that most programmers are not DBA's and don't have access to
create anything in the database. Traditionally the High Priests of 
Holy Data don't allow mere peons to create or modify content in the
databases. Result is that programmers *have* to do most of their 
logic in code, even if the results put hugely more load on the servers.

The DBA's are also not Well Qualified Genius Programmers who obviously
know much more than mere DBA's about what the real use of data is 
supposed to be. They don't need no stinkin' SQL! They have the magic
of ORM!

Net result: People talk past one another.

I've nearly been lynched for creating ten-row temporary tables without
prior written permission, or even for suggesting that we might add indexes
to support more efficient use of views.

I've also nearly strung programmers up for running hundreds of 
"select * from table" statements because they weren't willing to 
figure out how to write a join or simple filter or give me enough
data to do it for them.

A good example is ORM packages: most of the really hideous code required
for joins, group by and coalesce can easily be pushed into views, allowing
simple "select X from Y where ..." or one-line joins. DBA's can craft views
that make sense; programmers can use them to simplify their code.
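
For example (table and view names invented), a join-plus-aggregate that
would otherwise be spelled out in ORM code can be wrapped once:

  CREATE VIEW order_totals AS
  SELECT c.id AS customer_id,
         c.name,
         coalesce(sum(o.amount), 0) AS total_spent
  FROM customers c
  LEFT JOIN orders o ON o.customer_id = c.id
  GROUP BY c.id, c.name;

  -- application side stays a one-liner:
  -- SELECT * FROM order_totals WHERE customer_id = 42;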

 Q: When is the last place you worked where DBAs were pigs (vs.
chickens)?

 Q: How often was a DBA involved in the analysis and design cycle
    of reporting?

 Q: How many programming projects have you been on where the hackers
    were willing to write clear, concise data requirements and keep
    them up to date?

When that happens people will use the databases as tools, ORM code
will be [semi-] maintainable, and we'll live happily for ever after.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com    +1 888 359 3508



Re: decompose big queries

2018-04-27 Thread Steven Lembark

> Hi,
> I want to know what are the best practice to use in order to
> decompose a big query which contains so many joins.Is it recommended
> to use stored procedures ? or is there any other solution?

The main problem with SP's is that they can really screw up
optimization. Most of the time you'll get more bang for the
buck by managing the SQL, breaking it into more manageable 
chunks.

Views can be a big help for this. Materialized views can help
by pushing lock-heavy or long-running subqueries into the wee
hours where resources can be managed a bit more easily (e.g.,
you don't collide with interactive queries or may be able to
just lock the tables and be done with it). They also give you 
a more stable dataset to start with (e.g., no need to worry
about inserts causing disconnects in totals or requiring 
table locks to avoid).

Temporary tables can help by breaking out messy portions of
the query and allowing you to pre-filter portions of the result.

Scratch tables (i.e., a permanent table that gets truncated after
use) can also help by staging portions of the query in indexed
areas. With careful use they can be a big help because you can 
pre-aggregate portions of the query into a table with nice, 
specific indexes. 

Many of the worst queries involve reporting on aggregates over time
that are re-calculated for each report. Pre-aggregating on, say, a
daily basis can both simplify the reporting query and allow you to
push some of the work off to the wee hours. You can also get the 
benefit of more specific values that may allow for unique indexes
on the subtotals. 
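
A sketch of the daily pre-aggregation idea (all names invented):

  CREATE MATERIALIZED VIEW daily_usage AS
  SELECT usage_date, "user", sum(keystrokes) AS keystrokes
  FROM raw_usage
  GROUP BY usage_date, "user";

  -- the unique index allows REFRESH ... CONCURRENTLY and gives reports
  -- fast lookups on the pre-aggregated rows
  CREATE UNIQUE INDEX ON daily_usage (usage_date, "user");

  -- run during the wee hours
  REFRESH MATERIALIZED VIEW CONCURRENTLY daily_usage;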

If the query involves "many" joins there is a good chance that 
they break into clumps of related data. Breaking, say, daily usage
vs. user account vs. accounting data into separate [materialized]
views or temp tables keeps the joins more manageable and helps
release resources that might otherwise get consumed for the entire
query.

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com    +1 888 359 3508



Re: Rationale for aversion to the central database?

2018-04-27 Thread Basques, Bob (CI-StPaul)
All,

Just chiming in . . . we’ve taken a somewhat different approach and actually 
encourage our programmers to build out their own DBs.  We’re using Postgres to 
aggregate many varied data sources as a caching system, and then 
develop against this aggregated data.  Yes, we understand that the realtime 
stuff is a laggy process.  The intent here is to give the programmers a 
background in DB operation and performance expectations, especially since we 
primarily build for Web use.

bobb






"The true measure of a man is how he treats someone who can do him absolutely 
no good."
- Samuel Johnson





Re: Rationale for aversion to the central database?

2018-04-27 Thread Guyren Howe
On Apr 27, 2018, at 8:45 , Basques, Bob (CI-StPaul) 
 wrote:
> 
> Just chiming in  . . . we’ve taken a somewhat different approach and actually 
> encourage our programmers to build out thier own DBs.  We’re using Postgres 
> to aggregate many varied datasources into postgres as a cahing system, and 
> then develop against this aggregated data.  Yes, we understand that the 
> realtime stuff is a laggy process.  The intent here is to give the 
> programmers a background in DB operation and performance expectations, 
> especially since we primarilry build for Web Use.

I’d love to see a blog post about this.



Re: Asynchronous Trigger?

2018-04-27 Thread Merlin Moncure
On Thu, Mar 29, 2018 at 5:29 PM, Cory Tucker  wrote:
> Is it possible to have the execution of a trigger (or any function) not
> block the completion of the statement they are associated with?
>
> A pattern I had hoped to implement was to do a quick update of rows that
> signaled they needed attention, and then an async per-row trigger would come
> and do the maintenance (in this case, make an expensive materialized view).
>
> Any suggestions welcome.

Generally the idea is to (in the trigger) invoke some low-risk, quick
action such as inserting a record into a 'stuff to do' table.  Then,
some other process comes around and does the heavy lifting.

merlin



Re: Rationale for aversion to the central database?

2018-04-27 Thread Basques, Bob (CI-StPaul)


On Apr 27, 2018, at 10:46 AM, Guyren Howe <guy...@gmail.com> wrote:


I’d love to see a blog post about this.



Me too!! :c)   It’s funny you mention this, I’ve actually been experimenting 
with different Blogging tech for just this purpose.  Not sure when something 
will pop out the other end though.  :c)

bobb



"Life isn't about waiting for the storm to pass; it's about learning to dance 
in the rain."
- Vivian Greene





Re: Can PostgreSQL create new WAL files instead of reusing old ones?

2018-04-27 Thread Bruce Momjian
On Wed, Apr 18, 2018 at 09:34:50AM -0400, Vick Khera wrote:
> On Wed, Apr 18, 2018 at 12:35 AM, Michael Paquier  wrote:
> 
> 
> That looks like a rather difficult problem to solve in PostgreSQL
> itself, as the operator running the cluster is in charge of setting up
> the FS options which would control the COW behavior, so it seems to me
> 
> 
> You cannot turn off CoW on ZFS. What other behavior would you refer to here?
> 
> I suppose one could make a dedicated data set for the WAL and have ZFS make a
> reservation for about 2x the total expected WAL size. It would require careful
> attention to detail if you increase WAL segments configuration, though, and if
> you had any kind of hiccup with streaming replication that caused the segments
> to stick around longer than expected (but that's no different from any other
> file system).

Uh, at the risk of asking an obvious question, why is the WAL file COW
if it was renamed?  No one has the old WAL file open, as far as I know.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Long running INSERT+SELECT query

2018-04-27 Thread Vitaliy Garnashevich

Everybody thanks for the suggestions!

We're going to try using "SELECT 1 FROM table FOR KEY SHARE" for each of 
the tables which are referenced by the results, before running the big 
query. That should be up to a million rows in total. It will probably 
not cover the case when a record is INSERT'ed and then DELETE'd after 
the calculation has begun, but such cases should be even rarer than 
the DELETEs we're currently facing.
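
Roughly (a sketch - the two table names are taken from the query earlier in
the thread purely as placeholders for the actual referenced tables):

  BEGIN;
  -- take KEY SHARE locks on the rows the results will reference, so a
  -- concurrent DELETE of those rows blocks until this transaction commits
  SELECT 1 FROM cmdb_software_product_version FOR KEY SHARE;
  SELECT 1 FROM cmdb_ci_spkg FOR KEY SHARE;

  -- ... the big INSERT ... SELECT goes here, in the same transaction ...
  COMMIT;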


If this does not help, we'll also try to use triggers, as mentioned 
here: https://stackoverflow.com/a/39828119


Regards,
Vitaliy



Re: Can PostgreSQL create new WAL files instead of reusing old ones?

2018-04-27 Thread Fabio Ugo Venchiarutti
I was wondering the same. The WAL writer is overwriting the same inodes 
again and again, so block COW logic should only happen once: at allocation.


I'm no expert: does XFS track COW based on path (ugh?)


Maybe I'm crazy but here's a possible workaround if the problem is 
effectively at that level: OP could use the archive_command to 
deliberately allocate a new segment and switch the old one with it 
before returning zero to the archiver.


The WAL writer will then recycle what it thinks is the same inode and 
not your impostor.



I'm rather confident this should work ok, but you may want to make sure 
with the hackers that no file descriptors are open on ready-to-archive 
segments while you shuffle things around in your command (or some other 
weird implication I'm missing).






On 27/04/18 17:28, Bruce Momjian wrote:

On Wed, Apr 18, 2018 at 09:34:50AM -0400, Vick Khera wrote:

On Wed, Apr 18, 2018 at 12:35 AM, Michael Paquier  wrote:


 That looks like a rather difficult problem to solve in PostgreSQL
 itself, as the operator running the cluster is in charge of setting up
 the FS options which would control the COW behavior, so it seems to me


You cannot turn off CoW on ZFS. What other behavior would you refer to here?

I suppose one could make a dedicated data set for the WAL and have ZFS make a
reservation for about 2x the total expected WAL size. It would require careful
attention to detail if you increase WAL segments configuration, though, and if
you had any kind of hiccup with streaming replication that caused the segments
to stick around longer than expected (but that's no different from any other
file system).


Uh, at the risk of asking an obvious question, why is the WAL file COW
if it was renamed?  No one has the old WAL file open, as far as I know.



--
Regards

Fabio Ugo Venchiarutti
Data Services Department
Ocado Technology





Re: Asynchronous Trigger?

2018-04-27 Thread Michael Loftis
As suggested, note it in a ToDo table; also maybe look at LISTEN and NOTIFY
and have a job runner process LISTENing (and cleaning up the queue,
or marking an item as in progress if you've multiple workers).  The
work queue table is there to help maintain state... if no one is LISTENing
then the table acts as a backup, so when a worker fires up it can start
working on whatever's already there and empty it.
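
A bare-bones sketch of that combination (all names invented):

  CREATE TABLE work_queue (
      id      bigserial PRIMARY KEY,
      payload jsonb NOT NULL,
      status  text NOT NULL DEFAULT 'pending'
  );

  -- the trigger only queues work and notifies; the heavy lifting happens
  -- in a separate worker process
  CREATE FUNCTION enqueue_work() RETURNS trigger AS $$
  BEGIN
      INSERT INTO work_queue (payload) VALUES (to_jsonb(NEW));
      PERFORM pg_notify('work_available', '');
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER enqueue_work_trg
  AFTER INSERT OR UPDATE ON watched_table
  FOR EACH ROW EXECUTE PROCEDURE enqueue_work();

  -- worker side: LISTEN work_available; then claim items, e.g. (9.5+):
  --   UPDATE work_queue SET status = 'in_progress'
  --   WHERE id = (SELECT id FROM work_queue WHERE status = 'pending'
  --               ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)
  --   RETURNING id, payload;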

On Fri, Apr 27, 2018 at 4:46 AM, Olleg Samoylov  wrote:
> Try to look at PGQ from SkyTools.
>
> On 2018-03-30 01:29, Cory Tucker wrote:
>
> Is it possible to have the execution of a trigger (or any function) not
> block the completion of the statement they are associated with?
>
> A pattern I had hoped to implement was to do a quick update of rows that
> signaled they needed attention, and then an async per-row trigger would come
> and do the maintenance (in this case, make an expensive materialized view).
>
> Any suggestions welcome.
>
> thanks!
> --Cory
>
>



-- 

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler



Re: Can PostgreSQL create new WAL files instead of reusing old ones?

2018-04-27 Thread Andres Freund
On 2018-04-27 12:28:25 -0400, Bruce Momjian wrote:
> On Wed, Apr 18, 2018 at 09:34:50AM -0400, Vick Khera wrote:
> > On Wed, Apr 18, 2018 at 12:35 AM, Michael Paquier  
> > wrote:
> > 
> > 
> > That looks like a rather difficult problem to solve in PostgreSQL
> > itself, as the operator running the cluster is in charge of setting up
> > the FS options which would control the COW behavior, so it seems to me
> > 
> > 
> > You cannot turn off CoW on ZFS. What other behavior would you refer to here?
> > 
> > I suppose one could make a dedicated data set for the WAL and have ZFS make 
> > a
> > reservation for about 2x the total expected WAL size. It would require 
> > careful
> > attention to detail if you increase WAL segments configuration, though, and 
> > if
> > you had any kind of hiccup with streaming replication that caused the 
> > segments
> > to stick around longer than expected (but that's no different from any other
> > file system).
> 
> Uh, at the risk of asking an obvious question, why is the WAL file COW
> if it was renamed?  No one has the old WAL file open, as far as I know.

Because it's a COW filesystem that doesn't overwrite in place. That's
how snapshots etc are implemented.

Greetings,

Andres Freund



Re: Long running INSERT+SELECT query

2018-04-27 Thread Steven Lembark
On Fri, 27 Apr 2018 19:38:15 +0300
Vitaliy Garnashevich  wrote:

> We're going to try using "SELECT 1 FROM table FOR KEY SHARE" for each of 
> the tables, which are referenced by results, before running the big 
> query. That should be up to a million of rows in total. It will probably 
> not cover the case when a record is INSERT'ed and then DELETE'd after 
> the calculation has begun, but such cases should be even more rare than 
> the DELETE's we're currently facing.

Think about using a couple of Materialized Views for the worst 
part of it.


-- 
Steven Lembark 1505 National Ave
Workhorse Computing   Rockford, IL 61103
lemb...@wrkhors.com  +1 888 359 3508



Re: Asynchronous Trigger?

2018-04-27 Thread Adam Tauno Williams
On Fri, 2018-04-27 at 11:25 -0600, Michael Loftis wrote:
> As suggested, note in ToDo table, also maybe look at LISTEN and
> NOTIFY and have a job runner process LISTENing 

We use a simple python process to listen with a PostgreSQL cursor for
NOTIFY events, and push them into RabbitMQ (message broker).  From
there you can do a lot of asynchronous work;  it will store messages,
optionally with expiration, you have point-to-point or broadcast,
etc...  It is a very durable, scalable, and flexible way to built up an
event driven system.

-- 
Meetings Coordinator, Michigan Association of Railroad Passengers
537 Shirley St NE Grand Rapids, MI 49503-1754 Phone: 616.581.8010
E-mail: awill...@whitemice.org GPG#D95ED383 Web: http://www.marp.org



Re: Rationale for aversion to the central database?

2018-04-27 Thread Merlin Moncure
On Sun, Apr 8, 2018 at 4:39 PM, Guyren Howe  wrote:
> I am a Rails developer at a medium-large size company. I’ve mostly worked at
> smaller companies. I’ve some exposure to other web development communities.
>
> When it comes to databases, I have universally encountered the attitude that
> one should treat the database as a dumb data bucket. There is a *very*
> strong aversion to putting much of any business logic in the database. I
> encounter substantial aversion to have multiple applications access one
> database, or even the reverse: all abstraction should be at the application
> layer.
>
> My best theory is that these communities developed at a time when Windows
> was more dominant, and just generally it was *significantly* easier to use
> MySQL than Postgres for many, particularly new, developers. And it is pretty
> reasonable to adopt an aversion to sophisticated use of the database in that
> case.
>
> This attitude has just continued to today, even as many of them have
> switched to Postgres.
>
> This is only a hypothesis. I am now officially researching the issue. I
> would be grateful for any wisdom from this community.
>
>
> Aside: it is rare to find a situation in life or anywhere where one widely
> adopted thing is worse in *every way* than another thing, but this certainly
> was and largely still continues to be the case when one compares MySQL and
> Postgres. So why do folks continue to use MySQL? I find this mystifying.

This is a very common attitude in the industry, and a very unfortunate
one.  It's particularly common in the so-called 'enterprise' stacks --
java, .net etc.   It's also completely crazy.   SQL isn't _that_
difficult to learn, and in the particular case of postgres, it allows for
faster implementation of solutions to many classes of data-driven
problems than competing platforms.  SQL is an ideal language for
business logic IMNSHO -- particularly the non-procedural forms of it.
There are a number of causal factors for this attitude:

*) outsourcing companies push standard stacks
*) major software vendors (ms/ibm/etc) push 'lock-in' toolsets with
lack of standardization
*) poor understanding of data structure fundamentals is extremely pervasive
*) developer salaries for data developers skew higher (causing HR
departments to look for lower cost skillsets, totally oblivious to
total development costs)
*) 'next big thing in data' companies constantly marketing and pushing
the 'new paradigm' and against proven, low cost solutions.  Hadoop,
big data, etc etc.
*) developer monoculture (why do I need to know anything other than java?)
*) perceived slow performance of databases when the problem is really storage
*) poor math training disinclines learning of set based logic in school
*) corporate takeover of C.S. education syllabus -- in many overseas
schools the educational programs are written by software vendors and
are essentially tool training, or worse positional education for
outsourcing grind shops

I think things are getting better.  SQL is undergoing a kind of
renaissance for various reasons, and postgresql in particular is
really on a tear.

merlin



Re: Long running INSERT+SELECT query

2018-04-27 Thread Tim Cross

Steven Lembark  writes:

> On Fri, 27 Apr 2018 19:38:15 +0300
> Vitaliy Garnashevich  wrote:
>
>> We're going to try using "SELECT 1 FROM table FOR KEY SHARE" for each of 
>> the tables, which are referenced by results, before running the big 
>> query. That should be up to a million of rows in total. It will probably 
>> not cover the case when a record is INSERT'ed and then DELETE'd after 
>> the calculation has begun, but such cases should be even more rare than 
>> the DELETE's we're currently facing.
>
> Think about using a couple of Materialized Views for the worst 
> part of it.

+1 re: materialised views - I have found them to be extremely useful for
situations where you want a snapshot of data and need to present it in a
way which is easier to process, especially when the underlying data is
changing faster than your reporting process can generate the report.  

-- 
Tim Cross



Re: Rationale for aversion to the central database?

2018-04-27 Thread g...@luxsci.net

 
 
On April 24, 2018 07:27:59 am PDT, "Sam Gendler" 
 wrote:

 
 
On Sun, Apr 8, 2018 at 15:37 g...@luxsci.net <g...@luxsci.net> wrote:

  On April 8, 2018 02:40:46 pm PDT, "Guyren Howe" <guy...@gmail.com> wrote:
  One advantage to using logic and functions in the db is that you can
  fix things immediately without having to make new application builds.
  That in itself is a huge advantage, IMO.


 
 
I doubt most of us would consider this any kind of advantage outside of 
the momentary temptation to do it when an app is completely broken and 
needs to be up in a hurry. Application changes, whether in the dB or in 
application logic, need to be tested, and they need to be revision 
controlled and released in a manner that can be easily rolled back in 
an automated manner. The fact that putting logic in the database can 
effectively allow developers to make unreleased changes to production 
apps is specifically one of the problems that I am trying to avoid when 
I keep most logic in the app instead of the dB. It’s a whole lot 
harder to make arbitrary manual changes to code in the app, whether 
interpreted or compiled, if it is running inside a container that 
cannot be updated. Even if you go in with a shell and update an 
interpreted file, the next time that container is launched the change 
will be lost, which is usually sufficient motivation to keep devs from 
doing that kind of thing.

 
I’ll put some things in the db, either for performance or because I 
want that logic to be built into the data and not be part of the 
application, but I choose those contexts carefully and I write them in 
as portable a manner as possible. And for those who say migrations 
don’t happen, I’ve certainly been through a few, usually as part of 
an acquisition or the like, but sometimes simply because another dB 
server better meets our needs after a time. And migrating stored procs 
can be really difficult. Such code usually has less complete unit and 
integration tests, which makes validating those changes more difficult, 
too.

 
But the biggest reason is that databases often have to scale up rather 
than out, so keeping as much logic in the application code allows my 
scaling requirements for the dB server to be as minimal as possible. 
Sure, there are workloads where pushing raw data across the wire will 
be more work than processing it in the dB, and in those cases, I may do 
that, but I consider it premature optimization to just assume that is 
necessary without hard evidence from production examples to suggest 
otherwise.

 
Finally, there’s the consistency argument. I want to find all of the 
logic in one place. Either entirely in the source code or entirely in 
the dB. Having to trace things from the code to the dB and back again 
can make it a whole lot harder to see, at a glance, what is happening 
in the code. Having logic in the dB also means it can be difficult or 
impossible to have two releases talking to the same schema version at 
the same time - so canary builds and rolling deployments can be 
difficult. Of course, schema changes can cause this problem, regardless 
of whether there are stored procs, but the more of your logic that 
lives in the db, the more likely it is that your releases will conflict 
over the db. So I’m more likely to be able to do a rolling release if 
I keep the db as a dumb data store and keep logic in the application 
code.


===

I could have worded that better but I think that we're coming at it 
from different directions. You think of your application as the 
"master" operator. I think of a PG db as the "master", not a slave. I 
believe that we shouldn't _have_ to use an external application for the 
database to be useful and coherent.  I like to think of external 
applications as subservient to the db and not the other way around. 
Yeah, I know, probably not a popular viewpoint.


Sorry, I don't really understand why it would be so hard to migrate, 
say pl/pgsql functions. You can maybe expect to write some likely 
convoluted application code, though. :) Reusable functions in the db 
that are solid also means that developers don't have to reinvent the 
wheel in whatever language and debugging also becomes simpler.


Perhaps I'm extreme. In my ideal world, developers might not even know 
table names! I'm kidding ,sorta...


Thanks much for your comments,
Gerry
 




Re: Rationale for aversion to the central database?

2018-04-27 Thread Ron



On 04/27/2018 05:52 PM, g...@luxsci.net wrote:


Sorry, I don't really understand why it would be so hard to migrate, say 
pl/pgsql functions. You can maybe expect to write some likely convoluted 
application code, though. :) Reusable functions in the db that are solid 
also means that developers don't have to reinvent the wheel in whatever 
language and debugging also becomes simpler.


And it's not like the developers don't rewrite the code every time they 
migrate to the Latest and Greatest Language...



--
Angular momentum makes the world go 'round.


Re: Rationale for aversion to the central database?

2018-04-27 Thread raf
> On Sun, 8 Apr 2018 14:39:49 -0700
> Guyren Howe <guy...@gmail.com> wrote:
> 
> When it comes to databases, I have universally encountered the
> attitude that one should treat the database as a dumb data bucket.
> There is a *very* strong aversion to putting much of any business
> logic in the database. I encounter substantial aversion to have
> multiple applications access one database, or even the reverse: all
> abstraction should be at the application layer.

That seems un-pragmatic to me. IMHO if any business logic needs
access to lots of data, it's best implemented by code that
resides in the database itself, close to the data. I once had a
job where, one night a week, I couldn't go home until a certain
program had been run and completed successfully. That meant I
wasn't going home until midnight. I realised that the person
that wrote it was (at least for the purposes of that program)
treating the database like "a dumb data bucket". Millions of
records were being selected, transferred over a network to
another host, effectively grouped and summarised, then each
resulting summary record was inserted into another table, one
stored function call at a time (with all the network round trip
times that that implies). It took 2-3 hours to complete. I
replaced it with a stored function that took two minutes to run
and I was able to start going home hours earlier. So, as you can
imagine, it would take an incredibly good argument to convince
me that business logic shouldn't reside in the database. :-)

I've always assumed (perhaps incorrectly) that not wanting
business logic in the database (and not using all of the
features that databases provide) was just a way to justify
programmers not having to learn SQL but it's just another
language and paradigm and programmers know so many languages and
paradigms anyway that I never understood why knowing SQL was a
problem. My assumption is probably wrong. And I guess as long as
the resulting software runs as quickly as it needs to, it
probably isn't an issue. If it's a widely held view, then it
must be workable.

I use stored functions exclusively, partly so that the business
logic is close to the data it needs to be close to for
efficiency, but also for security reasons. Users have no
permissions to select, update, insert or delete anything. All
they can do is execute stored functions that have been
previously created by the database owner who does have those
permissions. For a system that's accessible to the public, it's
a great way to guarantee that SQL injection can't do any harm
(even if all the other protections fail to work or are
bypassed). For a system that's only accessible to internal
staff, it's a great way to defend against their workstations
being infected by malware that goes looking for databases to
attack.
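
A tiny sketch of that arrangement (schema, table and role names invented):

  -- the application role has no table privileges at all...
  REVOKE ALL ON ALL TABLES IN SCHEMA app FROM app_user;

  -- ...only EXECUTE on vetted functions created by the schema owner
  CREATE FUNCTION app.add_customer(p_name text) RETURNS bigint AS $$
      INSERT INTO app.customers (name) VALUES (p_name) RETURNING id;
  $$ LANGUAGE sql SECURITY DEFINER SET search_path = app, pg_temp;

  REVOKE ALL ON FUNCTION app.add_customer(text) FROM PUBLIC;
  GRANT EXECUTE ON FUNCTION app.add_customer(text) TO app_user;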

cheers,
raf




Re: Rationale for aversion to the central database?

2018-04-27 Thread Peter J. Holzer
On 2018-04-27 22:52:39 +, g...@luxsci.net wrote:
> Perhaps I'm extreme. In my ideal world, developers might not even know table
> names! I'm kidding ,sorta...

If they don't know the table names, how can they write those stored
procedures?

hp


-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 

