postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-18 Thread Corey Taylor
If this is a common question or an article exists, please let me know.  I
couldn't find anything specific about it in Stack Overflow questions or the
PostgreSQL/psql documentation.

This is in PostgreSQL 9.6.

I am trying to figure out how to debug an issue where a function
'import_wss' completes when called through pgAdmin 4 but not when called
through psql.

The function is simply a wrapper around an INSERT INTO ... SELECT that
copies about 800k rows with some id and string conversions.  This completes
in about a minute through pgAdmin, but does not finish when run through
psql.

There are cases where the psql call does finish, but it is rare, and after
running several tests I can't pin down the conditions.  I am running it
with a simple select like:

psql -c 'select import_wss()'

The only relevant issues I've run across are cases where there are multiple
inserts separated by semicolons.  However, this is a single insert, and
query logging shows the same query being run by both pgAdmin and psql.

There is no other load on the db.  I can reproduce this issue right after
creating the db and running the script.

corey


Re: postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-23 Thread Corey Taylor
>
> First thing I'd look at is whether it's the same execution environment
> in both cases, eg same search_path.
>

As far as I can tell, it's the same execution environment, same search_path
and same user.
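
For what it's worth, the comparison was a quick sanity check along these
lines in both clients (a sketch, nothing exhaustive):

select current_user, current_setting('search_path'), version();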

I found, after testing other situations, that the psql command would always
finish as expected after canceling the first query that ran too long.  I
was able to reproduce this scenario with psql and pgAdmin 4 in various
combinations.

Any suggestions on what that would indicate?  The canceled query did not
complete: there are no rows inserted and no duplicate-key errors when
running it a second time.

corey


Re: postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-23 Thread Corey Taylor
On Mon, Sep 23, 2019 at 8:57 AM Tom Lane  wrote:

> Maybe check for waiting on a lock?
> It'd be useful to look in pg_stat_activity and/or top(1) while the
> initial query is running, to see if it seems to be eating CPU or
> is blocked on some condition.
>

I think this will provide information that makes it seem less weird.

With your suggestion, I monitored postgres via top and pg_stat_activity in
the various scenarios and found that an autovacuum was triggering that
covered the tables used in the INSERT INTO ... SELECT.  What seems to be
happening is that the autovacuum takes about as long as the time I give the
query to run before giving up on it.  The next time I run the query, the
autovacuum is complete and the query runs normally.

Of course, I'd like to understand why the query never finishes.
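
A minimal version of the query I watched it with (a sketch; any variant
that selects the wait columns from pg_stat_activity should do):

select pid, backend_xid, backend_xmin, wait_event_type, wait_event,
   state, query
from pg_stat_activity
where state <> 'idle';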

when autovacuum is running:
PID  PPID USER STAT   VSZ %VSZ CPU %CPU COMMAND
 38     1 postgres R 158m   8%   1  50% postgres: revwaste portal-local 172.19.0.5(39956) SELECT
 36     1 postgres S 171m   9%   0   0% postgres: autovacuum worker process   portal-local
 34     1 postgres S 186m   9%   0   0% postgres: autovacuum worker process   portal-local

after autovacuum finishes and during remaining soft-lock:
PID  PPID USER STAT   VSZ %VSZ CPU %CPU COMMAND
 45     1 postgres R 259m  13%   0  50% postgres: revwaste portal-local 172.19.0.5(39962) SELECT
 20     1 postgres S 153m   8%   1   0% postgres: writer process
 22     1 postgres S 153m   8%   0   0% postgres: autovacuum launcher process

Same for pg_stat_activity:

-[ RECORD 1 ]---+-------------------------------------
backend_start   | 2019-09-23 20:29:45.127527+00
xact_start      | 2019-09-23 20:29:45.12886+00
query_start     | 2019-09-23 20:29:45.12886+00
state_change    | 2019-09-23 20:29:45.128861+00
wait_event_type |
wait_event      |
state           | active
backend_xid     | 808
backend_xmin    | 808
query           | select import_wss()
-[ RECORD 2 ]---+-------------------------------------
backend_start   | 2019-09-23 20:30:01.624853+00
xact_start      | 2019-09-23 20:30:58.047317+00
query_start     | 2019-09-23 20:30:58.047317+00
state_change    | 2019-09-23 20:30:58.047318+00
wait_event_type |
wait_event      |
state           | active
backend_xid     |
backend_xmin    | 808
query           | autovacuum: ANALYZE wss.RowCoding
-[ RECORD 3 ]---+-------------------------------------
backend_start   | 2019-09-23 20:31:01.644824+00
xact_start      | 2019-09-23 20:31:01.666711+00
query_start     | 2019-09-23 20:31:01.666711+00
state_change    | 2019-09-23 20:31:01.666712+00
wait_event_type |
wait_event      |
state           | active
backend_xid     |
backend_xmin    | 808
query           | autovacuum: ANALYZE wss.WSSData
-[ RECORD 4 ]---+-------------------------------------
backend_start   | 2019-09-23 20:31:14.101808+00
xact_start      | 2019-09-23 20:31:14.103306+00
query_start     | 2019-09-23 20:31:14.103306+00
state_change    | 2019-09-23 20:31:14.103307+00
wait_event_type |
wait_event      |
state           | active
backend_xid     |
backend_xmin    | 808
query           | select * from pg_stat_activity
(4 rows)

-[ RECORD 1 ]---+-------------------------------------
backend_start   | 2019-09-23 20:29:45.127527+00
xact_start      | 2019-09-23 20:29:45.12886+00
query_start     | 2019-09-23 20:29:45.12886+00
state_change    | 2019-09-23 20:29:45.128861+00
wait_event_type |
wait_event      |
state           | active
backend_xid     | 808
backend_xmin    | 808
query           | select import_wss()
-[ RECORD 2 ]---+-------------------------------------
backend_start   | 2019-09-23 20:34:21.01283+00
xact_start      | 2019-09-23 20:34:21.014473+00
query_start     | 2019-09-23 20:34:21.014473+00
state_change    | 2019-09-23 20:34:21.014475+00
wait_event_type |
wait_event      |
state           | active
backend_xid     |
backend_xmin    | 808
query           | select * from pg_stat_activity
(2 rows)


> Can you create a self-contained test case that acts like this?
>

I can try to duplicate it if this doesn't provide the details needed.

corey


Re: postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-23 Thread Corey Taylor
On Mon, Sep 23, 2019 at 9:22 AM Adrian Klaver 
wrote:

> Can we see the actual function/query?
>
> Also the schema of the table(s) involved?
>

Of course.  This was added to pull some data from legacy db tables into
something we could query while converting the process that populates the
legacy db. These are rather long.  I'll spare you the details of the other
tables it joins as they are basically id <=> name pairs for the most part.

This isn't meant to run often, but it does need to be run every time the
legacy db data is pulled over.

Import Function:

CREATE OR REPLACE FUNCTION import_wss()
RETURNS void AS $$
INSERT INTO wss_entries (
is_historical,
historical_path,
wss_import_row_index,
service_date,
original_sid,
client_id,
client_site_id,
material_group_id,
material_category_id,
material_id,
material_outcome_id,
targeted,
goal_percent,
weight_pounds,
old_vendor_name,
new_vendor_name,
vendor_id,
vendor_site_id,
old_service_type,
new_service_type,
old_quantity,
old_size,
old_frequency,
old_price,
old_market_index,
old_service_per_month,
old_units_per_haul,
old_avg_monthly_cost,
new_quantity,
new_size,
new_frequency,
new_price,
new_market_index,
new_service_per_month,
new_units_per_haul,
new_avg_monthly_cost,
is_haul,
haul_unit_id,
service_conversion_id,
num_hauls_per_weight,
compaction_ratio,
unit_weight,
full_percent,
benchmark_hauling_cost,
total_monthly_cost,
gross_savings,
month_of_shared_savings,
wr_fees,
net_savings,
new_account_number,
contract_expiration,
scheduled_service_days
) SELECT
true,
w."Path",
w."RowNum",
w."WSSDate"::date,
CASE
WHEN client_sites.id IS null THEN TRIM(w."SID")
ELSE null
END,
client_sites.client_id,
client_sites.id,
material_groups.id,
material_categories.id,
materials.id,
material_outcomes.id,
w."Targeted" = 'True',
w."Goal Percentage",
w."Total Pounds",
NULLIF(w."Old Vendor ", ''),
NULLIF(w."New Vendor", ''),
vendor_sites.vendor_id,
vendor_sites.id,
NULLIF(w."Old Service Description", ''),
NULLIF(w."New Service Description", ''),
NULLIF(w."Old Quan", ''),
NULLIF(w."Old Size", ''),
NULLIF(w."Old Freq", ''),
CASE
WHEN w."Old Price " ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."Old Price "::float
ELSE 0::float
END,
CASE
WHEN w."Old Market Index" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."Old Market Index"::float
ELSE 0::float
END,
CASE
WHEN w."Old Service Per Month" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."Old Service Per Month"::float
ELSE 0::float
END,
CASE
WHEN w."Old Tons Per Haul" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."Old Tons Per Haul"::float
ELSE 0::float
END,
CASE
WHEN w."Old Monthly Cost" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."Old Monthly Cost"::float
ELSE 0::float
END,
NULLIF(w."New Quan", ''),
NULLIF(w."New Size", ''),
NULLIF(w."New Freq", ''),
CASE
WHEN w."New Price " ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."New Price "::float
ELSE 0::float
END,
CASE
WHEN w."New Market Index" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."New Market Index"::float
ELSE 0::float
END,
CASE
WHEN w."New Service Per Month" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."New Service Per Month"::float
ELSE 0::float
END,
CASE
WHEN w."New Tons Per haul" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."New Tons Per haul"::float
ELSE 0::float
END,
CASE
WHEN w."New Monthly Cost" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."New Monthly Cost"::float
ELSE 0::float
END,
r."LineType" = 'Haul',
haul_units.id,
service_conversions.id,
CASE
WHEN w."hauls coorelating to this weight" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."hauls coorelating to this weight"::float
ELSE 0::float
END,
NULLIF(w."Estimated Compaction Ratio", ''),
CASE
WHEN w."Unit weight assigned to waste" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."Unit weight assigned to waste"::float
ELSE 0::float
END,
CASE
WHEN w."Pct Full based on last survey" ~ '^[0-9\-]+[0-9\.\-E]*$' THEN w."Pct Full based on last survey"::float
ELSE 0::float
END,
CASE

Re: postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-23 Thread Corey Taylor
On Mon, Sep 23, 2019 at 4:31 PM Adrian Klaver 
wrote:

> Hmm, are there triggers on wss_entries that are UPDATEing/DELETEing
> entries elsewhere?
>

No, that table is pretty much stand-alone.  What we're seeing here is most
likely caused by the initial copy of the legacy db tables.

This happens when creating the local test db, which I do right before the
tests.  I can avoid the issue if I monitor top and wait for the ANALYZE to
complete on all the related tables.

corey


Re: postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-23 Thread Corey Taylor
On Mon, Sep 23, 2019 at 4:50 PM Adrian Klaver 
wrote:

> Smaller hammer:
>
>
> https://www.postgresql.org/docs/11/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
>
> "autovacuum_enabled, toast.autovacuum_enabled (boolean)"
>
> which can be done by ALTER TABLE also:
>
> https://www.postgresql.org/docs/11/sql-altertable.html



Thanks for the feedback.  I think we'll be able to work around this issue
in a way that is reliable enough to script.
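
For example, wrapping the import with something along these lines (untested
sketch; the table names are the ones that showed up in the pg_stat_activity
output earlier in the thread):

ALTER TABLE wss."RowCoding" SET (autovacuum_enabled = false);
ALTER TABLE wss."WSSData" SET (autovacuum_enabled = false);
-- restore and select import_wss() here
ALTER TABLE wss."RowCoding" SET (autovacuum_enabled = true);
ALTER TABLE wss."WSSData" SET (autovacuum_enabled = true);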

I am curious whether this is a "known" issue: restoring tables, then
ANALYZE kicking in some time later while a SELECT is running against those
tables.  It would be nice to know how to predict this kind of situation.

corey


Re: postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-23 Thread Corey Taylor
On Mon, Sep 23, 2019 at 5:51 PM Adrian Klaver 
wrote:

> Usually what is seen here is the opposite, that tables are restored and
> ANALYZE is not run and performance on the subsequent queries is poor due
> to lack of current statistics.
>
> What is the restore process?
>

These specific legacy db tables are isolated in a separate schema.  We then
use pg_restore to restore the entire schema.  Essentially just:

pg_restore -n wss --no-owner

corey


Re: postgres 9.6: insert into select finishes only in pgadmin not psql

2019-09-23 Thread Corey Taylor
On Mon, Sep 23, 2019 at 7:23 PM Adrian Klaver 
wrote:

> "Once restored, it is wise to run ANALYZE on each restored table so the
> optimizer has useful statistics; see Section 24.1.3 and Section 24.1.6
> for more information."
>
> So is there some other step in the process that occurs after the restore
> and before you run your function?
>

There are several other restores called, and a delete query that clears out
an unrelated table.

However, I think this solves both the issue for us and the mystery.  The
ANALYZE was missing, and adding it reduces a never-ending query to about a
minute.  The ANALYZE runs very quickly on its own, so we're simply going to
fix the issue by following the documented advice.
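
In script form it's just a couple of statements between the pg_restore and
the call to import_wss(), e.g. (sketch; same table names as earlier in the
thread):

ANALYZE wss."RowCoding";
ANALYZE wss."WSSData";
select import_wss();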

I guess the length of the query run before/during the ANALYZE made it feel
like something more was wrong.

corey


How to see index was rejected for seq scan?

2017-12-16 Thread Corey Taylor
This isn't meant to be a question about improving a slow query or
determining that the planner was wrong.

It seems like there is a simple and obvious answer, but I would love to
know if there is any documentation you can point me to on this.

Essentially, if an index was deemed not to save cost for the input scan,
the planner will schedule a seq scan.  What I'm wondering is whether there
is anything that indicates a valid index for the scan was found and
rejected (the reason doesn't necessarily matter).  I couldn't find anything
in the Using EXPLAIN documentation, but I have probably missed it.  I would
assume the index would be used to determine whether the filter requires a
large scan count.

I didn't want to send an email with a bunch of unnecessary SQL, so I
created a sqlfiddle with the example in case one is required, although any
theoretical scenario where an index is used or rejected would work (unless
that's wrong).

http://sqlfiddle.com/#!17/8782b/25

This is an execution plan for a seq scan due to a large number of rows for
a datetime range.  If you reduce the datetime range enough (1 week to 1 day
in the data sample in the sqlfiddle), it switches to index scan.

https://explain.depesz.com/s/AurP

Seq Scan on data  (cost=0.00..62.67 rows=503 width=19) (actual rows= loops=)
  Filter: ((datetime <= now()) AND (datetime >= (now() - '7 days'::interval)))


Re: How to see index was rejected for seq scan?

2017-12-16 Thread Corey Taylor
> But for simple queries, you might get some insight if you set
> enable_seqscan to off.  Then the planner will give you an index-using
> plan if it is at all possible.  Then you can compare the costs.  If the
> planner still gives you a sequential scan, then the index was not
> applicable for other reasons.

Thanks!  This gives me enough information.  I was essentially looking for
something concrete to discuss for cases where someone even more novice at
query optimization than I am runs into issues after adding an index.
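
For the archives, the comparison looks something like this (sketch, using
the table from my sqlfiddle):

SET enable_seqscan = off;
EXPLAIN SELECT * FROM data
WHERE datetime >= now() - interval '7 days' AND datetime <= now();
RESET enable_seqscan;

If the plan still shows a Seq Scan with enable_seqscan off, the index was
not applicable; otherwise you can compare the two cost estimates directly.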

I have a tangential question that I am curious about more than it being
helpful in building queries.

Is index information used to determine the input scan cost, or is that
determined through another mechanism or other metadata in the table?  I
mean this for a simple case and not in an exhaustive general sense.  For
example: a table with an id and a timestamp column, with an index on the id
and timestamp, filtered by a date range.


Re: Converting timestamp to timestamptz without rewriting tables

2017-12-18 Thread Corey Taylor
On Mon, Dec 18, 2017 at 2:18 AM, Tom Dunstan  wrote:
>
>>
>> A timestamptz stores everything as UTC and the value is converted to the
session time zone upon retrieval.
>
>
> Which is exactly what we want.

If a random person can interject here: I believe what Tom Dunstan is asking
about is the cheapest way to do a raw conversion of *all* columns in *all*
tables, with the underlying assumption that the data does not need to be
touched at all (as the on-disk representation is the same between timestamp
and timestamptz when the stored values are UTC).

Hence he is showing direct manipulation of the pg_attribute catalog entries.
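
In other words, something in the spirit of this (a deliberately rough,
UNSUPPORTED sketch with hypothetical table/column names; direct catalog
updates can corrupt a database, and this only makes sense at all because
the stored values are assumed to already be UTC):

UPDATE pg_attribute
SET atttypid = 'timestamptz'::regtype
WHERE attrelid = 'my_table'::regclass
  AND attname = 'my_column'
  AND atttypid = 'timestamp'::regtype;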


Re: PostgreSQL needs percentage function

2017-12-18 Thread Corey Taylor
On Mon, Dec 18, 2017 at 11:01 AM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> ​Do you mean:
>
> SELECT id, val, val / (sum(val) OVER ())
> FROM vals;​
>


You could end up with a percentage greater than 100, or a division by zero,
if the values are not in a proper range.  I don't know if that would be the
intention.
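
The zero-total case at least is cheap to guard against (sketch):

SELECT id, val, val / NULLIF(sum(val) OVER (), 0) AS fraction
FROM vals;

NULLIF turns a zero denominator into NULL instead of an error; values
outside the expected range are a data problem rather than a query problem.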


Re: Array of foreign key

2017-12-24 Thread Corey Taylor
On Sun, Dec 24, 2017 at 1:11 AM, Daevor The Devoted 
wrote:

>
> Le 23 déc. 2017 20:25, "Peter J. Holzer"  a écrit :
>
> I think the OP wants something like this:
>
> create table features (
> id serial primary key,
> name varchar not null
> );
>
> create table products (
> id serial primary key,
> name varchar not null,
> array_of_features int[]
> references [] features(id) -- XXX - invented syntax
> );
>
> where products.array_of_features can only contain integers from
> features.id. This doesn't currently (as of 10.1) work.
> As I wrote, you could use triggers, but there is no way to declare that
> foreign key directly.
>
>
> This is an interesting feature you are talking about. However, I'm not
> sure I properly understand the use-case(s). Could you perhaps please give
> some examples to help me understand? (I don't just mean some SQL code, I
> mean an explanation of the business use-cases served by the code).
>


Other than Peter's simple example above, you could implement a simple RBAC
system this way.  As with the features example above, you would have a
table of roles that is referenced by a users table.  In this case, you
would need an int array with a foreign key constraint on each element of
the array pointing to the id column in the roles table; a sketch of the
trigger workaround Peter mentioned is below.
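
A minimal sketch of that trigger-based workaround, assuming hypothetical
users/roles tables (pre-v11 trigger syntax):

CREATE TABLE roles (
    id serial PRIMARY KEY,
    name varchar NOT NULL
);

CREATE TABLE users (
    id serial PRIMARY KEY,
    name varchar NOT NULL,
    role_ids int[] NOT NULL DEFAULT '{}'
);

-- reject any array element that has no matching roles.id
CREATE OR REPLACE FUNCTION check_role_ids() RETURNS trigger AS $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM unnest(NEW.role_ids) AS r(id)
        WHERE NOT EXISTS (SELECT 1 FROM roles WHERE roles.id = r.id)
    ) THEN
        RAISE EXCEPTION 'role_ids contains an unknown role id';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_role_ids_check
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE check_role_ids();

It doesn't protect against roles being deleted out from under existing
arrays, which is part of why a declarative array foreign key would be nice.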


Re: Ideas to deal with table corruption

2018-01-06 Thread Corey Taylor
On Sat, Jan 6, 2018 at 12:30 PM, Melvin Davidson 
wrote:

>
> *Don't forget to create indexes on the FK's in the table they reference!*
>
>
> *Also, it would be nice to know the PostgreSQL version and O/S.*
>


Is it possible for an index to not exist on those columns?

https://www.postgresql.org/docs/current/static/ddl-constraints.html#DDL-CONSTRAINTS-FK

"A foreign key must reference columns that either are a primary key or form
a unique constraint. This means that the referenced columns always have an
index (the one underlying the primary key or unique constraint); so checks
on whether a referencing row has a match will be efficient."
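
Note the asymmetry, though (sketch):

CREATE TABLE roles (id int PRIMARY KEY);
-- the primary key gives roles.id an index automatically

CREATE TABLE users (role_id int REFERENCES roles(id));
-- users.role_id gets NO automatic index

so I assume the advice is about indexing the referencing columns, which is
not done automatically.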