Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Adrian Klaver

On 3/20/20 8:13 PM, pabloa98 wrote:


Nothing I saw that said int could not become bigint.


My bad. The code cannot be a bigint. Or it could be a bigint between 1 
and 99999999 :)



Aah, that was the counter Peter was talking about. I missed that.

As to below that is going to require more thought.

I thought it was not important. The code could be a number from 1 to 
99999999 (so an int will be OK) assigned in order-ish. This is because 
of business requirements. The code should not change in case a row is 
deleted, which rules out window functions, at least for this problem. 
There could be some gaps as long as they are not too big (similar to a 
serial column when a transaction is rolled back). We are more concerned 
about code generation collisions (for example, 2 transactions calculating 
the same code value) than about gaps. For each pair (group, element) the 
code, once assigned, should remain constant. This is because the group, 
the element, and the code will become an id (business-related). So:


group, element, code = bid
1, 3, 1 = bid 131
2, 1, 1 = bid 211
etc

This calculated bid follows the convention described here and it is used 
as a reference in other tables. Therefore it should not change.
All this weirdness is because of business requirements. I am good with a 
classic serial column. But that will not map to our business rules.


Regarding the implementation: our concern is to choose something that is 
not slow (it does not need to be super fast, because there will be more 
operations in other tables using the same transaction), and I thought 
that creating/dropping sequences could be a solution. But I was not 
sure. I am not sure how it will behave with millions of sequences.


If there is another solution, different from millions of sequences, that 
does not block, generates few gaps (and those gaps are small), and never 
generates collisions, then I will happily use it.


I hope I described the problem completely.

Pablo






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Rob Sargent



> On Mar 21, 2020, at 10:47 AM, Adrian Klaver  wrote:
> 
> On 3/20/20 8:13 PM, pabloa98 wrote:
>>Nothing I saw that said int could not become bigint.
>> My bad. The code cannot be a bigint. Or it could be a bigint between 1 and 
>> 99999999 :)
> 
> 
> Aah, that was the counter Peter was talking about. I missed that.
> 
> As to below that is going to require more thought.
> 
Still no word on the actual requirement. As someone who believes consecutive 
numbers on digital invoices are simply a mistaken interpretation of the 
paper-based system, I suspect a similar error here. But again we haven’t really 
heard, far as I know. Something really fishy about 99999999. 




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Ron

On 3/21/20 12:02 PM, Rob Sargent wrote:

On Mar 21, 2020, at 10:47 AM, Adrian Klaver  wrote:

On 3/20/20 8:13 PM, pabloa98 wrote:

Nothing I saw that said int could not become bigint.
My bad. The code cannot be a bigint. Or it could be a bigint between 1 and 
99999999 :)

Aah, that was the counter Peter was talking about. I missed that.

As to below that is going to require more thought.


Still no word on the actual requirement. As someone who believes consecutive 
numbers on digital invoices are simply a mistaken interpretation of the 
paper-based system, I suspect a similar error here. But again we haven’t really 
heard, far as I know. Something really fishy about 99999999.


Why?  "Print" and "screen" forms have all sorts of practical restrictions 
like this.


--
Angular momentum makes the world go 'round.




Explain says 8 workers planned, only 1 executed

2020-03-21 Thread Alastair McKinley
Hi all,

I have a long-running query that I have tweaked, along with config (e.g. 
min_parallel_table_scan_size), to execute nicely and very fast in parallel, 
which works as expected when executed directly from the psql client.  The 
query is then embedded in a plpgsql function and invoked like "return query 
select * from function_that_executes_the_query()".

I am checking the explain output (using execute explain $query) just before 
executing inside my function, and the plan is identical to what I would 
expect, planning 8 workers.  However, this query actually only uses 1 worker 
and takes many times longer than when run directly on the psql command line 
with the same server configuration parameters.

Why would the explain output be different from the executed plan? Is this a 
limitation of plpgsql functions? Is there any way to debug this further?

If it is meaningful: during parallel execution I notice lots of "postgres: 
parallel worker" processes in top, and when executing from my function just a 
single "postgres: $user $db $host($pid) SELECT" process.

Best regards,

Alastair




Re: Explain says 8 workers planned, only 1 executed

2020-03-21 Thread Adrian Klaver

On 3/21/20 10:25 AM, Alastair McKinley wrote:

Hi all,

I have a long-running query that I have tweaked, along with config (e.g. 
min_parallel_table_scan_size), to execute nicely and very fast in 
parallel, which works as expected when executed directly from the psql 
client.  The query is then embedded in a plpgsql function and invoked 
like "return query select * from function_that_executes_the_query()".


Postgres version?

What is happening in function_that_executes_the_query()?

You might want to take a look at the below to see if any of the conditions apply:

https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html



I am checking the explain output (using execute explain $query) just 
before executing inside my function, and the plan is identical to what 
I would expect, planning 8 workers.  However, this query actually only 
uses 1 worker and takes many times longer than when run directly on the 
psql command line with the same server configuration parameters.


Why would the explain output be different from the executed plan? Is 
this a limitation of plpgsql functions? Is there any way to debug this 
further?


If it is meaningful: during parallel execution I notice lots of 
"postgres: parallel worker" processes in top, and when executing from my 
function just a single "postgres: $user $db $host($pid) SELECT" process.


Best regards,

Alastair





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Explain says 8 workers planned, only 1 executed

2020-03-21 Thread Alastair McKinley
Hi Adrian,

Thanks for getting back to me.

Postgres version is:

PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 
(Red Hat 4.8.5-39), 64-bit

I simplified it to the following structure:

create function f() returns setof my_type as
$$
declare
    q text;
    output text;
begin
    select generate_query1() into q; -- the query q executes in parallel with 8 workers if executed standalone
    for output in execute('explain ' || q) loop
        raise notice '%', output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this launches one worker

    select generate_query2() into q;
    for output in execute('explain ' || q) loop
        raise notice '%', output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this also launches one worker
end;
$$ language plpgsql;

Should this work in principle or am I missing something subtle about parallel 
dynamic queries in plpgsql functions?  Does the outer function need to be 
parallel safe?
Might a stored proc work better?

Best regards,

Alastair





Re: Duplicate key violation on upsert

2020-03-21 Thread Adrian Klaver

On 3/20/20 1:32 PM, Matt Magoffin wrote:


On 21/03/2020, at 8:10 AM, Adrian Klaver wrote:


The _hyper_1_1931_chunk_da_datum_x_acc_idx index has the same 
definition as the da_datum_x_acc_idx above (it is defined on a child 
table). That is, they are both essentially:
UNIQUE, btree (node_id, source_id, ts DESC, jdata_a) WHERE jdata_a IS 
NOT NULL
The da_datum_pkey index is what the ON CONFLICT cause refers to, so 
(node_id, ts, source_id) is UNIQUE as well.


Hmm, wonder if you are getting bit by this?:

https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT

"INSERT with an ON CONFLICT DO UPDATE clause is a “deterministic” 
statement. This means that the command will not be allowed to affect 
any single existing row more than once; a cardinality violation error 
will be raised when this situation arises. Rows proposed for insertion 
should not duplicate each other in terms of attributes constrained by 
an arbiter index or constraint.”


I’m not sure I’m wrapping my head around this. The INSERT affects 1 row 
as the unique values (node_id, ts, source_id) are specified in the 
statement. Is it possible that da_datum_x_acc_idx is used as the arbiter 
index in this situation, rather than da_datum_pkey (that I intended), 
and you’re saying that the jdata_a column is getting updated twice, 
first in the INSERT and second in the DO UPDATE, triggering the 
duplicate key violation?


I was thinking more about this:

"INSERT INTO solardatum.da_datum(ts, node_id, source_id, posted, 
jdata_i, jdata_a, jdata_s, jdata_t)

VALUES (…) ..."

from your OP. Namely whether it was:

VALUES (), (), (), ...

and if so there were values in the (),(),() that duplicated each other.
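
For illustration, a minimal reproduction of that cardinality violation, 
using a made-up table in place of the OP's schema:

create table t (
    node_id   int,
    ts        timestamptz,
    source_id text,
    jdata_a   jsonb,
    primary key (node_id, ts, source_id)
);

-- Two rows in one statement share the same key, so ON CONFLICT would
-- have to touch the same row twice, and PostgreSQL raises
-- "ON CONFLICT DO UPDATE command cannot affect row a second time".
insert into t (node_id, ts, source_id, jdata_a)
values (1, '2020-03-21 00:00+00', 's1', '{"a": 1}'),
       (1, '2020-03-21 00:00+00', 's1', '{"a": 2}')
on conflict (node_id, ts, source_id)
    do update set jdata_a = excluded.jdata_a;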

As to the second part of your response, ON CONFLICT does one of either 
INSERT or UPDATE. If:


1) There is no conflict for ON CONFLICT (node_id, ts, source_id) then 
the INSERT proceeds.


2) If there is a conflict then an UPDATE occurs using the SET values.

Now just me working through this:

da_datum_pkey   = (node_id, ts, source_id)
da_datum_x_acc_idx  = (node_id, source_id, ts DESC, jdata_a)

If 1) from above applies then da_datum_x_acc_idx will not be tripped as 
the only way that could happen is if the node_id, ts, source_id was the 
same as an existing row and that can't be true because the PK over the 
same values passed.


If 2) from above happened then you are trying to UPDATE a row with 
matching PK values (node_id, ts, source_id). Now it is entirely possible 
that, since you are not testing for a constraint violation on (node_id, 
source_id, ts DESC, jdata_a), you could be doing SET jdata_a = 
EXCLUDED.jdata_a with a value that would trip da_datum_x_acc_idx




— m@




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread pabloa98
> > As to below that is going to require more thought.
> >
> Still no word on the actual requirement. As someone who believes
> consecutive numbers on digital invoices are simply a mistaken interpretation
> of the paper-based system, I suspect a similar error here. But again we
> haven’t really heard, far as I know. Something really fishy about 99999999.
> >
>
It is fishy. This is the thing: the code, element, and group are part of a
bigger code called the item identifier (or ii).

An item identifier is the result of: code || group || element || qualifier
(a constant like 55) || check digit coming from some check algorithm.

For example:

for a triplet (group, element, code) like (1, 3, 63) the item identifier
(or ii) is: 630010003558  (the last 8 is the check digit).
This number is converted to a bigint and stored (and used as PK or FK on
other tables, etc, etc).

In an item identifier the room is assigned like:

3 digits for group
4 digits for element
8 digits for code (not padded with 0s)
2 digits for qualifier
1 digit for the check digit.
------------------
18 digits for item identifier.

And that is why we have 8 digits maximum for the code. So when a "code" is
generated, it is important that there are no collisions, no big gaps
(little gaps are OK, but because we cannot waste too many codes, keeping
gaps small is important) and no locks.
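
As an illustration, the layout above could be computed like this sketch 
(the check digit here is a stand-in, digit sum mod 10, since the real 
algorithm was not given in the thread):

create or replace function item_identifier(grp int, elem int, code int)
returns bigint as
$$
declare
    body text := code::text                   -- up to 8 digits, not zero-padded
                 || lpad(grp::text, 3, '0')   -- 3 digits for group
                 || lpad(elem::text, 4, '0')  -- 4 digits for element
                 || '55';                     -- constant qualifier
    chk int;
begin
    -- stand-in check digit: replace with the actual algorithm
    select sum(d::int) % 10 into chk
      from regexp_split_to_table(body, '') as d;
    return (body || chk)::bigint;
end;
$$ language plpgsql immutable;

-- item_identifier(1, 3, 63) builds '63' || '001' || '0003' || '55'
-- = 63001000355, plus whatever the real check digit is.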

The lock part is because we solved a similar problem with a counter, by 
row-locking the counter and increasing it, in another part of the database. 
The result is that all the queries using that table are queued by pair 
(group, element), which is not that bad because we are not inserting 
thousands of rows per second. Still, it is killing cluster performance 
(though performance is still OK from the business point of view). The 
problem with using locks is that they are too sensitive to developer errors 
and bugs. Sometimes connected clients abort and the connection is returned 
to the pool with the lock still active, until the connection is closed or 
someone unlocks the row. I would prefer to have something more resilient 
to developer/programming errors, if possible.
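
For reference, the counter approach described above usually looks 
something like this sketch (table and column names invented); the row 
lock the upsert takes is held until the transaction ends, which is 
exactly the queueing and stuck-lock behavior described:

create table code_counter (
    grp       int not null,
    elem      int not null,
    last_code int not null default 1,
    primary key (grp, elem)
);

-- Claim the next code for a (group, element) pair in one statement.
-- The row lock taken by the UPDATE branch is held until commit/abort,
-- so a client that dies mid-transaction blocks everyone on that pair.
insert into code_counter as c (grp, elem)
values (1, 3)
on conflict (grp, elem)
    do update set last_code = c.last_code + 1
returning last_code;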


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread pabloa98
On Fri, Mar 20, 2020 at 9:04 PM John W Higgins  wrote:

>
>
> On Fri, Mar 20, 2020 at 8:13 PM pabloa98  wrote:
>
>>
>> I hope I described the problem completely.
>>
>>
> 1) What is a group - does it exist prior to records being inserted? How
> many groups are you working with? How long do they live for?
> 2) #1 but for element
>
Groups and elements are 2 numbers assigned in another part of the business.
They exist before codes are created. They will continue existing after each
code is deleted. If they are deleted, it is because we are evicting that
whole branch of information from the database. When that happens, everything
goes with them via FKs.

3) How many records per second per group per element are you anticipating?
> Are there spikes?
>
We are thinking about 100K every 2 to 3 months, max, during business hours
(8 hours each day).


> 4) How long does a group/element live for? You can put 10 record per
> second over 3 years and still be under 100 million records (the size of
> your seq_number)
>
That depends. Some group/elements will live seconds because they are used
for testing. Others will live years.

5) How quickly do you need the seq_number - if you batch created them once
> a minute would that work? Does it need to be quicker than that? Slower than
> that? Or is it an immediate need?
>
We need each code to be created in less than 1 second (1 second is OK if
that is the max time). Regarding the pair group/element, it is OK if they
take no more than 3 seconds top.



> That's a starting point.
>
> John W Higgins
>


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread pabloa98
> Why?  "Print" and "screen" forms have all sorts of practical restrictions
> like this.
>
Legacy, I guess. This is all digital stuff. But the final result is an
identifier that people can read and realize what they are talking about.

Pablo


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Rob Sargent


> On Mar 21, 2020, at 12:18 PM, pabloa98  wrote:
> 
> 
> 
>> Why?  "Print" and "screen" forms have all sorts of practical restrictions 
>> like this.
>> 
> Legacy, I guess. This is all digital stuff. But the final result is an 
> identifier that people can read and realize what they are talking about.
> 

I think the “why” was directed at my comment.
I’m suspicious of the practicality.

To me the description of the ID smacks of database-in-the-name folly. I 
recognize that OP is likely unable to take another path. I’ll not push this any 
further. 

> Pablo


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Peter J. Holzer
On 2020-03-20 17:53:11 -0700, Adrian Klaver wrote:
> On 3/20/20 4:29 PM, Peter J. Holzer wrote:
> > On 2020-03-20 17:11:42 -0600, Rob Sargent wrote:
> > > On Mar 20, 2020, at 4:59 PM, Peter J. Holzer  wrote:
> > > > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> > > > > First, it sounds like you care about there being no gaps in the 
> > > > > records you end
> > > > > up saving.  If that is the case then sequences will not work for you.
> > > > 
> > > > I think (but I would love to be proven wrong), that *nothing* will work
> > > > reliably, if
> > > > 
> > > > 1) you need gapless numbers which are strictly allocated in sequence
> > > > 2) you have transactions
> > > > 3) you don't want to block
[...]
> > Yes. This wasn't a response to the OP's requirements, but to David's
> > (rather knee-jerk, IMHO) "don't use sequences" response. Very often the
> > requirements which would preclude sequences also preclude any other
> > solution.
> 
> I don't see  a knee-jerk reaction in this:
> 
> https://www.postgresql.org/message-id/CAKFQuwZ%3D%3Dri5_m2geFA-GPOdfnVggmJRu3zEi%2B1EwJdJA%3D9AeQ%40mail.gmail.com
> 
> The response was if you cared about gaps(not something the OP had specified
> at that point) then a sequence would not work.

And I think that "care about gaps -> sequence doesn't work" is a
knee-jerk reaction. It's similar to "can't parse HTML with regexps".
True in the general case, and therefore people tend to blurt it out
every time the topic comes up. But not necessarily true in specific
cases. As I wrote above, there is no perfect solution - so you have to
think about the actual requirements and the consequences of various
solutions - and maybe using a sequence is the best (or least bad)
solution.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Duplicate key violation on upsert

2020-03-21 Thread Adrian Klaver

On 3/21/20 11:05 AM, Adrian Klaver wrote:

On 3/20/20 1:32 PM, Matt Magoffin wrote:


On 21/03/2020, at 8:10 AM, Adrian Klaver wrote:


The _hyper_1_1931_chunk_da_datum_x_acc_idx index has the same 
definition as the da_datum_x_acc_idx above (it is defined on a child 
table). That is, they are both essentially:
UNIQUE, btree (node_id, source_id, ts DESC, jdata_a) WHERE jdata_a 
IS NOT NULL
The da_datum_pkey index is what the ON CONFLICT cause refers to, so 
(node_id, ts, source_id) is UNIQUE as well.


Hmm, wonder if you are getting bit by this?:

https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT

"INSERT with an ON CONFLICT DO UPDATE clause is a “deterministic” 
statement. This means that the command will not be allowed to affect 
any single existing row more than once; a cardinality violation error 
will be raised when this situation arises. Rows proposed for 
insertion should not duplicate each other in terms of attributes 
constrained by an arbiter index or constraint.”


I’m not sure I’m wrapping my head around this. The INSERT affects 1 
row as the unique values (node_id, ts, source_id) are specified in the 
statement. Is it possible that da_datum_x_acc_idx is used as the 
arbiter index in this situation, rather than da_datum_pkey (that I 
intended), and you’re saying that the jdata_a column is getting 
updated twice, first in the INSERT and second in the DO UPDATE, 
triggering the duplicate key violation?


I was thinking more about this:

"INSERT INTO solardatum.da_datum(ts, node_id, source_id, posted, 
jdata_i, jdata_a, jdata_s, jdata_t)

 VALUES (…) ..."

from your OP. Namely whether it was:

VALUES (), (), (), ...

and if so there were values in the (),(),() that duplicated each other.

As to the second part of your response, ON CONFLICT does one of either 
INSERT or UPDATE. If:


1) There is no conflict for ON CONFLICT (node_id, ts, source_id) then 
the INSERT proceeds.


2) If there is a conflict then an UPDATE occurs using the SET values.

Now just me working through this:

da_datum_pkey   = (node_id, ts, source_id)
da_datum_x_acc_idx  = (node_id, source_id, ts DESC, jdata_a)

If 1) from above applies then da_datum_x_acc_idx will not be tripped as 
the only way that could happen is if the node_id, ts, source_id was the 
same as an existing row and that can't be true because the PK over the 
same values passed.


Well the below is complete rot. If you are UPDATEing then you are not 
creating a duplicate row, just overwriting a value with itself.




If 2) from above happened then you are trying to UPDATE a row with 
matching PK values (node_id, ts, source_id). Now it is entirely possible 
that, since you are not testing for a constraint violation on (node_id, 
source_id, ts DESC, jdata_a), you could be doing SET jdata_a = 
EXCLUDED.jdata_a with a value that would trip da_datum_x_acc_idx




— m@







--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Peter J. Holzer
On 2020-03-21 12:55:33 -0600, Rob Sargent wrote:
> To me the description of the ID smacks of database-in-the-name folly. I
> recognize that OP is likely unable to take another path. I’ll not push this 
> any
> further. 

Identifiers often have internal structure. In Austria for example, the
social security number contains the birth date. Invoice numbers, project
numbers or court case numbers often contain the year.

That's because they are used by *people*, and people like their
identifiers to make some kind of sense. The computer doesn't care.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread pabloa98
On Sat, Mar 21, 2020 at 12:08 PM Peter J. Holzer  wrote:

>
>
> And I think that "care about gaps -> sequence doesn't work" is a
> knee-jerk reaction. It's similar to "can't parse HTML with regexps".
> True in the general case, and therefore people tend to blurt it out
> every time the topic comes up. But not necessarily true in specific
> cases. As I wrote above, there is no perfect solution - so you have to
> think about the actual requirements and the consequences of various
> solutions - and maybe using a sequence is the best (or least bad)
> solution.
>
>
OK. In that case, I will proceed with the millions of sequences. We will
see.
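
For concreteness, a minimal sketch of that approach (function and 
sequence names invented): one sequence per (group, element) pair, 
created on first use:

create or replace function next_code(grp int, elem int)
returns bigint as
$$
declare
    seq text := format('code_seq_%s_%s', grp, elem);
begin
    -- 8-digit ceiling, per the item identifier layout discussed above;
    -- nextval() never blocks, and a rollback only leaves a small gap
    execute format('create sequence if not exists %I maxvalue 99999999', seq);
    return nextval(quote_ident(seq));
end;
$$ language plpgsql;

Note that two sessions hitting a brand-new pair at the same time can 
still race on the create; pre-creating the sequence when the (group, 
element) pair is registered would avoid that.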

Anyway, it would be awesome if we had a sequence data type in a future 
version of PostgreSQL. It would solve a lot of problems similar to this 
one.

Pablo


Re: Explain says 8 workers planned, only 1 executed

2020-03-21 Thread Jeremy Smith
Unfortunately, return query will never use parallel workers.  See:
https://stackoverflow.com/q/58079898/895640 and
https://www.postgresql.org/message-id/16040-eaacad11fecfb...@postgresql.org



Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Rob Sargent



> On Mar 21, 2020, at 1:13 PM, Peter J. Holzer  wrote:
> 
> On 2020-03-21 12:55:33 -0600, Rob Sargent wrote:
>> To me the description of the ID smacks of database-in-the-name folly. I
>> recognize that OP is likely unable to take another path. I’ll not push this 
>> any
>> further. 
> 
> Identifiers often have internal structure. In Austria for example, the
> social security number contains the birth date. Invoice numbers, project
> numbers or court case numbers often contain the year.
> 
> That's because they are used by *people*, and people like their
> identifiers to make some kind of sense. The computer doesn't care.

Since OP said this was digital not paper, I see this as a presentation problem 
bleeding into database design (assuming your response was an invitation to push 
further).





Re: Explain says 8 workers planned, only 1 executed

2020-03-21 Thread Alastair McKinley
Hi Jeremy,

Thanks for solving the mystery.  I think this might be a missing point in 
section 15.2 of the docs.

I wonder whether this will ever be improved, or should I just write to 
temporary tables instead of using return query?
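
Something like this sketch is what I have in mind (untested): materialize 
the dynamic query with create table as, which can use a parallel plan in 
PostgreSQL 11+, and then return rows from the temp table:

create or replace function f() returns setof my_type as
$$
declare
    q text;
begin
    select generate_query1() into q;
    -- create table as may run the select with parallel workers,
    -- unlike return query execute
    execute 'create temporary table f_result on commit drop as ' || q;
    -- note: a second call in the same transaction would need to
    -- drop f_result first
    return query select * from f_result;
end;
$$ language plpgsql;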

Best regards,

Alastair

From: Jeremy Smith 
Sent: 21 March 2020 20:50
To: Alastair McKinley 
Cc: Adrian Klaver ; 
pgsql-general@lists.postgresql.org 
Subject: Re: Explain says 8 workers planned, only 1 executed

Unfortunately, return query will never use parallel workers.  See: 
https://stackoverflow.com/q/58079898/895640 and 
https://www.postgresql.org/message-id/16040-eaacad11fecfb...@postgresql.org



Re: Duplicate key violation on upsert

2020-03-21 Thread Matt Magoffin

> On 22/03/2020, at 8:11 AM, Adrian Klaver  wrote:
> 
>> I was thinking more about this:
>> "INSERT INTO solardatum.da_datum(ts, node_id, source_id, posted, jdata_i, 
>> jdata_a, jdata_s, jdata_t)
>> VALUES (…) ..."
>> from your OP. Namely whether it was:
>> VALUES (), (), (), ...
>> and if so there were values in the (),(),() that duplicated each other.
>> As to the second part of your response, ON CONFLICT does one of either 
>> INSERT or UPDATE. If:
>> 1) There is no conflict for ON CONFLICT (node_id, ts, source_id) then the 
>> INSERT proceeds.
>> 2) If there is a conflict then an UPDATE occurs using the SET values.
>> Now just me working through this:
>> da_datum_pkey   = (node_id, ts, source_id)
>> da_datum_x_acc_idx  = (node_id, source_id, ts DESC, jdata_a)
>> If 1) from above applies then da_datum_x_acc_idx will not be tripped as the 
>> only way that could happen is if the node_id, ts, source_id was the same as 
>> an existing row and that can't be true because the PK over the same values 
>> passed.
> 
> Well the below is complete rot. If you are UPDATEing then you are not 
> creating a duplicate row, just overwriting a value with itself.
> 
>> If 2) from above happened then you are trying to UPDATE a row with matching 
>> PK values (node_id, ts, source_id). Now it is entirely possible that, since 
>> you are not testing for a constraint violation on (node_id, source_id, ts 
>> DESC, jdata_a), you could be doing SET jdata_a = EXCLUDED.jdata_a with a 
>> value that would trip da_datum_x_acc_idx

Sorry for the vagueness in my OP, I was trying to make it easier to read. The 
VALUES are individual column values for a single row, so there is only a 
single possible row to insert/update.

So what you’ve outlined is basically what I thought should be happening. 
Namely, there can be only one row that will be inserted/updated. I am wondering 
if I should re-create the da_datum_x_acc_idx index without UNIQUE? I had it as 
UNIQUE to optimise the type of queries that make use of that index… but I did a 
little bit of testing using a non-UNIQUE index and those queries appear to 
execute around the same time as with the UNIQUE index. I just wasn’t sure if 
that would just be masking some other problem in my setup.

— m@

Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread Adrian Klaver

On 3/21/20 1:45 PM, pabloa98 wrote:



On Sat, Mar 21, 2020 at 12:08 PM Peter J. Holzer wrote:




And I think that "care about gaps -> sequence doesn't work" is a
knee-jerk reaction. It's similar to "can't parse HTML with regexps".
True in the general case, and therefore people tend to blurt it out
every time the topic comes up. But not necessarily true in specific
cases. As I wrote above, there is no perfect solution - so you have to
think about the actual requirements and the consequences of various
solutions - and maybe using a sequence is the best (or least bad)
solution.


OK. In that case, I will proceed with the millions of sequences. We will 
see.


Anyway, it would be awesome if we had a sequence data type in a future 
version of PostgreSQL. It would solve a lot of problems similar to this 
one.


Actually there are already two:

https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-SERIAL

https://www.postgresql.org/docs/12/sql-createtable.html
"
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]

This clause creates the column as an identity column. It will have 
an implicit sequence attached to it and the column in new rows will 
automatically have values from the sequence assigned to it.

"

What you want is a built-in method to pull from different sequences 
depending on the value of another column, or on values from multiple 
columns in the same table.





Pablo




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-21 Thread pabloa98
On Sat, Mar 21, 2020 at 4:37 PM Adrian Klaver 
wrote:

>
> > Anyway, it would be awesome if we had a sequence data type in a future
> > version of PostgreSQL. It would solve a lot of problems similar to this
> > one.
>
> Actually there are already two:
>
> https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-SERIAL
>
> https://www.postgresql.org/docs/12/sql-createtable.html
> "
> GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]
>
>  This clause creates the column as an identity column. It will have
> an implicit sequence attached to it and the column in new rows will
> automatically have values from the sequence assigned to it.
> "
>
> What you want is a built in method to pull from different sequences
> depending on the value of another column or values from multiple columns
> in the same table.
>
Exactly. That will work fine too.


Re: Explain says 8 workers planned, only 1 executed

2020-03-21 Thread Tom Lane
Alastair McKinley  writes:
> Thanks for solving the mystery.  I think this might be a missing point in 
> section 15.2 of the docs.
> I wonder whether this will ever be improved, or should I just write to 
> temporary tables instead of using return query?

I just posted a patch to improve that [1], but it's not something we'd be
likely to back-patch into existing releases.

regards, tom lane

[1] https://www.postgresql.org/message-id/1741.1584847383%40sss.pgh.pa.us