Intervals and ISO 8601 duration

2023-01-12 Thread Sebastien Flaesch
PostgreSQL has the INTERVAL type, which can be defined with fields such as:

INTERVAL YEAR TO MONTH      (year-month class)
INTERVAL DAY TO SECOND(p)   (day-second class)

It's not possible to define an INTERVAL YEAR TO SECOND(p), which makes sense, 
since the number of days in a month can vary. Other SQL engines like Oracle and 
Informix also have 2 classes of interval types.

However, the ISO 8601 standard format for durations allows year/month parts 
to be combined with day-to-second parts, for example:

P2Y10M15DT10H30M20S

PostgreSQL seems to accept this format as input.
But what does it mean exactly?
What are the actual INTERVAL value and INTERVAL class of this?

Testing with V15.1:

What is the interval class in this case:

test1=> select cast('P2Y10M15DT10H30M20S' as interval);
             interval
----------------------------------
 2 years 10 mons 15 days 10:30:20
(1 row)

Should the following convert to a day-second interval?

test1=> select cast('P2Y10M15DT10H30M20S' as interval day to second);
             interval
----------------------------------
 2 years 10 mons 15 days 10:30:20
(1 row)

Should PostgreSQL not raise an SQL error in the above cases?


When using an invalid INTERVAL field combination, an error is raised as expected:

test1=> select cast('P2Y10M15DT10H30M20S' as interval year to second);
ERROR:  syntax error at or near "second"
LINE 1: ...lect cast('P2Y10M15DT10H30M20S' as interval year to second);


Does PostgreSQL assume that a month is ~30 days?

I did not find details about this in the documentation.

Thanks in advance!
Seb



default icu locale for new databases (PG15)

2023-01-12 Thread Robert Sjöblom

Greetings,

When initializing a new database server with a default collation, there 
are a number of different locales available. What's the difference between


1. se-x-icu
2. se-SE-x-icu
3. sv-SE-x-icu

? And, perhaps more importantly, how do I future-proof this so that I'm 
not making a decision today that will make my life worse when upgrading 
from postgres 15 to 16+? For a database with Swedish collation, which 
option is "the best"? Or perhaps "the most foolproof"?


Best regards,
Robert Sjöblom






Re: Intervals and ISO 8601 duration

2023-01-12 Thread Tom Lane
Sebastien Flaesch  writes:
> PostgreSQL has the INTERVAL type, which can be defined with fields such as:
> INTERVAL YEAR TO MONTH      (year-month class)
> INTERVAL DAY TO SECOND(p)   (day-second class)

You can also say just INTERVAL, without any of the restrictions.

> It's not possible to define an INTERVAL YEAR TO SECOND(p), which makes
> sense,

It's not so much that it doesn't make sense as that the SQL standard
doesn't have such a spelling.  They enumerate a few allowed combinations
(I think that no-modifiers is one of them), and we accept those for
pro forma syntax compliance.

> Should the following convert to a day-second interval?

> test1=> select cast('P2Y10M15DT10H30M20S' as interval day to second);
>              interval
> ----------------------------------
>  2 years 10 mons 15 days 10:30:20
> (1 row)

> Should PostgreSQL not raise an SQL error in above cases?

We regard these modifiers as similar to precision restrictions in
numerics and timestamps: we will round off low-order fields to
match the typmod, but we will not throw away high-order fields.
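
For instance (a small illustrative sketch, not from the original mail,
assuming the v15 behaviour Sebastien tested):

select cast('P2Y10M15DT10H30M20.123S' as interval day to second(0));
-- => 2 years 10 mons 15 days 10:30:20
-- the fractional seconds are rounded per the typmod, while the
-- year/month fields are kept rather than rejected or discarded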

This probably doesn't match the SQL spec in detail, but the
details of their datetime types are sufficiently brain-dead
that we've never worried about that too much (eg, they still
don't have a model for daylight-savings time, last I checked).

What Postgres actually stores for an interval is three fields:
months, days, and microseconds.  If we're forced to interconvert
between those units, we use 30 days = 1 month and 24 hours = 1 day,
but it's usually best to avoid doing that.
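
For what it's worth, the built-in justify functions apply exactly those
conversion factors, so a short sketch (again assuming v15) makes them visible:

select justify_hours(interval '27 hours');      -- 1 day 03:00:00
select justify_days(interval '35 days');        -- 1 mon 5 days
select extract(epoch from interval '1 month');  -- 2592000 (= 30 * 86400 seconds)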

regards, tom lane




Re: default icu locale for new databases (PG15)

2023-01-12 Thread Laurenz Albe
On Thu, 2023-01-12 at 15:56 +0100, Robert Sjöblom wrote:
> When initializing a new database server with a default collation, there 
> are a number of different locales available. What's the difference between
> 
> 1. se-x-icu
> 2. se-SE-x-icu
> 3. sv-SE-x-icu
> 
> ? And, perhaps more importantly, how do I future-proof this so that I'm 
> not making a decision today that will make my life worse when upgrading 
> from postgres 15 to 16+? For a database with Swedish collation, which 
> option is "the best"? Or perhaps "the most foolproof"?

Of the alternatives you list, only "sv-SE-x-icu" makes sense, because "se"
stands for the Sámi or Lapp language, not Swedish.

You could choose between "sv-SE", "sv-FI" and "sv-AX", depending on whether
you are located in Sweden, Finland or the Åland Islands.  I don't speak Swedish,
so I don't know how different they are and how they differ from the generic 
"sv".

I don't think any of these locales will go away in the foreseeable future.

Yours,
Laurenz Albe




Re: Changing displayed time zone in RAISE NOTICE output?

2023-01-12 Thread Adrian Klaver

On 1/11/23 21:25, Ron wrote:
> On 1/11/23 15:06, Adrian Klaver wrote:
>
> Hmm.  I'd have sworn this didn't work when I tried it:

Did you do?:

DO $$
BEGIN
RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';
END$$;
ERROR:  syntax error at or near "timezone"
LINE 3: RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';

> postgres=#
> postgres=# DO $$
> BEGIN
> RAISE NOTICE '%', clock_timestamp() at time zone 'UTC';
> END$$;
> NOTICE:  2023-01-12 05:22:40.517299
>
> But it does work, so all's well that ends well.



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Changing displayed time zone in RAISE NOTICE output?

2023-01-12 Thread Pavel Stehule
On Thu, Jan 12, 2023 at 16:39, Adrian Klaver wrote:

> On 1/11/23 21:25, Ron wrote:
> > On 1/11/23 15:06, Adrian Klaver wrote:
>
> >
> > Hmm.  I'd have sworn this didn't work when I tried it:
>
> Did you do?:
>
> DO $$
> BEGIN
> RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';
> END$$;
> ERROR:  syntax error at or near "timezone"
> LINE 3: RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';
>
>
there should be space >>AT TIME ZONE<<

(2023-01-12 17:21:10) postgres=# DO $$
BEGIN
RAISE NOTICE '%', clock_timestamp() at time zone 'UTC';
END$$;
NOTICE:  2023-01-12 16:21:14.063256
DO
(2023-01-12 17:21:14) postgres=#




Re: Changing displayed time zone in RAISE NOTICE output?

2023-01-12 Thread Adrian Klaver

On 1/12/23 08:22, Pavel Stehule wrote:



> On Thu, Jan 12, 2023 at 16:39, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
>
>     On 1/11/23 21:25, Ron wrote:
>      > On 1/11/23 15:06, Adrian Klaver wrote:
>
>      >
>      > Hmm.  I'd have sworn this didn't work when I tried it:
>
>     Did you do?:
>
>     DO $$
>     BEGIN
>     RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';
>     END$$;
>     ERROR:  syntax error at or near "timezone"
>     LINE 3: RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';
>
>
> there should be space >>AT TIME ZONE<<
>
> (2023-01-12 17:21:10) postgres=# DO $$
> BEGIN
> RAISE NOTICE '%', clock_timestamp() at time zone 'UTC';
> END$$;
> NOTICE:  2023-01-12 16:21:14.063256
> DO
> (2023-01-12 17:21:14) postgres=#


Yeah I know, I was responding to Ron saying his previous attempt had 
failed. Using 'timezone' instead of 'time zone' is a mistake I have made 
often enough that I threw it out there as a possible cause for the failure.






--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Changing displayed time zone in RAISE NOTICE output?

2023-01-12 Thread Pavel Stehule
On Thu, Jan 12, 2023 at 17:27, Adrian Klaver wrote:

> On 1/12/23 08:22, Pavel Stehule wrote:
> >
> >
> > On Thu, Jan 12, 2023 at 16:39, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> >
> > On 1/11/23 21:25, Ron wrote:
> >  > On 1/11/23 15:06, Adrian Klaver wrote:
> >
> >  >
> >  > Hmm.  I'd have sworn this didn't work when I tried it:
> >
> > Did you do?:
> >
> > DO $$
> > BEGIN
> > RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';
> > END$$;
> > ERROR:  syntax error at or near "timezone"
> > LINE 3: RAISE NOTICE '%', clock_timestamp() at timezone 'UTC';
> >
> >
> > there should be space >>AT TIME ZONE<<
> >
> > (2023-01-12 17:21:10) postgres=# DO $$
> > BEGIN
> > RAISE NOTICE '%', clock_timestamp() at time zone 'UTC';
> > END$$;
> > NOTICE:  2023-01-12 16:21:14.063256
> > DO
> > (2023-01-12 17:21:14) postgres=#
>
> Yeah I know, I was responding to Ron saying his previous attempt had
> failed. Using 'timezone' instead of 'time zone' is a mistake I have made
> often enough that I threw it out there as a possible cause for the failure.
>

:-)




AW: [Extern] Re: postgres restore & needed history files

2023-01-12 Thread Zwettler Markus (OIZ)
> -----Original Message-----
> From: Laurenz Albe 
> Sent: Friday, January 6, 2023 06:28
> To: Zwettler Markus (OIZ) ; pgsql-
> gene...@lists.postgresql.org
> Subject: [Extern] Re: postgres restore & needed history files
> 
> On Tue, 2023-01-03 at 16:03 +, Zwettler Markus (OIZ) wrote:
> > We are using a DIY Postgres backup:
> > ---
> > psql -c "select pg_start_backup ('Full');"
> > save -s "${NSR_SERVER}" -g "${NSR_POOL}" "${PGDATA}"
> > psql -c "select pg_stop_backup();"
> > ---
> > The pg_wal directory is not saved with it because it is a linked directory.
> >
> >
> > After some time, we had to perform a point-in-time recovery of 1 of the
> > 5 databases to a point in time 7 days in the past.
> > We tried to perform a point-in-time restore on another host, which did
> > not work until we copied the contents of the current pg_wal directory.
> > The current pg_wal directory included 8 history files: 0002.history to
> > 0009.history.
> > The point-in-time restore worked smoothly after it had all these
> > history files.
> >
> >
> > Afaik, all necessary history files should also be restored by the
> > restore_command.
> > I had a look at our archived wal backups and found that 0002.history to
> > 0008.history files already had been deleted due to our NSR backup
> > retention of 30 days.
> >
> > Question: Is it necessary to retain all history files?
> 
> Yes, the history files are an integral part of the database.
> You must not delete them from your WAL archive.
> 
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
> 



What would you do in case of a disaster when all history files in pg_wal are 
gone and also deleted in the backup due to the backup retention?

Thanks, Markus




Re: AW: [Extern] Re: postgres restore & needed history files

2023-01-12 Thread Ron

On 1/12/23 10:50, Zwettler Markus (OIZ) wrote:
[snip]

> What would you do in case of a disaster when all history files in pg_wal are
> gone and also deleted in the backup due to the backup retention?


Yet another reason why you should not roll your own PITR backup solution.  
Use something like pgBackRest (what I use) or Barman.
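
For instance, with pgBackRest that could look roughly like this (a sketch,
with a hypothetical stanza name "main"; WAL archiving and expiration are
then managed together with the backup retention):

# take a full backup of the stanza
pgbackrest --stanza=main --type=full backup
# later, restore to a point in time (the timestamp is just an example)
pgbackrest --stanza=main --type=time --target="2023-01-05 12:00:00" restore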


--
Born in Arizona, moved to Babylonia.




gexec from command prompt?

2023-01-12 Thread Ron

Postgresql 12.11

This might be more of a bash question, or it might be a psql vs engine problem.

I want to run this query using psql from a bash prompt:
select format('SELECT ''%s'', MIN(part_date) FROM %s;', table_name, table_name)
from dba.table_structure
order by table_name\gexec

Thus, I added an extra backslash:

$ psql sides -atXc "select format('SELECT ''%s'', MIN(part_date) FROM %s;', 
table_name, table_name) from dba.table_structure order by table_name limit 
5\\gexec"
select format('SELECT ''%s'', MIN(part_date) FROM %s;', table_name, 
table_name) from dba.table_structure order by table_name limit 5\gexec

ERROR:  syntax error at or near "\"
LINE 1: ...) from dba.table_structure order by table_name limit 5\gexec

Removing "\\exec" from the statement, and appending -c "\\gexec" to the psql 
command technically worked, but did not run the commands.


--
Born in Arizona, moved to Babylonia.




Re: gexec from command prompt?

2023-01-12 Thread Pavel Stehule
On Thu, Jan 12, 2023 at 18:25, Ron wrote:

> Postgresql 12.11
>
> This might be more of a bash question, or it might be a psql vs engine
> problem.
>
> I want to run this query using psql from a bash prompt:
> select format('SELECT ''%s'', MIN(part_date) FROM %s;', table_name,
> table_name)
> from dba.table_structure
> order by table_name\gexec
>
> Thus, I added an extra backslash:
>
> $ psql sides -atXc "select format('SELECT ''%s'', MIN(part_date) FROM
> %s;',
> table_name, table_name) from dba.table_structure order by table_name limit
> 5\\gexec"
> select format('SELECT ''%s'', MIN(part_date) FROM %s;', table_name,
> table_name) from dba.table_structure order by table_name limit 5\gexec
> ERROR:  syntax error at or near "\"
> LINE 1: ...) from dba.table_structure order by table_name limit 5\gexec
>
> Removing "\\exec" from the statement, and appending -c "\\gexec" to the
> psql
> command technically worked, but did not run the commands.
>

I don't know why, but \g* commands don't work from the -c option. But in
this case it is not necessary.

you can psql -c "xxx" | psql
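
Applied to the query from the original post, that would look roughly like
this (an untested sketch; -At makes the first psql emit one generated
statement per line for the second psql to execute):

psql sides -AtX -c "select format('SELECT ''%s'', MIN(part_date) FROM %s;', table_name, table_name)
                    from dba.table_structure order by table_name limit 5" |
  psql sides -a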

Regards

Pavel


>
> --
> Born in Arizona, moved to Babylonia.
>
>
>


Re: gexec from command prompt?

2023-01-12 Thread Alvaro Herrera
On 2023-Jan-12, Ron wrote:

> Postgresql 12.11
> 
> This might be more of a bash question, or it might be a psql vs engine 
> problem.
> 
> I want to run this query using psql from a bash prompt:
> select format('SELECT ''%s'', MIN(part_date) FROM %s;', table_name, 
> table_name)
> from dba.table_structure
> order by table_name\gexec

Yeah, what I use in these cases is something like

echo "select format('SELECT ''%s'', MIN(part_date) FROM %s;', table_name, 
table_name)
from dba.table_structure
order by table_name \gexec" | psql -f-


-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)




Why is a hash join preferred when it does not fit in work_mem

2023-01-12 Thread Dimitrios Apostolou

Hello list,

I have a very simple NATURAL JOIN that does not fit in work_mem.  Why
does the query planner prefer a hash join that needs 361s, while with a
sort operation and a merge join it takes only 13s?

The server is an old Mac Mini with hard disk drive and only 4GB RAM.
Postgres version info:

 PostgreSQL 15.0 on x86_64-apple-darwin20.6.0, compiled by Apple clang
version 12.0.0 (clang-1200.0.32.29), 64-bit

The low work_mem and the disabled memoization are set on purpose, in order
to reproduce, with a simple query, the same problem that I experienced in a
more complex one. This is the simplest query I could find where the
optimizer does not go for the faster merge join.

From my point of view a merge join is clearly faster, because the hash
table does not fit in memory and I expect a hash join to do a lot of
random I/O. But the query planner does not see that, and increasing
random_page_cost does not help either. In fact the opposite happens: the
cost gap between the merge join and the hash join grows as I increase the
random page cost!



# EXPLAIN (ANALYZE,VERBOSE,BUFFERS,SETTINGS) SELECT * FROM
tasks_mm_workitems NATURAL JOIN workitem_ids;

QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=121222.68..257633.01 rows=3702994 width=241) (actual
time=184498.464..360606.257 rows=3702994 loops=1)
   Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n,
workitem_ids.workitem_id
   Inner Unique: true
   Hash Cond: (tasks_mm_workitems.workitem_n = workitem_ids.workitem_n)
   Buffers: shared hit=15068 read=47434, temp read=56309 written=56309
   ->  Seq Scan on public.tasks_mm_workitems  (cost=0.00..53488.94
rows=3702994 width=8) (actual time=0.040..1376.084 rows=3702994 loops=1)
 Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
 Buffers: shared read=16459
   ->  Hash  (cost=59780.19..59780.19 rows=1373719 width=237) (actual
time=184361.874..184361.875 rows=1373737 loops=1)
 Output: workitem_ids.workitem_id, workitem_ids.workitem_n
 Buckets: 4096  Batches: 512  Memory Usage: 759kB
 Buffers: shared hit=15068 read=30975, temp written=43092
 ->  Seq Scan on public.workitem_ids  (cost=0.00..59780.19
rows=1373719 width=237) (actual time=0.026..1912.312 rows=1373737 loops=1)
   Output: workitem_ids.workitem_id, workitem_ids.workitem_n
   Buffers: shared hit=15068 read=30975
 Settings: effective_cache_size = '500MB', enable_memoize = 'off',
hash_mem_multiplier = '1', max_parallel_workers_per_gather = '1', work_mem
= '1MB'
 Planning:
   Buffers: shared hit=2 read=6
 Planning Time: 0.568 ms
 Execution Time: 361106.876 ms
(20 rows)


# EXPLAIN (ANALYZE,VERBOSE,BUFFERS,SETTINGS) SELECT * FROM
tasks_mm_workitems NATURAL JOIN workitem_ids;

QUERY PLAN
------------------------------------------------------------------------------
 Merge Join  (cost=609453.49..759407.78 rows=3702994 width=241) (actual 
time=5062.513..10866.313 rows=3702994 loops=1)
   Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n, 
workitem_ids.workitem_id
   Merge Cond: (workitem_ids.workitem_n = tasks_mm_workitems.workitem_n)
   Buffers: shared hit=5343 read=66053, temp read=32621 written=32894
   ->  Index Scan using workitem_ids_pkey on public.workitem_ids
(cost=0.43..81815.86 rows=1373719 width=237) (actual time=0.111..1218.363 
rows=1373737 loops=1)
 Output: workitem_ids.workitem_n, workitem_ids.workitem_id
 Buffers: shared hit=5310 read=49627
   ->  Materialize  (cost=609372.91..627887.88 rows=3702994 width=8) (actual 
time=5062.389..7392.640 rows=3702994 loops=1)
 Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
 Buffers: shared hit=33 read=16426, temp read=32621 written=32894
 ->  Sort  (cost=609372.91..618630.40 rows=3702994 width=8) (actual 
time=5062.378..6068.703 rows=3702994 loops=1)
   Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
   Sort Key: tasks_mm_workitems.workitem_n
   Sort Method: external merge  Disk: 65256kB
   Buffers: shared hit=33 read=16426, temp read=32621 written=32894
   ->  Seq Scan on public.tasks_mm_workitems
(cost=0.00..53488.94 rows=3702994 width=8) (actual time=0.045..1177.202 
rows=3702994 loops=1)
 Output: tasks_mm_workitems.workitem_n, 
tasks_mm_workitems.task_n
 Buffers: shared hit=33 read=16426
 Settings: effective_cache_size = '500MB', enable_hashjoin = 'off', 
enable_memoize = 'off', hash_mem_multiplier = '1', 
max_parallel_workers_per_gather = '1', work_mem = '1MB'
 Planning:
   Buffers: shared hit=8
 Planning Time: 0.677 ms
 Execution Time: 13364.545 ms
(23 rows)


Thank you in advance,
Dim

Re: gexec from command prompt?

2023-01-12 Thread David G. Johnston
On Thu, Jan 12, 2023 at 10:34 AM Pavel Stehule 
wrote:

>
> On Thu, Jan 12, 2023 at 18:25, Ron wrote:
>
>>
>> Removing "\\exec" from the statement, and appending -c "\\gexec" to the
>> psql
>> command technically worked, but did not run the commands.
>>
>
> I don't know why, but \g* commands don't work from the -c option. But in
> this case it is not necessary
>
>
Well, the -c option states:

command must be either a command string that is completely parsable by the
server (i.e., it contains no psql-specific features), or a single backslash
command. Thus you cannot mix SQL and psql meta-commands within a -c option.

Thus any meta-command that interacts with server-parsed SQL is rendered
useless in -c
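
A quick illustration of that rule (hypothetical shell commands, not from the
thread):

psql -c "SELECT 1"          # plain SQL, parsed entirely by the server: works
psql -c "\conninfo"         # a single backslash command, handled by psql: works
psql -c "SELECT 1 \gexec"   # mixed: the whole string goes to the server,
                            # which fails with a syntax error at the backslash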

David J.


Re: Why is a hash join preferred when it does not fit in work_mem

2023-01-12 Thread David Rowley
On Fri, 13 Jan 2023 at 07:33, Dimitrios Apostolou  wrote:
>
> I have a very simple NATURAL JOIN that does not fit in the work_mem.  Why
> does the query planner prefer a hash join that needs 361s, while with a
> sort operation and a merge join it takes only 13s?

It's simply that the Hash Join plan appears cheaper based on the costs
that the planner has calculated.

A better question to ask would be: where are the costs inaccurate, and why?

One thing I noticed in your EXPLAIN ANALYZE output is that the Index
Scan on workitem_ids was costed higher than the Seq Scan, yet
was faster.

> ->  Seq Scan on public.workitem_ids  (cost=0.00..59780.19 rows=1373719 
> width=237) (actual time=0.026..1912.312 rows=1373737 loops=1)

> ->  Index Scan using workitem_ids_pkey on public.workitem_ids 
> (cost=0.43..81815.86 rows=1373719 width=237) (actual time=0.111..1218.363 
> rows=1373737 loops=1)

Perhaps the Seq scan is doing more actual I/O than the index scan is.

> The low work_mem and the disabled memoization are set on purpose, in order
> to simplify a complex query, while reproducing the same problem that I
> experienced there. This result is the simplest query I could get, where
> the optimizer does not go for a faster merge join.
>
> From my point of view a merge join is clearly faster, because the hash
> table does not fit in memory and I expect a hash join to do a lot of
> random I/O. But the query planner does not see that, and increasing
> random_page_cost does not help either. In fact the opposite happens: the
> merge join gets a higher cost difference to the hash join, as I increase
> the random page cost!

I'd expect reducing random_page_cost to make the Merge Join cheaper as
that's where the Index Scan is. I'm not quite sure where you think the
random I/O is coming from in a batched hash join.

It would be interesting to see the same plans with SET track_io_timing
= on.  It's possible that there's less *actual* I/O going on with
the Merge Join plan vs the Hash Join plan.  Since we do buffered I/O,
without track_io_timing we don't know if the read buffers resulted in
an actual disk read or a read from the kernel buffers.
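
For example, something along these lines (reusing the settings already shown
in the two plans) would add I/O timing information to the BUFFERS output:

SET track_io_timing = on;

EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS)
  SELECT * FROM tasks_mm_workitems NATURAL JOIN workitem_ids;

SET enable_hashjoin = off;   -- as in the second plan, to get the Merge Join
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS)
  SELECT * FROM tasks_mm_workitems NATURAL JOIN workitem_ids;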

David