Re: Peer authentication failed ???

2023-10-04 Thread Amn Ojee Uw

Yes!
Thanks so much it worked!

On 10/3/23 9:37 a.m., Peter J. Holzer wrote:

On 2023-10-03 05:55:51 -0400, Amn Ojee Uw wrote:

psql -U my_group_worker -d my_group_db

psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432"
failed: FATAL:  Peer authentication failed for user "my_group_worker"

*** What am I doing wrong?

PostgreSQL uses the first matching rule from pg_hba.conf. Presumably
(because that's the default on Debian/Ubuntu) you have it set up to use
peer authentication on the unix socket and password authentication
(scram or md5) on everything else.

You are connecting via the Unix socket (/var/run/postgresql/.s.PGSQL.5432),
so the server will attempt peer authentication and nothing else. To get
it to attempt password authentication, connect via a TCP socket:

psql -U my_group_worker -h localhost -d my_group_db

(More conveniently, set up ident authentication for your users; then you
don't need a password.)
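
For reference, the relevant pg_hba.conf entries on a stock Debian/Ubuntu
install look roughly like this (a sketch; the exact method names vary by
version, e.g. md5 instead of scram-sha-256 on older releases):

```
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 peer
host    all       all   127.0.0.1/32  scram-sha-256
```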

 hp






Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Marian Wendt

On 04.10.2023 at 05:36, Steve Crawford wrote:



On Tue, Oct 3, 2023 at 7:33 PM Lincoln Swaine-Moore wrote:


Hi all,

I'm attempting to generate some reports using user-driven
timezones for UTC data, and I'm having trouble writing a query
that meets the following criteria:

1) Data should be averaged at one of daily, hourly, or 15 minute
granularities (user-driven).
2) Data over a given period should reflect the localized value of
the UTC data for a user-provided time zone. E.g.:
    a) The 1 hour period starting at '2023-10-03
12:00:00.00-0400' (America/New_York) should reflect data with
timestamps between '2023-10-03 16:00:00.00Z' and '2023-10-03
17:00:00.00Z'.
    b) The 1 day period starting at '2023-10-03
00:00:00.00-0400' should reflect data with timestamps between
'2023-10-03 04:00:00.00Z' and '2023-10-04 04:00:00.00Z'.
3) When a period interacts with a DST change in the given
timezone, the data should not be clumped together. E.g.:
    a) Data points occurring at 2023-11-05 05:30:00.00Z and
2023-11-05 06:30:00.00Z should be treated as falling into
separate buckets when the time zone is America/New_York:
(2023-11-05 01:30:00.00-0400 and 2023-11-05
01:30:00.00-0500, respectively). This should be true for
either the 15 minute or 1 hour intervals.
    b) Some clumping for day resolution seems ok! E.g. the 1 day
period starting at '2023-11-05 00:00:00.00-0400' can and
probably should contain 25 hours' worth of data. Certainly it
should not reflect the data falling between '2023-11-05
04:00:00.00Z' and '2023-11-05 04:00:00.00Z' + '24
hours'::interval (= '2023-11-06 04:00:00.00Z'), because those
would be the local times of '2023-11-05 00:00:00.00-0400' and
'2023-11-05 23:00:00.00-0500'.
4) It would be relatively simple to do 15 minute and 1 hour
periods if all timezone offsets were multiples of 1 hour (in that
case, all operations could be done in UTC and then converted after
the fact), but unfortunately some time zones have 30 min-based
offsets, which interferes with this approach.
5) Ideally, the solution would not involve messing with the
server/connection's value of timezone. (Though I would be
interested if there was a solution that relaxed this constraint
and was relatively safe/compatible with transactions and psycopg2.)
6) Ideally, my query would return periods that are missing data
(though could plausibly fill these in in the layer above). This
points toward generate_series but...
7) Sigh: I can't upgrade to 16. I gather that 16 has a timezone
parameter for generate_series, which I believe might help. But
tragically, Digitalocean doesn't yet support 16, and it's not
practical to migrate elsewhere. Based on historical release ->
support timing, I'd imagine they will not support it until Q2
2024, which is too late for this feature. If anyone had the inside
scoop about when they'd likely support it, I'd welcome it!

This looks pretty hairy written out as above, but I actually think
it reflects most people's intuitions about what data a local
period "should" correspond to (though I'd welcome feedback on this
point).

Here are some thoughts about approaches that I've tried, and what
their drawbacks seem to be. For all these, I'll use the following
CTE to demonstrate some data that crosses a DST boundary:

```
with original_data as (
    select
        ('2023-11-05 00:00:00.00Z'::timestamptz) + (15 * x ||
' minutes')::interval as "t"
    from
        generate_series(0, 1000) as x
)
```

1) date_trunc: it seems like as of v12, date_trunc accepts a third
argument of timezone, which essentially plays the role of the
server timezone setting for the scope of the function. This is
very handy, and *I believe* solves my issues for the hour/day periods:
```
[etc]
select
    date_trunc('day', t, 'America/New_York'),
    min(t),
    max(t),
    count(*)
from original_data
group by 1
order by 1;

       date_trunc       |          min           |          max           | count
------------------------+------------------------+------------------------+-------
 2023-11-04 04:00:00+00 | 2023-11-05 00:00:00+00 | 2023-11-05 03:45:00+00 |    16
 2023-11-05 04:00:00+00 | 2023-11-05 04:00:00+00 | 2023-11-06 04:45:00+00 |   100
 2023-11-06 05:00:00+00 | 2023-11-06 05:00:00+00 | 2023-11-07 04:45:00+00 |    96
 2023-11-07 05:00:00+00 | 2023-11-07 05:00:00+00 | 2023-11-08 04:45:00+00 |    96
 2023-11-08 05:00:00+00 | 2023-11-08 05:00:00+00 | 2023-11-09 04:45:00+00 |    96
 2023-11-09 05:00:00+00 | 2023-11-09 05:00:00+00 | 2023-11-10 04:45:00+00 |    96

[etc]
```

This checks out,

Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Tom Lane
Steve Crawford  writes:
> On Tue, Oct 3, 2023 at 7:33 PM Lincoln Swaine-Moore 
> wrote:
>> 5) Ideally, the solution would not involve messing with the
>> server/connection's value of timezone. (Though I would be interested if
>> there was a solution that relaxed this constraint and was relatively
>> safe/compatible with transactions and psycopg2.)

> Note that setting the time zone is a client/connection setting so if you
> set it within a transaction, it will stay set when the transaction
> concludes. But time manipulation is tricky and trying to DIY reinvent the
> wheel is painful and often buggy. Let PostgreSQL do the work for you.

Expanding on that philosophy: you should be able to set the timezone
locally within a function, so that it wouldn't be that hard to make a
wrapper for generate_series that emulates the 4-argument version added
in v16.

Rather than messing with manually saving and restoring the prevailing
zone, I'd let the function SET infrastructure do it for me.  Sadly,
that SET clause only takes literal constant arguments, so it'd go
roughly like this:

create function generate_series(timestamptz, timestamptz, interval, text)
returns setof timestamptz
strict immutable language plpgsql as
$$
begin
  perform set_config('timezone', $4, true);
  return query select generate_series($1, $2, $3);
end
$$ set timezone = 'UTC';

Setting the zone to UTC is a useless step, but that triggers
restoring the previous zone when the function exits; simpler
and probably faster than coding the save/restore explicitly.
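
For instance, calling the wrapper might look like this (a sketch; the
dates are chosen to span the America/New_York DST change and are not
from the original message):

```
select * from generate_series('2023-11-04 00:00:00-04'::timestamptz,
                               '2023-11-07 00:00:00-05'::timestamptz,
                               '1 day'::interval,
                               'America/New_York');
-- each step lands on the next local midnight, including the 25-hour
-- local day on 2023-11-05
```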

Side note: whether this is really "immutable" is a matter for
debate, since time zone definitions tend to change over time.
But we chose to mark the new 4-argument version that way,
so you might as well do so too.

regards, tom lane




Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Marian Wendt


On 04.10.2023 at 16:11, Tom Lane wrote:

Steve Crawford  writes:

On Tue, Oct 3, 2023 at 7:33 PM Lincoln Swaine-Moore
wrote:

5) Ideally, the solution would not involve messing with the
server/connection's value of timezone. (Though I would be interested if
there was a solution that relaxed this constraint and was relatively
safe/compatible with transactions and psycopg2.)

Note that setting the time zone is a client/connection setting so if you
set it within a transaction, it will stay set when the transaction
concludes. But time manipulation is tricky and trying to DIY reinvent the
wheel is painful and often buggy. Let PostgreSQL do the work for you.

Expanding on that philosophy: you should be able to set the timezone
locally within a function, so that it wouldn't be that hard to make a
wrapper for generate_series that emulates the 4-argument version added
in v16.

Rather than messing with manually saving and restoring the prevailing
zone, I'd let the function SET infrastructure do it for me.  Sadly,
that SET clause only takes literal constant arguments, so it'd go
roughly like this:

create function generate_series(timestamptz, timestamptz, interval, text)
returns setof timestamptz
strict immutable language plpgsql as
$$
begin
   perform set_config('timezone', $4, true);
   return query select generate_series($1, $2, $3);
end
$$ set timezone = 'UTC';

Setting the zone to UTC is a useless step, but that triggers
restoring the previous zone when the function exits; simpler
and probably faster than coding the save/restore explicitly.

Side note: whether this is really "immutable" is a matter for
debate, since time zone definitions tend to change over time.
But we chose to mark the new 4-argument version that way,
so you might as well do so too.

regards, tom lane




As far as Lincoln describes it, the series is not the real problem here,
but is just intended to be a simplified example of his actual data. The
consideration that you can set the time zone via a function should
apply here. The following SELECT should show, as an example, that the
desired result can be achieved (TIMEZONE set to 'Etc/UTC'). Variations
of date_bin for 15 minutes or 1 hour should work similarly.


SELECT
    sub.gs AS ts_in_utc
    ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
    ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York', 
'2023-01-01')
FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz, 
'2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub

WHERE
    sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND 
sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz


--
regards, marian wendt

Re: [EXT]Re: Strange error trying to import with Ora2PG

2023-10-04 Thread Adrian Klaver

On 10/4/23 06:24, Johnson, Bruce E - (bjohnson) wrote:
That’s not it. export_schema.sh writes the scripts, and then emits an 
ora2pg command to pull in the data (I've managed to do this a dozen 
times so far working out how to do things. :-)


The dates on the files in the sources directory are consistent with 
that, and don’t change when I run the script.


-rw-rw-r-- 1 johnson johnson 872 Sep 27 22:02 AHSC_SCHEDULE_ROUNDED_view.sql
-rw-rw-r-- 1 johnson johnson 817 Sep 27 22:02 COP_COLLEGE_HOUR_view.sql
-rw-rw-r-- 1 johnson johnson 628 Oct  3 17:06 RESERVER_VISIBLE_RESTRICTION_view.sql
-rw-rw-r-- 1 johnson johnson 515 Sep 27 22:02 RESOURCEAFFIL_WITHCODE_view.sql

-rw-rw-r-- 1 johnson johnson 535 Sep 27 22:02 UA_COLLEGE_LOOKUP_view.sql


Alright.

In your previous post you said "I tested the view creation with that 
code interactively, and it worked."


Does that mean you ran:

CREATE OR REPLACE VIEW reserver_visible_restriction (resource_id, 
resourcegroup_id, affil_id) AS select r.resource_id, 
rg.resourcegroup_id, a.affil_id FROM resources r, resourcegroupaffil rg, 
resourceaffil a where r.resource_id = a.resource_id and r.resource_id = 
rg.resource_id and r.requester_vis_only = '1';


in a psql session by itself?


What happens if you do?:

psql ... -f RESERVER_VISIBLE_RESTRICTION_view.sql

When you run the import_all.sh does it show where it is pulling the 
scripts from?


If you search the source file directory for 
"reserver_visible_restriction" is there another file that contains it?






On Oct 3, 2023, at 2:27 PM, Adrian Klaver wrote:


External Email

On 10/3/23 10:32, Johnson, Bruce E - (bjohnson) wrote:
I am trying to import an Oracle schema with ora2pg and running into 
an odd error when the import_all.sh script gets to a specific view.

The error is:


However, when I re-run the import_all.sh script it errors out at the 
same place with the same error (in fact the pasted error code is from 
running the script AFTER I fixed it). The SQL isn’t being cached 
anywhere in the system, is it?


I'm betting that every time you run import_all.sh it rewrites the 
scripts. Pretty sure if you look at the file the creation script will 
have been rewritten to its original form.



I know that I modified the correct file.
It happens even if I drop the database and recreate it.
--
Bruce Johnson
University of Arizona
College of Pharmacy
Information Technology Group
Institutions do not have opinions, merely customs


--
Adrian Klaver
adrian.kla...@aklaver.com 



--
Bruce Johnson
University of Arizona
College of Pharmacy
Information Technology Group

Institutions do not have opinions, merely customs



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Lincoln Swaine-Moore
Thank you to all who have weighed in! Very much appreciated.

A few thoughts based on what I've read:

> As far as Lincoln describes it, the series is not the real problem here,
but is just intended to be a simplified example of his actual data.

This both is and isn't the case. I was using generate_series to create some
data for testing purposes, but I also would love to be able to use
generate_series for the logic as well.

> SELECT
> sub.gs AS ts_in_utc
> ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
> ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York',
> '2023-01-01')
> FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz,
> '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
> WHERE
> sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND
> sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz

I believe this query will be funky around DST borders, because `sub.gs AT
TIME ZONE 'America/New_York'` will be localized in a way that erases the
difference between hours with different offsets, which are genuinely
different. For instance, I ran this and there are two rows within it that
look like:

` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
and
` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`

I think that the non-unique second column will pose an issue for the date
binning at a resolution finer than 1 day.

> Note that setting the time zone is a client/connection setting so if you
> set it within a transaction, it will stay set when the transaction
> concludes. But time manipulation is tricky and trying to DIY reinvent the
> wheel is painful and often buggy. Let PostgreSQL do the work for you.

Yeah, I'm definitely interested in keeping as much of the DST stuff outside
my code as possible. I guess my concern is that I think there are other
places where this database is being used in my codebase that may rely on
the database time setting being UTC (or really, GMT, though I don't think
there's a difference for these purposes). It would be best if all of my
application's code declared its intentions about the time zone of the
database connection before running its query, but I don't think that's a
feasible change to make right now. That's what's motivating my pursuit of
finding a way to write these queries without changing this setting, through
appropriate casting and such.

> create function generate_series(timestamptz, timestamptz, interval, text)
> returns setof timestamptz
strict immutable language plpgsql as
$$
begin
perform set_config('timezone', $4, true);
return query select generate_series($1, $2, $3);
end
$$ set timezone = 'UTC';

This is a nice suggestion, and in fact, it would be fine from my
perspective to reset to UTC every time. My concern is only around the
safety of the final `set timezone`. Under what circumstances/when can I
count on that being set? E.g. if a query using that function was cancelled
before finishing, would the connection timezone remain as $4? I guess the
3rd parameter to set_config is `is_local` (based on
https://pgpedia.info/s/set_config.html). Does that mean I could run this
outside this context of a function, and expect the setting to go back to
UTC on a rollback? Apologies if these are naive questions.

Thanks again for all the help.

Best,
Lincoln


On Wed, Oct 4, 2023 at 11:09 AM Marian Wendt  wrote:

>
> On 04.10.2023 at 16:11, Tom Lane wrote:
>
> Steve Crawford writes:
>
> On Tue, Oct 3, 2023 at 7:33 PM Lincoln Swaine-Moore wrote:
>
> 5) Ideally, the solution would not involve messing with the
> server/connection's value of timezone. (Though I would be interested if
> there was a solution that relaxed this constraint and was relatively
> safe/compatible with transactions and psycopg2.)
>
> Note that setting the time zone is a client/connection setting so if you
> set it within a transaction, it will stay set when the transaction
> concludes. But time manipulation is tricky and trying to DIY reinvent the
> wheel is painful and often buggy. Let PostgreSQL do the work for you.
>
> Expanding on that philosophy: you should be able to set the timezone
> locally within a function, so that it wouldn't be that hard to make a
> wrapper for generate_series that emulates the 4-argument version added
> in v16.
>
> Rather than messing with manually saving and restoring the prevailing
> zone, I'd let the function SET infrastructure do it for me.  Sadly,
> that SET clause only takes literal constant arguments, so it'd go
> roughly like this:
>
> create function generate_series(timestamptz, timestamptz, interval, text)
> returns setof timestamptz
> strict immutable language plpgsql as
> $$
> begin
>   perform set_config('timezone', $4, true);
>   return query select generate_series($1, $2, $3);
> end
> $$ set timezone = 'UTC';
>
> Setting the zone to UTC is a useless step, but that triggers
> restoring the previous zone when the function 

Re: [EXT]Re: Strange error trying to import with Ora2PG

2023-10-04 Thread Johnson, Bruce E - (bjohnson)
That’s not it. export_schema.sh writes the scripts, and then emits an ora2pg 
command to pull in the data (I've managed to do this a dozen times so far 
working out how to do things. :-)

The dates on the files in the sources directory are consistent with that, and 
don’t change when I run the script.

-rw-rw-r-- 1 johnson johnson 872 Sep 27 22:02 AHSC_SCHEDULE_ROUNDED_view.sql
-rw-rw-r-- 1 johnson johnson 817 Sep 27 22:02 COP_COLLEGE_HOUR_view.sql
-rw-rw-r-- 1 johnson johnson 628 Oct  3 17:06 RESERVER_VISIBLE_RESTRICTION_view.sql
-rw-rw-r-- 1 johnson johnson 515 Sep 27 22:02 RESOURCEAFFIL_WITHCODE_view.sql
-rw-rw-r-- 1 johnson johnson 535 Sep 27 22:02 UA_COLLEGE_LOOKUP_view.sql



On Oct 3, 2023, at 2:27 PM, Adrian Klaver  wrote:

External Email

On 10/3/23 10:32, Johnson, Bruce E - (bjohnson) wrote:
I am trying to import an Oracle schema with ora2pg and running into an odd 
error when the import_all.sh script gets to a specific view.
The error is:

However, when I re-run the import_all.sh script it errors out at the same place 
with the same error (in fact the pasted error code is from running the script 
AFTER I fixed it). The SQL isn’t being cached anywhere in the system, is it?

I'm betting that every time you run import_all.sh it rewrites the scripts. 
Pretty sure if you look at the file the creation script will have been 
rewritten to its original form.

I know that I modified the correct file.
It happens even if I drop the database and recreate it.
--
Bruce Johnson
University of Arizona
College of Pharmacy
Information Technology Group
Institutions do not have opinions, merely customs

--
Adrian Klaver
adrian.kla...@aklaver.com


--
Bruce Johnson
University of Arizona
College of Pharmacy
Information Technology Group

Institutions do not have opinions, merely customs



Re: [EXT]Re: Strange error trying to import with Ora2PG

2023-10-04 Thread Johnson, Bruce E - (bjohnson)


On Oct 4, 2023, at 8:38 AM, Adrian Klaver  wrote:



If you search the source file directory for "reserver_visible_restriction" is 
there another file that contains it?



Sigh…that was it…Thank you.

There are two 'views' directories, one under 'sources/views' and the other 
under 'schema/views'. Clearly the script is picking up the other one.


~/oracle_migration$ tree calendar3

calendar3
├── config
│   ├── ora2pg.conf
│   └── ora2pg.conf.orig

SNIP

├── schema
│   ├── dblinks
│

SNIP

│   └── views
│       ├── AHSC_SCHEDULE_ROUNDED_view.sql
│       ├── COP_COLLEGE_HOUR_view.sql
│       ├── RESERVER_VISIBLE_RESTRICTION_view.sql
│       ├── RESOURCEAFFIL_WITHCODE_view.sql
│       ├── UA_COLLEGE_LOOKUP_view.sql
│       └── view.sql
└── sources

SNIP

    └── views
        ├── AHSC_SCHEDULE_ROUNDED_view.sql
        ├── COP_COLLEGE_HOUR_view.sql
        ├── RESERVER_VISIBLE_RESTRICTION_view.sql
        ├── RESOURCEAFFIL_WITHCODE_view.sql
        ├── UA_COLLEGE_LOOKUP_view.sql
        └── view.sql

29 directories, 127 files


--
Bruce Johnson
University of Arizona
College of Pharmacy
Information Technology Group

Institutions do not have opinions, merely customs




I'm using both window and agg, is this expected sorting behavior?

2023-10-04 Thread jinser
Hello Friends,
I encountered a behavior that confused me when using window function
and group aggregate at the same time.

simple table:
CREATE TABLE t (a int, b int);

add some data to represent the behavior:
insert into t values
(1, 39),
(1, 95),
(2, 48),
(3, 87),
(4, 19),
(4, 78),
(4, 53);

When I execute:

-- query #1
select
a,
row_number() over (partition by a order by a)
from t
group by a;

| a   | row_number |
| --- | -- |
| 1   | 1  |
| 2   | 1  |
| 3   | 1  |
| 4   | 1  |

everything works as usual, but if I add desc sorting in over window:

-- query #2
select
a,
row_number() over (partition by a order by a desc)
from t
group by a;

| a   | row_number |
| --- | -- |
| 4   | 1  |
| 3   | 1  |
| 2   | 1  |
| 1   | 1  |

the ordering of the entire result also follows the ordering (desc) given in the OVER clause.

This is what confuses me. I've tried many versions of PostgreSQL, at
least 11 through 16, and the behavior is consistent with the above; but
I feel that the ORDER BY inside the window function should not affect
the ordering of the final result.
Also, I don't know if it is relevant, but both queries return the same
row order when tested on MySQL 8.0.

I searched the mailing list for a while but couldn't find this
discussed. If anyone has already mentioned it, sorry, please feel free
to point it out.




Re: I'm using both window and agg, is this expected sorting behavior?

2023-10-04 Thread Tom Lane
jinser  writes:
> This is what confuses me, I've tried many versions of pgsql, at least
> including 11-16, and the behavior is consistent with the above; but I
> feel that maybe the order by in the window function should not affect
> the final result.

Since you have not specified an ORDER BY for the overall query
result, the implementation is entitled to return the rows in
any order it pleases.  Ours happens to sort by the window ordering
before computing the window functions, so that's what you get.
In this case anyway --- you shouldn't rely on that.
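
If a fixed output order is needed, it has to be requested explicitly;
for example (a sketch based on the query above):

```
select
    a,
    row_number() over (partition by a order by a desc)
from t
group by a
order by a;   -- an explicit ORDER BY makes the final row order deterministic
```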

regards, tom lane




Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Tom Lane
Lincoln Swaine-Moore  writes:
>>> create function generate_series(timestamptz, timestamptz, interval, text)
>>> returns setof timestamptz
>>> strict immutable language plpgsql as
>>> $$
>>> begin
>>> perform set_config('timezone', $4, true);
>>> return query select generate_series($1, $2, $3);
>>> end
>>> $$ set timezone = 'UTC';

> This is a nice suggestion, and in fact, it would be fine from my
> perspective to reset to UTC every time. My concern is only around the
> safety of the final `set timezone`. Under what circumstances/when can I
> count on that being set? E.g. if a query using that function was cancelled
> before finishing, would the connection timezone remain as $4?

No.  The function call mechanism will ensure that timezone goes back
to its previous state at function exit.  (In the case of an error
exit, that's actually handled by the transaction abort logic, but the
result is the same.)  Because of that, I think it doesn't really
matter whether the set_config call says "true" or "false", but saying
that it's a local setting seems less confusing.

> Does that mean I could run this
> outside this context of a function, and expect the setting to go back to
> UTC on a rollback?

An actual rollback would undo the effects of set_config, yes.  You
only need this function wrapper to ensure that subsequent operations
in the same transaction don't see the setting change.
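
As an illustration outside a function (a sketch, not from the original
messages; SET LOCAL behaves like set_config(..., true) and its effect
ends with the transaction):

```
BEGIN;
SET LOCAL timezone = 'America/New_York';  -- scoped to this transaction
-- ... run the zone-sensitive queries here ...
COMMIT;  -- or ROLLBACK; either way the previous timezone setting is back in effect
```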

regards, tom lane




Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Lincoln Swaine-Moore
> No.  The function call mechanism will ensure that timezone goes back
> to its previous state at function exit.

> An actual rollback would undo the effects of set_config, yes.  You
> only need this function wrapper to ensure that subsequent operations
> in the same transaction don't see the setting change.

Excellent, thank you. So just to be explicit here, I could either run this
function, or set/run my query/set back, with the same behavior/safety
guarantees as if I was using the generate_series function with timezone
from v16?


On Wed, Oct 4, 2023 at 12:54 PM Tom Lane  wrote:

> Lincoln Swaine-Moore  writes:
> >>> create function generate_series(timestamptz, timestamptz, interval,
> text)
> >>> returns setof timestamptz
> >>> strict immutable language plpgsql as
> >>> $$
> >>> begin
> >>> perform set_config('timezone', $4, true);
> >>> return query select generate_series($1, $2, $3);
> >>> end
> >>> $$ set timezone = 'UTC';
>
> > This is a nice suggestion, and in fact, it would be fine from my
> > perspective to reset to UTC every time. My concern is only around the
> > safety of the final `set timezone`. Under what circumstances/when can I
> > count on that being set? E.g. if a query using that function was
> cancelled
> > before finishing, would the connection timezone remain as $4?
>
> No.  The function call mechanism will ensure that timezone goes back
> to its previous state at function exit.  (In the case of an error
> exit, that's actually handled by the transaction abort logic, but the
> result is the same.)  Because of that, I think it doesn't really
> matter whether the set_config call says "true" or "false", but saying
> that it's a local setting seems less confusing.
>
> > Does that mean I could run this
> > outside this context of a function, and expect the setting to go back to
> > UTC on a rollback?
>
> An actual rollback would undo the effects of set_config, yes.  You
> only need this function wrapper to ensure that subsequent operations
> in the same transaction don't see the setting change.
>
> regards, tom lane
>


-- 
Lincoln Swaine-Moore


Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Tom Lane
Lincoln Swaine-Moore  writes:
> Excellent, thank you. So just to be explicit here, I could either run this
> function, or set/run my query/set back, with the same behavior/safety
> guarantees as if I was using the generate_series function with timezone
> from v16?

Yeah.  One thing to keep in mind is that that might have different
behavior in terms of the evaluation of the arguments to the function,
ie which timezone setting is your input parsed according to.

regards, tom lane




Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Lincoln Swaine-Moore
> Yeah.  One thing to keep in mind is that that might have different
> behavior in terms of the evaluation of the arguments to the function,
> ie which timezone setting is your input parsed according to.

I see. You mean, in the event that it doesn't conform to an entry in
`pg_timezone_names`? I do have control over the possible options the user
can provide, so it should be straightforward to make sure those all have
entries.

Thanks,
Lincoln


Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Tom Lane
Lincoln Swaine-Moore  writes:
>> Yeah.  One thing to keep in mind is that that might have different
>> behavior in terms of the evaluation of the arguments to the function,
>> ie which timezone setting is your input parsed according to.

> I see. You mean, in the event that it doesn't conform to an entry in
> `pg_timezone_names`? I do have control over the possible options the user
> can provide, so it should be straightforward to make sure those all have
> entries.

If you mean that your input will always include an explicit zone
specification, then this doesn't affect you.  What I was thinking
about was that

select generate_series('2023-10-04 13:30', ...)

is going to mean different things depending on the zone setting
that prevails when that constant is parsed.

regards, tom lane




Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Lincoln Swaine-Moore
> If you mean that your input will always include an explicit zone
> specification, then this doesn't affect you.  What I was thinking
> about was that
>
> select generate_series('2023-10-04 13:30', ...)

> is going to mean different things depending on the zone setting
> that prevails when that constant is parsed.

Gotcha--I thought you meant the timezone argument to the function ($4). I
can make sure that all the datetime arguments to the function are "with
timezone", so there should be no ambiguity. But I guess if I didn't, the
timezone given by $4 would be the one parsing the naive timestamps, which I
think would also be ok from my perspective.

Thanks!


Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Steve Crawford
...

> Yeah, I'm definitely interested in keeping as much of the DST stuff
> outside my code as possible. I guess my concern is that I think there are
> other places where this database is being used in my codebase that may rely
> on the database time setting being UTC (or really, GMT, though I don't
> think there's a difference for these purposes). It would be best if all of
> my application's code declared its intentions about the time zone of the
> database connection before running its query, but I don't think that's a
> feasible change to make right now. That's what's motivating my pursuit of
> finding a way to write these queries without changing this setting, through
> appropriate casting and such.
> ...
>

 Really UTC (not that it matters for calculation purposes):
https://en.wikipedia.org/wiki/Coordinated_Universal_Time

As to "other places" in your (or others) code, provided that you are
storing and manipulating your data as timestamp WITH time zone (point in
time), it can be inserted, reported, displayed, calculated, etc. in
whatever time zone any client or piece of code prefers. But when displaying
data around DST changes, fall-back in particular, the offset must be
included in your output to disambiguate things like 01:30:00-07 from
01:30:00-08. Both are 1:30am but are one hour apart.
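
For example, one way to include the offset when formatting output (a
sketch; to_char's OF pattern emits the UTC offset for the session time
zone, and the SET here only picks a display zone for the example):

```
SET timezone = 'America/New_York';
SELECT to_char(ts, 'YYYY-MM-DD HH24:MI:SSOF') AS local_with_offset
FROM (VALUES ('2023-11-05 05:30:00+00'::timestamptz),
             ('2023-11-05 06:30:00+00'::timestamptz)) AS v(ts);
-- 2023-11-05 01:30:00-04
-- 2023-11-05 01:30:00-05
```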

Cheers,
Steve


Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Alban Hertroys


> On 4 Oct 2023, at 17:58, Lincoln Swaine-Moore  wrote:
> 
> > SELECT
> > sub.gs AS ts_in_utc
> > ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
> > ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York', 
> > '2023-01-01')
> > FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz, 
> > '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
> > WHERE
> > sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND 
> > sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz
> 
> I believe this query will be funky around DST borders, because `sub.gs AT 
> TIME ZONE 'America/New_York'` will be localized in a way that erases the 
> difference between hours with different offsets, which are genuinely 
> different. For instance, I ran this and there are two rows within it that 
> look like: 
> 
> ` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
> and 
> ` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
> 
> I think that the non-unique second column will pose an issue for the date 
> binning at a resolution finer than 1 day.

What I do in such cases is to add an extra column with the UTC timestamp to 
serve as a linear scale to the local timestamps. That also helps with ordering 
buckets in reports and such during DST changes (especially the ones where an 
hour repeats).

Filtering in the queries occurs on the UTC scale, with the local timestamps 
calculated back to UTC, so it doesn’t matter whether the local day has 23, 24, 
25, 24.5, 23.5 or whatever number of hours in a date range - it all maps back 
because a UTC day always has 24 hours.

Something that I also do is to create calendar tables and views for the 
buckets, with 2 timestamps per bucket: the start of the bucket and the start of 
the next bucket. That gives you a range to put actual timestamps between (not 
BETWEEN between, because that’s inclusive). You can store and index that, as 
opposed to generated results using generate_series - basically I materialise 
those.
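
A minimal sketch of such a calendar table (the table name, zone and date
range are assumptions for illustration, not from the original message):

```
-- one row per local day in America/New_York, storing the UTC instants
-- that open and close each bucket (a 25-hour local day simply gets a
-- wider [bucket_start, bucket_next) range)
CREATE TABLE day_calendar AS
SELECT (ld)                    AT TIME ZONE 'America/New_York' AS bucket_start,
       (ld + interval '1 day') AT TIME ZONE 'America/New_York' AS bucket_next
FROM generate_series('2023-01-01'::timestamp,
                     '2023-12-31'::timestamp,
                     '1 day'::interval) AS ld;

CREATE INDEX ON day_calendar (bucket_start, bucket_next);

-- a UTC measurement ts belongs to the bucket where
--   ts >= bucket_start AND ts < bucket_next   (half-open, not BETWEEN)
```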

For hours and quarter hours I found it to be fairly convenient to base a view 
on a join between a date calendar and an (quarter of an) hour per UTC day 
table, but materialising that with some indexes may perform better (at the cost 
of disk space). I do materialise that currently, but our database server 
doesn’t have a lot of memory so I’m often not hitting the cache and performance 
suffers a bit (infrastructure is about to change for the better though).

Regards,

Alban Hertroys
--
There is always an exception to always.








Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Lincoln Swaine-Moore
> Really UTC (not that it matters for calculation purposes)

Sorry, yes--I just meant that literally when I run `show timezone` on the
server in question, I get `GMT`.

> As to "other places" in your (or others) code, provided that you are
storing and manipulating your data as timestamp WITH time zone (point in
time)

Unfortunately this is not the case for my codebase/schema. The data is
stored WITHOUT. I glossed over this fact in my post, because they are in
fact UTC times that have had their (0) offset truncated, and prior to my
usage of any solution I can use `at time zone 'UTC'` to properly handle
them. But I don't think it's the case that in the application more
generally this is being handled gracefully, and so the behavior that is
currently (correctly) running is being propped up by the database's time
zone setting. In the long term, I think it might be appropriate to create a
new column that is simply `t at time zone 'UTC'`, and use that everywhere,
but that's not feasible for the task I'm handling right now.
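
For example (a sketch with a hypothetical table name), the naive UTC
column can be turned into a proper timestamptz before any zone-aware
work:

```
-- t is stored as timestamp (without time zone) but represents UTC
SELECT t AT TIME ZONE 'UTC' AS t_utc
FROM some_table;  -- hypothetical table name
```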

Thanks!


Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Lincoln Swaine-Moore
> What I do in such cases is to add an extra column with the UTC timestamp
to serve as a linear scale to the local timestamps. That also helps with
ordering buckets in reports and such during DST changes (especially the
ones where an hour repeats).

> For hours and quarter hours I found it to be fairly convenient to base a
view on a join between a date calendar and an (quarter of an) hour per UTC
day table, but materialising that with some indexes may perform better (at
the cost of disk space). I do materialise that currently, but our database
server doesn’t have a lot of memory so I’m often not hitting the cache and
performance suffers a bit (infrastructure is about to change for the better
though).

That's an interesting idea, but I'm not sure I fully understand. Assuming
you're aggregating data: what do you group by? For instance, at an hourly
resolution, if you group by both the UTC timestamp and the local one, you
might end up, say, dividing an hour-long bucket in two for time zones with
half-hour-based offsets, no?

Thanks for the detailed writeup! Definitely helpful to learn more about
what people are using in production to handle this sort of thing.

-- 
Lincoln Swaine-Moore


Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Brent Wood
Slightly off topic, but has anyone tried TimescaleDB for timeseries databases?

The issues discussed here are still there as they apply to the underlying 
Postgres ORDBMS.

We solve the problem (around 4 billion records of instrument sensor readings) 
by using UTC for the "native" timestamp, and working in that. Even though we 
are ½ way around the world. The local times can easily be determined & applied 
if desired, but by standardising on the reference time zone at the start, 
things have "just worked", for around 15 years now.


Brent Wood

Principal Technician, Fisheries
NIWA
DDI:  +64 (4) 3860529

From: Lincoln Swaine-Moore 
Sent: Thursday, October 5, 2023 08:30
To: Alban Hertroys 
Cc: Marian Wendt ; pgsql-general 

Subject: Re: Strategies for converting UTC data to local windows for arbitrary 
resolutions and timezones

> What I do in such cases is to add an extra column with the UTC timestamp to 
> serve as a linear scale to the local timestamps. That also helps with 
> ordering buckets in reports and such during DST changes (especially the ones 
> where an hour repeats).

> For hours and quarter hours I found it to be fairly convenient to base a view 
> on a join between a date calendar and an (quarter of an) hour per UTC day 
> table, but materialising that with some indexes may perform better (at the 
> cost of disk space). I do materialise that currently, but our database server 
> doesn’t have a lot of memory so I’m often not hitting the cache and 
> performance suffers a bit (infrastructure is about to change for the better 
> though).

That's an interesting idea, but I'm not sure I fully understand. Assuming 
you're aggregating data: what do you group by? For instance, at an hourly 
resolution, if you group by both the UTC timestamp and the local one, you might 
end up, say, dividing an hour-long bucket in two for time zones with 
half-hour-based offsets, no?

Thanks for the detailed writeup! Definitely helpful to learn more about what 
people are using in production to handle this sort of thing.

--
Lincoln Swaine-Moore


Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

2023-10-04 Thread Alban Hertroys


> On 4 Oct 2023, at 21:30, Lincoln Swaine-Moore  wrote:
> 
> > What I do in such cases is to add an extra column with the UTC timestamp to 
> > serve as a linear scale to the local timestamps. That also helps with 
> > ordering buckets in reports and such during DST changes (especially the 
> > ones where an hour repeats).
> 
> > For hours and quarter hours I found it to be fairly convenient to base a 
> > view on a join between a date calendar and an (quarter of an) hour per UTC 
> > day table, but materialising that with some indexes may perform better (at 
> > the cost of disk space). I do materialise that currently, but our database 
> > server doesn’t have a lot of memory so I’m often not hitting the cache and 
> > performance suffers a bit (infrastructure is about to change for the better 
> > though).
> 
> That's an interesting idea, but I'm not sure I fully understand. Assuming 
> you're aggregating data: what do you group by? For instance, at an hourly 
> resolution, if you group by both the UTC timestamp and the local one, you 
> might end up, say, dividing an hour-long bucket in two for time zones with 
> half-hour-based offsets, no? 
> 
> Thanks for the detailed writeup! Definitely helpful to learn more about what 
> people are using in production to handle this sort of thing.

Frankly, I haven’t had to deal with half-hour-based offsets since I got this 
idea. I’m using it with whole-hour-offsets, where it doesn’t affect bin 
boundaries.

I suppose you could enrich your data in a similar fashion by adding a (virtual) 
column with the (client) time zone offset, so you could group by local 
timestamp + offset. That’s not going to match index expressions though, I fear…

For sorting, UTC timestamps would probably still be a useful addition, but 
they’re simple to add by either converting back from the local timestamps or by 
taking the min and max of the UTC-based column on the above grouping. Both 
solutions require that offset, obviously.

Now of course there are only 2 hours a year where this happens. Our data 
scientists chose to ignore the problem for simplicity’s sake and be slightly 
off with their numbers on those dates.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Multiple inserts with two levels of foreign keys

2023-10-04 Thread Dow Drake
Hi,

I'm trying to write a PostgreSQL script to replicate a hierarchical
structure in a live database into my development database, where I can
debug and test more easily.  I can extract the data from the live database
that needs to be inserted, but I'm having trouble writing the insertion
script.

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many
crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null,
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null,
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that
farm, then insert two deliveries for each of the two crops so that in
the end, my tables look like this:
farms
id  name
1   'Happy Valley Farm'

crops
id  farm_id  name
1   1        'corn'
2   1        'wheat'

deliveries
id  crop_id  ticket
1   1        '3124'
2   2        '3127'
3   1        '3133'
4   2        '3140'

It's important that the deliveries get assigned to the right crops.  I
think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to
adapt it to multiple records.

Thanks for any help on this!


Re: Multiple inserts with two levels of foreign keys

2023-10-04 Thread Ron
Frame challenge: why can't you just "\copy to" the dev database tables in 
the correct order, to satisfy foreign key requirements?


On 10/4/23 18:59, Dow Drake wrote:

Hi,

I'm trying to write a postgresql script to replicate a hierarchical 
structure in a live database into my development database, where I can 
debug and test more easily.  I can extract the data from the live database 
that needs to be inserted, but I'm having trouble writing the insertion script


Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many 
crops and a crop has many deliveries.


create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that 
farm, then insert two deliveries for each of the the two crops so that in 
the end, my tables look like this:

farms
id name
1  'Happy Valley Farm'

crops
id farm_id name
1    1 'corn'
2    1 'wheat'

delvieries
id   crop_id ticket
1 1  '3124'
2 2  '3127'
3 1  '3133'
4 2  '3140'

It's important that the deliveries get assigned to the right crops.  I 
think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to 
adapt it to multiple records.


Thanks for any help on this!




--
Born in Arizona, moved to Babylonia.

Re: Multiple inserts with two levels of foreign keys

2023-10-04 Thread Dow Drake
Thanks for the reply, Ron!
I'm not sure I see how to make your suggestion work, though.  Suppose I
dump the three tables to CSV as you suggest (and write a script to extract
the relevant records from those CSV dumps in the correct order).  It might
be that in the dev database, the next generated key values are 199 for
farm's id, 2145 for crop's id and 10242 for deliveries' id.  The databases
are independent.

Just inserting the records in the same order doesn't take care of setting
the foreign key values correctly -- does it?  I think I'm really looking
for a solution more along the lines of the link in my original post.

Best,
Dow

On Wed, Oct 4, 2023 at 6:26 PM Ron  wrote:

> Frame challenge: why can't you just "\copy to" the dev database tables in
> the correct order, to satisfy foreign key requirements?
>
> On 10/4/23 18:59, Dow Drake wrote:
>
> Hi,
>
> I'm trying to write a postgresql script to replicate a hierarchical
> structure in a live database into my development database, where I can
> debug and test more easily.  I can extract the data from the live database
> that needs to be inserted, but I'm having trouble writing the insertion
> script
>
> Here's a simplified version of the problem I'm trying to solve:
> There are three tables: farms, crops and deliveries where a farm has many
> crops and a crop has many deliveries.
>
> create table farms (
>id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
>name character varying(30)
> );
> create table crops (
>id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
>farm_id bigint not null
>name character varying(30)
> );
> create table deliveries (
>id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
>crop_id bigint not null
>ticket character varying(30)
> );
> I want to insert a farm record, then insert two crops associated with that
> farm, then insert two deliveries for each of the the two crops so that in
> the end, my tables look like this:
> farms
> id name
> 1  'Happy Valley Farm'
>
> crops
> id farm_idname
> 11 'corn'
> 21 'wheat'
>
> delvieries
> id   crop_idticket
> 1 1  '3124'
> 2 2  '3127'
> 3 1  '3133'
> 4 2  '3140'
>
> It's important that the deliveries get assigned to the right crops.  I
> think this post: https://dba.stackexchange.com/questions/199916
> gets close to what I need, but I haven't been able to figure out how to
> adapt it to multiple records.
>
> Thanks for any help on this!
>
>
>
> --
> Born in Arizona, moved to Babylonia.
>


Re: Multiple inserts with two levels of foreign keys

2023-10-04 Thread Ron
Ah.  We'd truncate all of the dev tables, then load a "slice" (for example, 
accounts 1 to 1, and all associated records from downstream tables; 
lots and lots of views!!) from the prod database.


On 10/4/23 20:50, Dow Drake wrote:

Thanks for the reply, Ron!
I'm not sure I see how to make your suggestion work, though.  Suppose I 
dump the three tables to CSV as you suggest (and write a script to extract 
the relevant records from those CSV dumps in the correct order).  It might 
be that in the dev database, the next generated key values are 199 for 
farm's id, 2145 for crop's id and 10242 for deliveries' id.  The databases 
are independent.


Just inserting the records in the same order doesn't take care of setting 
the foreign key values correctly -- does it? I think I'm really looking 
for a solution more along the lines of the link in my original post.


Best,
Dow

On Wed, Oct 4, 2023 at 6:26 PM Ron  wrote:

Frame challenge: why can't you just "\copy to" the dev database tables
in the correct order, to satisfy foreign key requirements?

On 10/4/23 18:59, Dow Drake wrote:

Hi,

I'm trying to write a postgresql script to replicate a hierarchical
structure in a live database into my development database, where I
can debug and test more easily.  I can extract the data from the live
database that needs to be inserted, but I'm having trouble writing
the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has
many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with
that farm, then insert two deliveries for each of the the two crops
so that in the end, my tables look like this:
farms
id name
1 'Happy Valley Farm'

crops
id farm_id    name
1 1 'corn'
2 1 'wheat'

delvieries
id crop_id    ticket
1 1  '3124'
2 2  '3127'
3 1  '3133'
4 2  '3140'

It's important that the deliveries get assigned to the right crops. 
I think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how
to adapt it to multiple records.

Thanks for any help on this!




-- 
Born in Arizona, moved to Babylonia.




--
Born in Arizona, moved to Babylonia.

Re: Multiple inserts with two levels of foreign keys

2023-10-04 Thread Dow Drake
I see.  That would definitely work, but part of this for me is to get a
better understanding of PostgreSQL's capabilities.  I'm going to keep
working on a minimal solution that deletes no records from the dev
database, and only inserts the required records.

On Wed, Oct 4, 2023 at 6:58 PM Ron  wrote:

> Ah.  We'd truncate all of the dev tables, then load a "slice" (for
> example, accounts 1 to 1, and all associated records from
> downstream tables; lots and lots of views!!) from the prod database.
>
> On 10/4/23 20:50, Dow Drake wrote:
>
> Thanks for the reply, Ron!
> I'm not sure I see how to make your suggestion work, though.  Suppose I
> dump the three tables to CSV as you suggest (and write a script to extract
> the relevant records from those CSV dumps in the correct order).  It might
> be that in the dev database, the next generated key values are 199 for
> farm's id, 2145 for crop's id and 10242 for deliveries' id.  The databases
> are independent.
>
> Just inserting the records in the same order doesn't take care of setting
> the foreign key values correctly -- does it?  I think I'm really looking
> for a solution more along the lines of the link in my original post.
>
> Best,
> Dow
>
> On Wed, Oct 4, 2023 at 6:26 PM Ron  wrote:
>
>> Frame challenge: why can't you just "\copy to" the dev database tables in
>> the correct order, to satisfy foreign key requirements?
>>
>> On 10/4/23 18:59, Dow Drake wrote:
>>
>> Hi,
>>
>> I'm trying to write a postgresql script to replicate a hierarchical
>> structure in a live database into my development database, where I can
>> debug and test more easily.  I can extract the data from the live database
>> that needs to be inserted, but I'm having trouble writing the insertion
>> script
>>
>> Here's a simplified version of the problem I'm trying to solve:
>> There are three tables: farms, crops and deliveries where a farm has many
>> crops and a crop has many deliveries.
>>
>> create table farms (
>>id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
>>name character varying(30)
>> );
>> create table crops (
>>id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
>>farm_id bigint not null
>>name character varying(30)
>> );
>> create table deliveries (
>>id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
>>crop_id bigint not null
>>ticket character varying(30)
>> );
>> I want to insert a farm record, then insert two crops associated with
>> that farm, then insert two deliveries for each of the the two crops so that
>> in the end, my tables look like this:
>> farms
>> id name
>> 1  'Happy Valley Farm'
>>
>> crops
>> id farm_idname
>> 11 'corn'
>> 21 'wheat'
>>
>> delvieries
>> id   crop_idticket
>> 1 1  '3124'
>> 2 2  '3127'
>> 3 1  '3133'
>> 4 2  '3140'
>>
>> It's important that the deliveries get assigned to the right crops.  I
>> think this post: https://dba.stackexchange.com/questions/199916
>> gets close to what I need, but I haven't been able to figure out how to
>> adapt it to multiple records.
>>
>> Thanks for any help on this!
>>
>>
>>
>> --
>> Born in Arizona, moved to Babylonia.
>>
>
> --
> Born in Arizona, moved to Babylonia.
>


Re: Multiple inserts with two levels of foreign keys

2023-10-04 Thread Ron
Have you considered writing a stored procedure to process records that have 
been written to temporary tables?


0. Create temporary tables tmp_farms, tmp_crops and tmp_deliveries, which 
don't have id columns.

1. Truncate the three temporary tables
2. Insert into the temp tables a "set" of prod data.
3. Call a stored procedure in the dev database that does INSERT INTO ..., 
using RETURNING to get the relevant id values for the subsequent tables.

4. goto 1.
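
A minimal sketch of step 3 under those assumptions (the tmp_* column
names farm_name, crop_name and ticket are invented for illustration, not
taken from the original message):

```
CREATE PROCEDURE load_slice()
LANGUAGE plpgsql AS
$$
DECLARE
  f   record;
  c   record;
  fid bigint;
  cid bigint;
BEGIN
  FOR f IN SELECT name FROM tmp_farms LOOP
    -- RETURNING hands back the generated id for use in the child inserts
    INSERT INTO farms (name) VALUES (f.name) RETURNING id INTO fid;

    FOR c IN SELECT name FROM tmp_crops WHERE farm_name = f.name LOOP
      INSERT INTO crops (farm_id, name) VALUES (fid, c.name) RETURNING id INTO cid;

      INSERT INTO deliveries (crop_id, ticket)
      SELECT cid, d.ticket
      FROM tmp_deliveries d
      WHERE d.farm_name = f.name AND d.crop_name = c.name;
    END LOOP;
  END LOOP;
END
$$;
```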


On 10/4/23 21:15, Dow Drake wrote:
I see.  That would definitely work, but part of this for me is to get a 
better understanding of PostgreSQL's capabilities.  I'm going to keep 
working on a minimal solution that deletes no records from the dev 
database, and only inserts the required records.


On Wed, Oct 4, 2023 at 6:58 PM Ron  wrote:

Ah.  We'd truncate all of the dev tables, then load a "slice" (for
example, accounts 1 to 1, and all associated records from
downstream tables; lots and lots of views!!) from the prod database.

On 10/4/23 20:50, Dow Drake wrote:

Thanks for the reply, Ron!
I'm not sure I see how to make your suggestion work, though.  Suppose
I dump the three tables to CSV as you suggest (and write a script to
extract the relevant records from those CSV dumps in the correct
order).  It might be that in the dev database, the next generated key
values are 199 for farm's id, 2145 for crop's id and 10242 for
deliveries' id.  The databases are independent.

Just inserting the records in the same order doesn't take care of
setting the foreign key values correctly -- does it?  I think I'm
really looking for a solution more along the lines of the link in my
original post.

Best,
Dow

On Wed, Oct 4, 2023 at 6:26 PM Ron  wrote:

Frame challenge: why can't you just "\copy to" the dev database
tables in the correct order, to satisfy foreign key requirements?

On 10/4/23 18:59, Dow Drake wrote:

Hi,

I'm trying to write a postgresql script to replicate a
hierarchical structure in a live database into my development
database, where I can debug and test more easily.  I can extract
the data from the live database that needs to be inserted, but
I'm having trouble writing the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm
has many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated
with that farm, then insert two deliveries for each of the the
two crops so that in the end, my tables look like this:
farms
id name
1 'Happy Valley Farm'

crops
id farm_id    name
1 1 'corn'
2 1 'wheat'

delvieries
id crop_id    ticket
1 1  '3124'
2 2  '3127'
3 1  '3133'
4 2  '3140'

It's important that the deliveries get assigned to the right
crops.  I think this post:
https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out
how to adapt it to multiple records.

Thanks for any help on this!




-- 
Born in Arizona, moved to Babylonia.




-- 
Born in Arizona, moved to Babylonia.




--
Born in Arizona, moved to Babylonia.

Index scan is not pushed down to union all subquery

2023-10-04 Thread Lauri Kajan
Hi,

Is it intended that indexes are not pushed down to union all
subqueries if even a single select contains a where clause? Is this
just not implemented, is it impossible to implement or am I doing
something wrong?

The following query does a SeqScan on the "bikes" and "cars" tables even
though an IndexScan on their "dealer_name" columns would be an order of
magnitude faster. (Schema with sample data at the bottom of this mail)

WITH
  targets as (
select 'bike' vehicle, id, dealer_name FROM bikes WHERE frame_size = 52
union all
select 'car' vehicle, id, dealer_name FROM cars
-- In the real use case I have here dozens of tables
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
ON dealers.name = targets.dealer_name
  WHERE dealers.id in (54,12,456,315,468)


If the WHERE clause in the "bikes" subquery is removed, then an Index
Scan on the dealers primary key is done first, followed by Bitmap Index
Scans on the "bikes" and "cars" tables using the indexes on their
"dealer_name" columns.


---
 -- Available also at: https://www.db-fiddle.com/f/cEXt8HXSaQzsQ2yBDA2Z4H/7
CREATE TABLE dealers AS
SELECT
  id,
  (SELECT string_agg(CHR(65+(random() * 25)::integer), '') FROM
generate_series(1, 4) WHERE id>0) name
FROM generate_series(1, 1000) AS id
;
ALTER TABLE dealers ADD primary key (id);
CREATE INDEX ON dealers(name);


CREATE TABLE bikes AS
SELECT
  generate_series AS id,
  (SELECT name FROM dealers WHERE dealers.id = (SELECT
(random()*1000)::int WHERE generate_series>0)) AS dealer_name,
  (random()*12+50)::int as frame_size
FROM generate_series(1, 10);
ALTER TABLE bikes ADD primary key (id);
CREATE INDEX ON bikes(dealer_name);


CREATE TABLE cars AS
SELECT
  generate_series as id,
  (SELECT name FROM dealers WHERE dealers.id = (SELECT
(random()*1000)::int WHERE generate_series>0)) AS dealer_name,
  (random()*7+14)::int as wheel_size
FROM generate_series(1, 10);
ALTER TABLE cars ADD primary key (id);
CREATE INDEX ON cars(dealer_name);


ANALYZE;


--
- Lauri