Re: timestamp and timestamptz

2020-04-16 Thread raf
David G. Johnston wrote:

> On Wed, Apr 15, 2020 at 4:53 PM raf  wrote:
> 
> > I don't see much difference in storing a timestamptz in UTC or a
> > timestamptz
> > in CET. As long as the intended offset from UTC is recorded (which it is
> > in a timestamptz) it should be fine.
> 
> I only really skimmed the entire response but this framing of how
> timestamptz stores results is wrong.  Once you've stored a timestamptz in
> PostgreSQL you no longer have any knowledge of the timezone.  If you truly
> need that you need to record that in a different field.  What you do know
> is that PostgreSQL has a known point-in-time in UTC and can give you back
> the same value expressed in any other timezone according to the rules in
> the timezone database.
> 
> Or, as written verbatim in the documentation:
> """
> For timestamp with time zone, the internally stored value is always in UTC
> (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> GMT). An input value that has an explicit time zone specified is converted
> to UTC using the appropriate offset for that time zone. If no time zone is
> stated in the input string, then it is assumed to be in the time zone
> indicated by the system's TimeZone parameter, and is converted to UTC using
> the offset for the timezone zone.
> """
> 
> https://www.postgresql.org/docs/12/datatype-datetime.html
> 
> David J.

You are misinterpreting the documentation, or the
documentation is incomplete/misleading at that
location. It doesn't just convert and store the time in
UTC. It stores the time in UTC and it also stores the
offset from UTC as determined by the time zone
specified on input (that section of the documentation
might not mention that fact but it is true
nonetheless).

I store localtimes in the database and always see the
offset when selecting it later. The timezone
information doesn't just disappear as you (or that
piece of documentation) might be suggesting.

If you don't believe me, try this:

  create table example (t timestamptz not null);
  insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 
Australia/Sydney');
  select * from  example;
  drop table example;

When I run that, I see:

  CREATE TABLE
  INSERT 0 1
                 t
  -------------------------------
   2020-04-16 17:12:33.71768+10
  (1 row)
  
  DROP TABLE

So the timezone is stored (or the offset is stored if
you prefer). Had it been daylight savings on that date
in Sydney, the offset would have been "+11". It's all
fine.

cheers,
raf




Re: timestamp and timestamptz

2020-04-16 Thread Steve Baldwin
I'm pretty sure you are mistaken. Postgres doesn't store the 'creating'
time zone in a timestamptz column.

Try doing this before re-running your test:

set timezone to 'utc';

What you are seeing in your test is an artifact of that timezone setting.

Steve

On Thu, Apr 16, 2020 at 5:19 PM raf  wrote:

> David G. Johnston wrote:
>
> > On Wed, Apr 15, 2020 at 4:53 PM raf  wrote:
> >
> > > I don't see much difference in storing a timestamptz in UTC or a
> > > timestamptz
> > > in CET. As long as the intended offset from UTC is recorded (which it
> is
> > > in a timestamptz) it should be fine.
> >
> > I only really skimmed the entire response but this framing of how
> > timestamptz stores results is wrong.  Once you've stored a timestamptz in
> > PostgreSQL you no longer have any knowledge of the timezone.  If you
> truly
> > need that you need to record that in a different field.  What you do know
> > is that PostgreSQL has a known point-in-time in UTC and can give you back
> > the same value expressed in any other timezone according to the rules in
> > the timezone database.
> >
> > Or, as written verbatim in the documentation:
> > """
> > For timestamp with time zone, the internally stored value is always in
> UTC
> > (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> > GMT). An input value that has an explicit time zone specified is
> converted
> > to UTC using the appropriate offset for that time zone. If no time zone
> is
> > stated in the input string, then it is assumed to be in the time zone
> > indicated by the system's TimeZone parameter, and is converted to UTC
> using
> > the offset for the timezone zone.
> > """
> >
> > https://www.postgresql.org/docs/12/datatype-datetime.html
> >
> > David J.
>
> You are misinterpreting the documentation, or the
> documentation is incomplete/misleading at that
> location. It doesn't just convert and store the time in
> UTC. It stores the time in UTC and it also stores the
> offset from UTC as determined by the time zone
> specified on input (that section of the documentation
> might not mention that fact but it is true
> nonetheless).
>
> I store localtimes in the database and always see the
> offset when selecting it later. The timezone
> information doesn't just disappear as you (or that
> piece of documentation) might be suggesting.
>
> If you don't believe me, try this:
>
>   create table example (t timestamptz not null);
>   insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768
> Australia/Sydney');
>   select * from  example;
>   drop table example;
>
> When I run that, I see:
>
>   CREATE TABLE
>   INSERT 0 1
> t
>   --
>2020-04-16 17:12:33.71768+10
>   (1 row)
>
>   DROP TABLE
>
> So the timezone is stored (or the offset is stored if
> you prefer). Had it been daylight savings on that date
> in Sydney, the offset would have been "+11". It's all
> fine.
>
> cheers,
> raf
>
>
>


Re: timestamp and timestamptz

2020-04-16 Thread Pavel Stehule
On Thu, 16 Apr 2020 at 9:19, raf  wrote:

> David G. Johnston wrote:
>
> > On Wed, Apr 15, 2020 at 4:53 PM raf  wrote:
> >
> > > I don't see much difference in storing a timestamptz in UTC or a
> > > timestamptz
> > > in CET. As long as the intended offset from UTC is recorded (which it
> is
> > > in a timestamptz) it should be fine.
> >
> > I only really skimmed the entire response but this framing of how
> > timestamptz stores results is wrong.  Once you've stored a timestamptz in
> > PostgreSQL you no longer have any knowledge of the timezone.  If you
> truly
> > need that you need to record that in a different field.  What you do know
> > is that PostgreSQL has a known point-in-time in UTC and can give you back
> > the same value expressed in any other timezone according to the rules in
> > the timezone database.
> >
> > Or, as written verbatim in the documentation:
> > """
> > For timestamp with time zone, the internally stored value is always in
> UTC
> > (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> > GMT). An input value that has an explicit time zone specified is
> converted
> > to UTC using the appropriate offset for that time zone. If no time zone
> is
> > stated in the input string, then it is assumed to be in the time zone
> > indicated by the system's TimeZone parameter, and is converted to UTC
> using
> > the offset for the timezone zone.
> > """
> >
> > https://www.postgresql.org/docs/12/datatype-datetime.html
> >
> > David J.
>
> You are misinterpreting the documentation, or the
> documentation is incomplete/misleading at that
> location. It doesn't just convert and store the time in
> UTC. It stores the time in UTC and it also stores the
> offset from UTC as determined by the time zone
> specified on input (that section of the documentation
> might not mention that fact but it is true
> nonetheless).
>
> I store localtimes in the database and always see the
> offset when selecting it later. The timezone
> information doesn't just disappear as you (or that
> piece of documentation) might be suggesting.
>
> If you don't believe me, try this:
>
>   create table example (t timestamptz not null);
>   insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768
> Australia/Sydney');
>   select * from  example;
>   drop table example;
>
> When I run that, I see:
>
>   CREATE TABLE
>   INSERT 0 1
> t
>   --
>2020-04-16 17:12:33.71768+10
>   (1 row)
>
>   DROP TABLE
>
> So the timezone is stored (or the offset is stored if
> you prefer). Had it been daylight savings on that date
> in Sydney, the offset would have been "+11". It's all
> fine.
>

surely not. Postgres doesn't store the offset. Every timestamptz is translated
to UTC when it is parsed to binary form, and later translated to the client
time zone when it is displayed.

Regards

Pavel



> cheers,
> raf
>
>
>


Re: timestamp and timestamptz

2020-04-16 Thread raf
Steve Baldwin wrote:

> I'm pretty sure you are mistaken. Postgres doesn't store the 'creating'
> time zone in a timestamptz column.
> 
> Try doing this before re-running your test:
> 
> set timezone to 'utc';
> 
> What you are seeing in your test is an artifact of that timezone setting.
> 
> Steve

Thanks. You're right.

  create table example (t timestamptz not null);
  insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 
Australia/Sydney');
  select * from  example;
  set timezone to 'utc';
  select * from  example;
  drop table example;

Does this:

  CREATE TABLE
  INSERT 0 1
                 t
  -------------------------------
   2020-04-16 17:12:33.71768+10
  (1 row)

  SET
                 t
  -------------------------------
   2020-04-16 07:12:33.71768+00
  (1 row)

  DROP TABLE

So it doesn't store the offset, but I've used postgres
for 12 years without knowing that and it hasn't been
a problem. Yay, postgres!

It doesn't store the offset but, by using timestamptz,
it knows that the stored value is in UTC. That's what matters.
The fact that it knows the time zone is what makes everything
work. Timestamp without time zone is best avoided, I think.
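
For anyone following the thread, here's a minimal sketch of that point
(run against the example table above, before the DROP TABLE):

  select t,
         t at time zone 'UTC'              as utc_local,
         t at time zone 'Australia/Sydney' as sydney_local
  from example;

Because the stored value is a well-defined instant (in UTC), it can be
rendered in any zone on output, independent of the session's timezone
setting.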

cheers,
raf







Re: timestamp and timestamptz

2020-04-16 Thread Magnus Hagander
On Thu, Apr 16, 2020 at 6:12 AM Niels Jespersen  wrote:

>
>
>
>
> *From:* Magnus Hagander 
> *Sent:* 15 April 2020 20:05
> *To:* Niels Jespersen 
> *Cc:* pgsql-general@lists.postgresql.org
> *Subject:* Re: timestamp and timestamptz
>
>
>
>
>
>
>
> On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen  wrote:
>
> Hello all
>
>
>
> We have some data that have entered a timestamp column from a csv. The
> data in the csv are in utc. We want to access the data in our native
> timezone (CET).
>
>
>
> I am considering a few alternatives:
>
>
>
> 1.   Early in the process, convert to timestamptz and keep this
> datatype.
>
> 2.   Early in the process, convert to timestamp as understood in
> CET.  This will imply by convention that the data in the timestamp column
> represents CET. Users will need to be told that data represents CET, even
> if data is somewhere in the future kept in another country in another
> timezone.
>
>
>
> I probably should choose 1 over 2. But I am a bit hesitant, probably
> because we almost never have used timestamptz.
>
>
>
> Yes, you should.
>
>
>
>
>
> Can we agree that the below query is selecting both the original utc
> timestamps and 2 and 1 (as described above)?
>
>
>
> set timezone to 'cet';
>
> select read_time read_time_utc, (read_time at time zone 'utc')::timestamp
> read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz
> from t limit 10;
>
>
>
>
>
> As long as you use option 1:
>
>
>
> SELECT read_time
>
> will return the time in CET (as a timestamptz) after you've set timezone
> to 'cet'. If you set timezone to 'utc' it will directly return utc.
>
>
>
> SELECT read_time AT TIME ZONE 'utc'
>
> will return the time in UTC (as a timestamp)
>
>
>
>
>
> And just make sure you have done a "set time zone 'utc'" before you *load*
> the data, and everything should just work automatically.
>
>
>
> --
>
>  Magnus Hagander
>  Me: https://www.hagander.net/ 
>  Work: https://www.redpill-linpro.com/ 
>
>
>
>
>
> Thank you Magnus (and others) for your replies.
>
>
>
> The raw input data are in this, slightly strange format: 2019.05.01
> 00:00:00. No timezone indicator, just an informal guarantee from the
> supplier that it is indeed utc. And no real chance of changing the format.
> We know, from experience.
>
>
>
> The data volume is a bit high, a few billion rows pr month. So, table
> partitioning is very helpful (aka really indispensable). Data will be
> aggregated in several ways for analytics. Time aggregations must be
> according to our local timezone (cet). We do not want data from one day
> being aggregated into the wrong date because of timezone issues. This means
> that partition boundaries (monthly partitions most often, sometimes day
> partitions) must be on CET-boundaries so that partition pruning will pull
> data from the relevant cet-month not the utc-month.
>
>
>
> Now, if I load data into a timestamptz with timezone set to utc, partition
> to cet-boundaries, query and aggregate with timezone set to cet, everything
> will be ok, I think. My small testcase below shows that the row goes into
> the april-partition (as it should). The planner does the correct partition
> pruning according to specified filtering and set timezone. All good.
>
>
>
> create table t (t_id bigserial, ts timestamptz) partition by range (ts);
>
> create table t_2020_02 partition of t for values from ('2020-02-01
> 00:00:00+01') to ('2020-03-01 00:00:00+01');
>
> create table t_2020_03 partition of t for values from ('2020-03-01
> 00:00:00+01') to ('2020-04-01 00:00:00+02');
>
> create table t_2020_04 partition of t for values from ('2020-04-01
> 00:00:00+02') to ('2020-05-01 00:00:00+02');
>
>
>
> set timezone to 'utc';
>
> insert into t (ts) values('2020-03-31 23:30:00');
>
>
>
> Once again, thank you for invaluable feedback.
>
>
>

Yes, this should work just fine. The internal representation of timestamptz
is always UTC, and it's only converted on entry/exit.

You can see this clearly if you create your partitions like above, and then
do a "set timezone to 'America/Los_Angeles'" followed by \d+ t in psql.
This will now show you what the partition bounds are in that timezone.

You can also just specify the timestamps when you create your partition
without including the timezone (+01) specifier. In this case PostgreSQL will
interpret them according to your current value of the timezone setting, so
as long as it's CET it should work fine, and you don't have to remember
which months are in DST and which are not.
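
For example, a minimal sketch of that approach (a hypothetical extra
partition; the bounds are interpreted using the session's timezone setting
at DDL time):

set timezone to 'cet';
create table t_2020_05 partition of t
  for values from ('2020-05-01 00:00:00') to ('2020-06-01 00:00:00');

The timezone database then supplies the correct +01/+02 offset for each
boundary, so the DST transitions don't need to be tracked by hand.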


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Recursive Queries

2020-04-16 Thread Alex Magnum
Hi,
I have a simple table with signup timestamps

What I would like to do is to create a table as shown below that displays
the counts per hour for the past n dates.

I can do this with a function but is there an easy way to use recursive
queries?


 Counts per hour for given date

HR   2020-04-01  2020-04-02  ...  2020-04-10
00       38          33               36
01       33          26               18
02       26          36               17
03       36          18               10
04       18          17                3
05       17          10                3
06       10           3                6
07        3           3               10
.         3           6               13
.         6          10               22
.        10          13               12
22       13          22                9
23       22          11                8

Thanks for any suggestions.
A


SV: timestamp and timestamptz

2020-04-16 Thread Niels Jespersen


From: Magnus Hagander 
Sent: 16 April 2020 10:28
To: Niels Jespersen 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: timestamp and timestamptz



On Thu, Apr 16, 2020 at 6:12 AM Niels Jespersen <n...@dst.dk> wrote:


From: Magnus Hagander <mag...@hagander.net>
Sent: 15 April 2020 20:05
To: Niels Jespersen <n...@dst.dk>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: timestamp and timestamptz



On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen <n...@dst.dk> wrote:
Hello all

We have some data that have entered a timestamp column from a csv. The data in 
the csv are in utc. We want to access the data in our native timezone (CET).

I am considering a few alternatives:


1.   Early in the process, convert to timestamptz and keep this datatype.

2.   Early in the process, convert to timestamp as understood in CET.  This 
will imply by convention that the data in the timestamp column represents CET. 
Users will need to be told that data represents CET, even if data is somewhere 
in the future kept in another country in another timezone.

I probably should choose 1 over 2. But I am a bit hesitant, probably because we 
almost never have used timestamptz.

Yes, you should.


Can we agree that the below query is selecting both the original utc timestamps 
and 2 and 1 (as described above)?

set timezone to 'cet';
select read_time read_time_utc, (read_time at time zone 'utc')::timestamp 
read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t 
limit 10;


As long as you use option 1:

SELECT read_time
will return the time in CET (as a timestamptz) after you've set timezone to 
'cet'. If you set timezone to 'utc' it will directly return utc.

SELECT read_time AT TIME ZONE 'utc'
will return the time in UTC (as a timestamp)


And just make sure you have done a "set time zone 'utc'" before you *load* the 
data, and everything should just work automatically.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Thank you Magnus (and others) for your replies.

The raw input data are in this, slightly strange format: 2019.05.01 00:00:00. 
No timezone indicator, just an informal guarantee from the supplier that it is 
indeed utc. And no real chance of changing the format. We know, from experience.

The data volume is a bit high, a few billion rows pr month. So, table 
partitioning is very helpful (aka really indispensable). Data will be 
aggregated in several ways for analytics. Time aggregations must be according 
to our local timezone (cet). We do not want data from one day being aggregated 
into the wrong date because of timezone issues. This means that partition 
boundaries (monthly partitions most often, sometimes day partitions) must be on 
CET-boundaries so that partition pruning will pull data from the relevant 
cet-month not the utc-month.

Now, if I load data into a timestamptz with timezone set to utc, partition to 
cet-boundaries, query and aggregate with timezone set to cet, everything will be 
ok, I think. My small testcase below shows that the row goes into the 
april-partition (as it should). The planner does the correct partition pruning 
according to specified filtering and set timezone. All good.

create table t (t_id bigserial, ts timestamptz) partition by range (ts);
create table t_2020_02 partition of t for values from ('2020-02-01 
00:00:00+01') to ('2020-03-01 00:00:00+01');
create table t_2020_03 partition of t for values from ('2020-03-01 
00:00:00+01') to ('2020-04-01 00:00:00+02');
create table t_2020_04 partition of t for values from ('2020-04-01 
00:00:00+02') to ('2020-05-01 00:00:00+02');

set timezone to 'utc';
insert into t (ts) values('2020-03-31 23:30:00');

Once again, thank you for invaluable feedback.


Yes, this should work just fine. The internal representation of timestamptz is 
always UTC, and it's only converted on entry/exit.

You can see this clearly if you create your partitions like above, and then do 
a "set timezone to 'America/Los_Angeles'" followed by \d+ t in psql. This will 
now show you what the partition bounds are in that timezone.

You can also just specify the timestamps when you create your partition without 
including the timezone (+01) specifier. In this case PostgreSQL will interpret 
them according to your current value of the timezone setting, so as long as it's CET 
it should work fine, and you don't have to remember which months are in DST and 
which are not.


--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Thank you, Magnus. Especially for the last hint.


Re: pg_restore: could not close data file: Success

2020-04-16 Thread Kyotaro Horiguchi
At Thu, 16 Apr 2020 14:40:09 +0900, Michael Paquier  wrote 
in 
> On Thu, Apr 16, 2020 at 12:08:09PM +0900, Kyotaro Horiguchi wrote:
> > I'm surprised to find an old thread about the same issue.
> > 
> > https://www.postgresql.org/message-id/20160307.174354.251049100.horiguchi.kyotaro%40lab.ntt.co.jp
> > 
> > But I don't think it's acceptable to use a fake errno for gzclose;
> > cfclose properly passes through the error code from gzclose, so it
> > is enough that the caller should recognize the difference.
> 
> A problem with this patch is that we may forget again to add this
> special error handling if more code paths use cfclose().

Definitely.  The reason for the patch is that the error codes are different
depending on the caller, and some callers don't even check the error
(seemingly intentionally).

> As of HEAD, there are three code paths where cfclose() is called but
> it does not generate an error: two when ending a blob and one when
> ending a data file.  Perhaps it would make sense to just move all this
> error within the routine itself?  Note that it would also mean
> registering file names in lclContext or equivalent as that's an
> important piece of the error message.

Hmm. Sounds reasonable.  I'm going to do that.  Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Recursive Queries

2020-04-16 Thread Rob Northcott
From: Alex Magnum 
Sent: 16 April 2020 09:36
To: Postgres General 
Subject: Recursive Queries

Hi,
I have a simple table with singup timestamps

What I would like to do is to create a table as shown below that displays the 
counts per our for the past n dates.

I can do this with a function but is there an easy way to use recursive queries?


 Counts per hour for given date
HR   2020-04-01  2020-04-02  ... 2020-04-10
00  38   33  36
01  33   26  18
02  26   36  17
03  36   18  10
04  18   17   3
05  17   10   3
06  103   6
07   33  10
.36  13
.6   10  22
.   10   13  12
22  13   22   9
23  22   11   8

Thanks for any suggestions.
A




Wouldn’t it be easier to do it the other way round, with a column per hour and 
a row per date?  That way the query just needs one sub-query per hour to select 
just the count for that hour, and group by date.

Rob


Re: Recursive Queries

2020-04-16 Thread Julien Rouhaud
On Thu, Apr 16, 2020 at 2:49 PM Rob Northcott
 wrote:
>
> From: Alex Magnum 
>
> What I would like to do is to create a table as shown below that displays the 
> counts per our for the past n dates.
>
>
>
> I can do this with a function but is there an easy way to use recursive 
> queries?
>
>  Counts per hour for given date
>
> HR   2020-04-01  2020-04-02  ... 2020-04-10
> 00  38   33  36
> 01  33   26  18
> 02  26   36  17
> 03  36   18  10
> 04  18   17   3
> 05  17   10   3
> 06  103   6
> 07   33  10
> .36  13
> .6   10  22
> .   10   13  12
> 22  13   22   9
> 23  22   11   8
>
> Wouldn’t it be easier to do it the other way round, with a column per hour 
> and a row per date?  That way the query just needs one sub-query per hour to 
> select just the count for that hour, and group by date.

Note that you wouldn't need subqueries for that, the FILTER clause can
be used and is supported since version 9.4.
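
A minimal sketch of the FILTER approach (assuming a hypothetical table
"signups" with a timestamptz column "ts"; one filtered aggregate per date
column):

SELECT date_part('hour', ts)::int AS hr,
       count(*) FILTER (WHERE ts::date = date '2020-04-01') AS "2020-04-01",
       count(*) FILTER (WHERE ts::date = date '2020-04-02') AS "2020-04-02"
FROM signups
GROUP BY 1
ORDER BY 1;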




Re: Recursive Queries

2020-04-16 Thread Michael Lewis
You don't want recursion, you want pivot table (Excel) behavior to reformat
rows into columns. The easiest way to get this data in its raw form would
be to group by date and hour of day and compute the count.

If you have the option to add extensions in your environment, then you
should be able to pivot your data pretty simply.

https://www.postgresql.org/docs/12/tablefunc.html
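
A minimal sketch with crosstab() from tablefunc (assuming a hypothetical
table "signups" with a timestamptz column "ts"; the output column list has
to be spelled out explicitly):

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
  $$ SELECT date_part('hour', ts)::int, ts::date, count(*)
     FROM signups
     WHERE ts >= date '2020-04-01' AND ts < date '2020-04-03'
     GROUP BY 1, 2 ORDER BY 1, 2 $$,
  $$ SELECT d::date
     FROM generate_series(date '2020-04-01', date '2020-04-02',
                          interval '1 day') AS g(d) $$
) AS ct (hr int, "2020-04-01" bigint, "2020-04-02" bigint);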


Re: Recursive Queries

2020-04-16 Thread Edward Macnaghten
On 16/04/2020 09:35, Alex Magnum wrote:
> Hi,
> I have a simple table with singup timestamps
>
> What I would like to do is to create a table as shown below that
> displays the counts per our for the past n dates.

Various ways, but for me...

SELECT hour, SUM(CASE(WHEN date = date THEN 1 ELSE 0)), , 
WHERE whatever
GROUP BY hour

could be your friend






Re: Recursive Queries

2020-04-16 Thread Olivier Gautherot
Hi Alex,


On Thu, Apr 16, 2020 at 10:36 AM Alex Magnum  wrote:

> Hi,
> I have a simple table with singup timestamps
>
> What I would like to do is to create a table as shown below that displays
> the counts per our for the past n dates.
>
> I can do this with a function but is there an easy way to use recursive
> queries?
>
>
> * Counts per hour for given date*
>
> *HR   2020-04-01  2020-04-02  ... 2020-04-10*00  38   33
>  36
> 01  33   26  18
> 02  26   36  17
> 03  36   18  10
> 04  18   17   3
> 05  17   10   3
> 06  103   6
> 07   33  10
> .36  13
> .6   10  22
> .   10   13  12
> 22  13   22   9
> 23  22   11   8
>
> Thanks for any suggestions.
> A
>

You don't need subqueries. A CASE WHEN expression can help you in this case
(a bit tedious to write but fast to run):

WITH q AS (select date_part('hour', ts) AS hr, ts::date AS mydate FROM
your_table_or_query)
SELECT hr,
sum(CASE WHEN mydate = '2020-04-01'::date THEN 1 ELSE 0 END),
sum(CASE WHEN mydate = '2020-04-02'::date THEN 1 ELSE 0 END),
...
FROM q GROUP BY hr ORDER BY hr;


Hope it helps
--
Olivier Gautherot
Tel: +33 6 02 71 92 23
https://www.linkedin.com/in/ogautherot/


Re: Recursive Queries

2020-04-16 Thread Edward Macnaghten
On 16/04/2020 14:36, Edward Macnaghten wrote:
> On 16/04/2020 09:35, Alex Magnum wrote:
>> Hi,
>> I have a simple table with singup timestamps
>>
>> What I would like to do is to create a table as shown below that
>> displays the counts per our for the past n dates.
SELECT hour, SUM(CASE WHEN date = date THEN 1 ELSE 0 END), ...
WHERE whatever
GROUP BY hour

Could be your friend





possibilities for SQL optimization

2020-04-16 Thread Chris Stephens
PG12
RHEL 8

I suspect there's little I can do to get the following query to run
faster/more efficiently but thought I'd post to list and confirm.

Caveat: I'm coming from an Oracle background and am extremely wet behind
ears w/ respect to postgresql (please be kind :)).

Right now, we can't change the software generating the SQL though that is
very possible/likely in the future. For now, the query is what it is. I can
alter indexing, add table partitioning, or anything else that doesn't
change logical structure of table.

This is a very wide table holding astronomical data which will be used to
track changes in the sky to generate alerts for astronomers to evaluate.
The query has a variable number of "pixelId BETWEEN" predicates. As data
volume in DiaSource increases, performance of the query decreases. I need to
confirm this but I suspect the primary reason for the decreased performance
is the increased number of "pixelId BETWEEN" predicates generated by the
application. Predicate count is the only thing that changes. I don't think
performance would otherwise degrade, given the execution plan.

[local] @z=# \dS+ "DiaSource"
Table "public.DiaSource"
        Column        |            Type             | Collation | Nullable |        Default        | Storage | Stats target | Description
----------------------+-----------------------------+-----------+----------+-----------------------+---------+--------------+-------------
 diaSourceId          | bigint                      |           | not null | '0'::bigint           | plain   |              |
 ccdVisitId           | bigint                      |           | not null | '0'::bigint           | plain   |              |
 diaObjectId          | bigint                      |           |          | '0'::bigint           | plain   |              |
 ssObjectId           | bigint                      |           |          | '0'::bigint           | plain   |              |
 parentDiaSourceId    | bigint                      |           |          | '0'::bigint           | plain   |              |
 prv_procOrder        | integer                     |           | not null | 0                     | plain   |              |
 ssObjectReassocTime  | timestamp without time zone |           |          |                       | plain   |              |
 midPointTai          | double precision            |           | not null | '0'::double precision | plain   |              |
 ra                   | double precision            |           | not null | '0'::double precision | plain   |              |
 raSigma              | double precision            |           | not null | '0'::double precision | plain   |              |
 decl                 | double precision            |           | not null | '0'::double precision | plain   |              |
 declSigma            | double precision            |           | not null | '0'::double precision | plain   |              |
 ra_decl_Cov          | double precision            |           | not null | '0'::double precision | plain   |              |
 x                    | double precision            |           | not null | '0'::double precision | plain   |              |
 xSigma               | double precision            |           | not null | '0'::double precision | plain   |              |
 y                    | double precision            |           | not null | '0'::double precision | plain   |              |
 ySigma               | double precision            |           | not null | '0'::double precision | plain   |              |
 x_y_Cov              | double precision            |           | not null | '0'::double precision | plain   |              |
 apFlux               | double precision            |           | not null | '0'::double precision | plain   |              |
 apFluxErr            | double precision            |           | not null | '0'::double precision | plain   |              |
 snr                  | double precision            |           | not null | '0'::double precision | plain   |              |
 psFlux               | double precision            |           |          | '0'::double precision | plain   |              |
 psFluxSigma          | double precision            |           |          | '0'::double precision | plain   |              |
 psRa                 | double precision            |           |          | '0'::double precision | plain   |              |
 psRaSigma            | double precision            |           |          | '0'::double precision | plain   |              |
 psDecl               | double precision            |           |          | '0'::double precision | plain   |              |
 psDeclSigma          | double precision

Re: possibilities for SQL optimization

2020-04-16 Thread Pavel Stehule
On Thu, 16 Apr 2020 at 16:08, Chris Stephens  wrote:

> PG12
> RHEL 8
>
> I suspect there's little I can do to get the following query to run
> faster/more efficiently but thought I'd post to list and confirm.
>
> Caveat: I'm coming from an Oracle background and am extremely wet behind
> ears w/ respect to postgresql (please be kind :)).
>
> Right now, we can't change the software generating the SQL though that is
> very possible/likely in the future. For now, the query is what it is. I can
> alter indexing, add table partitioning, or anything else that doesn't
> change logical structure of table.
>
> This is a very wide table holding astronomical data which will be used to
> track changes in the sky to generate alerts for astronomers to evaluate.
> The query has a variable number of "pixelId BETWEEN" predicates. As data
> volume in DiaSource increases, performance of query decreases. I need to
> confirm this but I suspect the primary reason for the decreased performance
> is increased number of "pixelId BETWEEN" predicates generated by
> application. Predicate count is the only thing that changes. I don't think
> performance would otherwise degrade given execution plan.
>
> [local] @z=# \dS+ "DiaSource"
> Table
> "public.DiaSource"
>Column|Type | Collation |
> Nullable |Default| Storage | Stats target | Description
>
> -+-+---+--+---+-+--+-
>  diaSourceId | bigint  |   |
> not null | '0'::bigint   | plain   |  |
>  ccdVisitId  | bigint  |   |
> not null | '0'::bigint   | plain   |  |
>  diaObjectId | bigint  |   |
>| '0'::bigint   | plain   |  |
>  ssObjectId  | bigint  |   |
>| '0'::bigint   | plain   |  |
>  parentDiaSourceId   | bigint  |   |
>| '0'::bigint   | plain   |  |
>  prv_procOrder   | integer |   |
> not null | 0 | plain   |  |
>  ssObjectReassocTime | timestamp without time zone |   |
>|   | plain   |  |
>  midPointTai | double precision|   |
> not null | '0'::double precision | plain   |  |
>  ra  | double precision|   |
> not null | '0'::double precision | plain   |  |
>  raSigma | double precision|   |
> not null | '0'::double precision | plain   |  |
>  decl| double precision|   |
> not null | '0'::double precision | plain   |  |
>  declSigma   | double precision|   |
> not null | '0'::double precision | plain   |  |
>  ra_decl_Cov | double precision|   |
> not null | '0'::double precision | plain   |  |
>  x   | double precision|   |
> not null | '0'::double precision | plain   |  |
>  xSigma  | double precision|   |
> not null | '0'::double precision | plain   |  |
>  y   | double precision|   |
> not null | '0'::double precision | plain   |  |
>  ySigma  | double precision|   |
> not null | '0'::double precision | plain   |  |
>  x_y_Cov | double precision|   |
> not null | '0'::double precision | plain   |  |
>  apFlux  | double precision|   |
> not null | '0'::double precision | plain   |  |
>  apFluxErr   | double precision|   |
> not null | '0'::double precision | plain   |  |
>  snr | double precision|   |
> not null | '0'::double precision | plain   |  |
>  psFlux  | double precision|   |
>| '0'::double precision | plain   |  |
>  psFluxSigma | double precision|   |
>| '0'::double precision | plain   |  |
>  psRa| double precision|   |
>| '0'::double precision | plain   |  |
>  psRaSigma   | double precision|   |
>| '0'::double pr

Re: possibilities for SQL optimization

2020-04-16 Thread Chris Stephens
disastrous :)

Planning Time: 7.569 ms
Execution Time: 316969.474 ms

On Thu, Apr 16, 2020 at 9:23 AM Pavel Stehule 
wrote:

>
>
> On Thu, 16 Apr 2020 at 16:08, Chris Stephens  wrote:
>
>> PG12
>> RHEL 8
>>
>> I suspect there's little I can do to get the following query to run
>> faster/more efficiently but thought I'd post to list and confirm.
>>
>> Caveat: I'm coming from an Oracle background and am extremely wet behind
>> ears w/ respect to postgresql (please be kind :)).
>>
>> Right now, we can't change the software generating the SQL though that is
>> very possible/likely in the future. For now, the query is what it is. I can
>> alter indexing, add table partitioning, or anything else that doesn't
>> change logical structure of table.
>>
>> This is a very wide table holding astronomical data which will be used to
>> track changes in the sky to generate alerts for astronomers to evaluate.
>> The query has a variable number of "pixelId BETWEEN" predicates. As data
>> volume in DiaSource increases, performance of query decreases. I need to
>> confirm this but I suspect the primary reason for the decreased performance
>> is increased number of "pixelId BETWEEN" predicates generated by
>> application. Predicate count is the only thing that changes. I don't think
>> performance would otherwise degrade given execution plan.
>>
>> [local] @z=# \dS+ "DiaSource"
>> Table
>> "public.DiaSource"
>>Column|Type | Collation |
>> Nullable |Default| Storage | Stats target | Description
>>
>> -+-+---+--+---+-+--+-
>>  diaSourceId | bigint  |   |
>> not null | '0'::bigint   | plain   |  |
>>  ccdVisitId  | bigint  |   |
>> not null | '0'::bigint   | plain   |  |
>>  diaObjectId | bigint  |   |
>>  | '0'::bigint   | plain   |  |
>>  ssObjectId  | bigint  |   |
>>  | '0'::bigint   | plain   |  |
>>  parentDiaSourceId   | bigint  |   |
>>  | '0'::bigint   | plain   |  |
>>  prv_procOrder   | integer |   |
>> not null | 0 | plain   |  |
>>  ssObjectReassocTime | timestamp without time zone |   |
>>  |   | plain   |  |
>>  midPointTai | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  ra  | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  raSigma | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  decl| double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  declSigma   | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  ra_decl_Cov | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  x   | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  xSigma  | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  y   | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  ySigma  | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  x_y_Cov | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  apFlux  | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  apFluxErr   | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  snr | double precision|   |
>> not null | '0'::double precision | plain   |  |
>>  psFlux  | double precision|   |
>>  | '0'::double precision | plain   |  |
>>  psFluxSigma | double precision|   |
>>  | '0'::double precision | plain   |  |
>>  psRa 

Re: possibilities for SQL optimization

2020-04-16 Thread Olivier Gautherot
Hi Chris,

40ms for a select on 302 million rows sounds reasonable. What I would try
to do is group by binary patterns on pixelId to reduce the number of cases
(using some arithmetic) and use a calculated index, but I'm not sure how
much you would save.


On Thu, Apr 16, 2020 at 4:57 PM Chris Stephens 
wrote:

> disastrous :)
>
> Planning Time: 7.569 ms
> Execution Time: 316969.474 ms
>
> On Thu, Apr 16, 2020 at 9:23 AM Pavel Stehule 
> wrote:
>
>>
>>
>> On Thu, 16 Apr 2020 at 16:08, Chris Stephens  wrote:
>>
>>> PG12
>>> RHEL 8
>>>
>>> I suspect there's little I can do to get the following query to run
>>> faster/more efficiently but thought I'd post to list and confirm.
>>>
>>> Caveat: I'm coming from an Oracle background and am extremely wet behind
>>> ears w/ respect to postgresql (please be kind :)).
>>>
>>> Right now, we can't change the software generating the SQL though that
>>> is very possible/likely in the future. For now, the query is what it is. I
>>> can alter indexing, add table partitioning, or anything else that doesn't
>>> change logical structure of table.
>>>
>>> This is a very wide table holding astronomical data which will be used
>>> to track changes in the sky to generate alerts for astronomers to evaluate.
>>> The query has a variable number of "pixelId BETWEEN" predicates. As data
>>> volume in DiaSource increases, performance of query decreases. I need to
>>> confirm this but I suspect the primary reason for the decreased performance
>>> is increased number of "pixelId BETWEEN" predicates generated by
>>> application. Predicate count is the only thing that changes. I don't think
>>> performance would otherwise degrade given execution plan.
>>>
>>> [local] @z=# \dS+ "DiaSource"
>>> Table
>>> "public.DiaSource"
>>>Column|Type | Collation |
>>> Nullable |Default| Storage | Stats target | Description
>>>
>>> -+-+---+--+---+-+--+-
>>>  diaSourceId | bigint  |   |
>>> not null | '0'::bigint   | plain   |  |
>>>  ccdVisitId  | bigint  |   |
>>> not null | '0'::bigint   | plain   |  |
>>>  diaObjectId | bigint  |   |
>>>  | '0'::bigint   | plain   |  |
>>>  ssObjectId  | bigint  |   |
>>>  | '0'::bigint   | plain   |  |
>>>  parentDiaSourceId   | bigint  |   |
>>>  | '0'::bigint   | plain   |  |
>>>  prv_procOrder   | integer |   |
>>> not null | 0 | plain   |  |
>>>  ssObjectReassocTime | timestamp without time zone |   |
>>>  |   | plain   |  |
>>>  midPointTai | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  ra  | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  raSigma | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  decl| double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  declSigma   | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  ra_decl_Cov | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  x   | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  xSigma  | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  y   | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  ySigma  | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  x_y_Cov | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  apFlux  | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  apFluxErr   | double precision|   |
>>> not null | '0'::double precision | plain   |  |
>>>  snr

Re: possibilities for SQL optimization

2020-04-16 Thread Michael Lewis
> My other thought was to range partition by pixelID + brin index.

I would expect brin index to be INSTEAD of partitioning. You didn't share
buffer hits, which I expect were 100% on the subsequent explain analyze
runs, but the index scan may still be faster if the planner knows it only
needs to scan a few small indexes on one, or a few, partitions.
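
For reference, a minimal sketch of that BRIN alternative (the index name is
made up, and it assumes "pixelId" correlates reasonably well with the
physical row order, which is what makes BRIN effective):

CREATE INDEX diasource_pixelid_brin ON "DiaSource"
  USING brin ("pixelId") WITH (pages_per_range = 64);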

What sort of growth do you see on this table? Is future scalability a
significant concern, or is the problem just that 40-300ms for this select
is unacceptable?

https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
Have you tuned effective_io_concurrency? The documentation says "this
setting only affects bitmap heap scans" and nearly all the time is there.
If it is still set to 1 as default, then increasing to 200 or perhaps more
might be prudent when on SSD or other memory backed storage. You don't even
need to change the server config defaults for testing the impact-

set effective_io_concurrency = 200;
/* select query */
reset effective_io_concurrency; /* if doing other things in the same
session and wanting to revert to default behavior, else just disconnect */


Re: timestamp and timestamptz

2020-04-16 Thread Adrian Klaver

On 4/16/20 1:23 AM, raf wrote:

Steve Baldwin wrote:


I'm pretty sure you are mistaken. Postgres doesn't store the 'creating'
time zone in a timestamptz column.

Try doing this before re-running your test:

set timezone to 'utc';

What you are seeing in your test is an artifact of that timezone setting.

Steve


Thanks. You're right.

   create table example (t timestamptz not null);
   insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 
Australia/Sydney');
   select * from  example;
   set timezone to 'utc';
   select * from  example;
   drop table example;

Does this:

   CREATE TABLE
   INSERT 0 1
   t
   --
2020-04-16 17:12:33.71768+10
   (1 row)

   SET
 t
   --
2020-04-16 07:12:33.71768+00
   (1 row)

   DROP TABLE

So it doesn't store the offset, but I've used postgres
for 12 years without knowing that and it hasn't been
a problem. Yay, postgres!

It doesn't store the offset but, by using timestamptz,
it knows that the timezone is UTC. That's what matters.


Well that is somewhat misleading. The value entered is stored as a 
timestamp with an implicit tz of 'UTC'. The issue that trips people up 
is the format of the input value. If you enter an input with an offset 
or correct tz value then you have given Postgres an explicit value to 
work from when converting it to 'UTC'. For what is correct see here:


https://www.postgresql.org/docs/12/datatype-datetime.html#DATATYPE-TIMEZONES

If you input a value that does not have the above then Postgres uses the 
SET TimeZone value to implicitly set the input value's tz (as pointed out 
by David Johnston upstream). In other words Postgres does not assume an 
input value is in 'UTC'. For the OP's case that could cause issues if 
the timestamp in the CSV file does not have a proper offset/tz and the 
client is using a tz other than 'UTC' (as pointed out by others upstream). 
The bottom line is that when dealing with timestamps explicit is better 
than implicit.
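
A minimal sketch of that difference (the session TimeZone value here is 
just an example):

set timezone to 'Australia/Sydney';
select timestamptz '2020-04-16 17:12:33+02';  -- explicit offset: converted
                                              -- from +02 to 'UTC'
select timestamptz '2020-04-16 17:12:33';     -- no offset: assumed to be in
                                              -- the session TimeZone
select timestamp '2020-04-16 17:12:33' at time zone 'UTC';
                                              -- explicitly label a naive
                                              -- timestamp as 'UTC'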



The fact that it knows the time zone is what makes everything
work. Timestamp without time zone is best avoided I think.

cheers,
raf








--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Recursive Queries

2020-04-16 Thread Alex Magnum
thanks for the suggestion. tablefunc extension might be the easiest one

On Thu, Apr 16, 2020 at 9:46 PM Edward Macnaghten 
wrote:

> On 16/04/2020 14:36, Edward Macnaghten wrote:
> > On 16/04/2020 09:35, Alex Magnum wrote:
> >> Hi,
> >> I have a simple table with singup timestamps
> >>
> >> What I would like to do is to create a table as shown below that
> >> displays the counts per our for the past n dates.
> SELECT hour, SUM(CASE(WHEN date = date THEN 1 ELSE 0 END)), , ...
> WHERE whatever
> GROUP BY hour
>
> Could be your friend
>
>
>
>


Re: possibilities for SQL optimization

2020-04-16 Thread Chris Stephens
On Thu, Apr 16, 2020 at 10:47 AM Michael Lewis  wrote:

> My other thought was to range partition by pixelID + brin index.
>

> I would expect brin index to be INSTEAD of partitioning. You didn't share
> buffer hits, which I expect were 100% on the subsequent explain analyze
> runs, but the index scan may still be faster if the planner knows it only
> needs to scan a few small indexes on one, or a few, partitions.
>

agreed but i wondered if partition elimination might be a faster way to
eliminate significant portions of the table up front, then possibly parallelize
the remaining partitioned brin index scans. not even sure it's worth trying
though. this is a data volume vs cache size and predicate count (w/ each
predicate requiring a very efficient but not instantaneous index lookup)
issue.


>
What sort of growth do you see on this table? Is future scalability a
> significant concern, or is the problem just that 40-300ms for this select
> is unacceptable?
>

that's not really clear to me at this point but data will grow linearly for
a year and then remain constant. i think current volume represents ~ 3
months of data but i'm not sure. it is the 40-300ms response time that is
the issue. this system has ridiculous time constraints and has other
processing separate from database queries to account for. query response
times must fit into those requirements but since other pieces are still
being looked at, specific requirements aren't available as far as i
understand. "as fast as possible" is what we have right now. :(


>
>
> https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
> Have you tuned effective_io_concurrency? The documentation says "this
> setting only affects bitmap heap scans" and nearly all the time is there.
> If it is still set to 1 as default, then increasing to 200 or perhaps more
> might be prudent when on SSD or other memory backed storage. You don't even
> need to change the server config defaults for testing the impact-
>
> set effective_io_concurrency = 200;
> /* select query */
> reset effective_io_concurrency; /* if doing other things in the same
> session and wanting to revert to default behavior, else just disconnect */
>

I just tried that. results were same as without. thanks for the suggestion!


Using unlogged tables for web sessions

2020-04-16 Thread Stephen Carboni
Hello.

I was wondering if anyone was using unlogged tables for website
sessions in production. I'm interested if it breaks the prevailing
opinion that you don't put sessions in PG.




Need for box type with 1/4 precision and gist indexes

2020-04-16 Thread Anzor Apshev
Hi all! I would like to use postgres for time-series data and need a
geometric index. For this reason I am using the timescale extension and a GiST
index on the box type. Overall ingest and query performance is fantastic! But I
would like to optimize disk usage a bit. More technically, my table schema:

CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

CREATE TABLE tracks (
   tstamp timestamp  NOT NULL,
   objectId integer NOT NULL,
   rect box NOT NULL,
  -- five more smallint fields
);
SELECT create_hypertable('tracks', 'tstamp', chunk_time_interval =>
interval '30 day');

At this point I am ingesting my test data, ~85M rows. And checking the table size:

 >SELECT pg_size_pretty( pg_database_size('db_test') );
 >14 GB
Then I created an index like this:
CREATE INDEX rect_indx  ON tracks USING gist(rect);
After the index is created, the reported table size is 16 GB.

Then I started to optimize disk usage. The builtin box type is 32 bytes (4x8
bytes); for my case I need only 2 bytes per coordinate, so I changed the
table schema like this:

CREATE TABLE tracks2 (
   tstamp timestamp  NOT NULL,
   objectId integer NOT NULL,

 -- replaced box type with 4 explicit coordinates
 rleft smallint NOT NULL,
 rtop smallint NOT NULL,
 rright smallint NOT NULL,
 rbottom smallint NOT NULL,

 -- five more smallint fields
);

Ingesting the same test data, database size is only 7.2 GB!
After that I created an index like this:
CREATE INDEX rect_indx  ON tracks2 USING gist( box( point(rleft,rtop),
point(rright,rbottom))) ;

With this index created, the table size is 14 GB, which is disappointing. I
can't explain why the db became so bloated.

Any help on this is much appreciated.

Thank you for reading to this point.

-- 
ГукIи псэкIи фыфей / Sincerely yours
Iэпщэ Анзор / Anzor Apshev


Re: Using of --data-checksums

2020-04-16 Thread Magnus Hagander
On Sun, Apr 12, 2020 at 4:23 PM Tom Lane  wrote:

> Magnus Hagander  writes:
> > And FWIW, I do think we should change the default. And maybe spend some
> > extra effort on the message coming out of pg_upgrade in this case to make
> > it clear to people what their options are and exactly what to do.
>
> Is there any hard evidence of checksums catching problems at all?
> Let alone in sufficient number to make them be on-by-default?
>

I would say yes. I've certainly had a fair number of cases where they've
detected storage corruption, especially with larger SAN type installations.
And coupled with validating the checksum on backup (either with
pg_basebackup or pgbackrest) it enables you to find the errors *early*,
while you can still restore a previous backup and replay WAL to get to a
point where you don't have to lose any data.

I believe both Stephen and David have some good stories they've heard from
people catching such issues with backrest as well.

This, as Michael also points out, lets you know that the problem
occurred outside of PostgreSQL, which makes for very important information
when tracking down issues.
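
For completeness, a quick way to confirm whether a running cluster was
initialized with checksums is the read-only GUC:

SHOW data_checksums;

It reports "on" for clusters created with initdb --data-checksums (or
converted offline with pg_checksums --enable on 12+).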

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


File Foreign Table Doesn't Exist when in Exception

2020-04-16 Thread Virendra Kumar
Hello Everyone,
I have a weird situation with the file_fdw extension when I am creating a foreign 
table in an anonymous block. Here is the setup:
Create extension and server:
==
postgres=# create extension file_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER log_server FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER

Here is the anonymous block. When I query the foreign table (FT) created in the block 
with incorrect data, I get an error and the FT is lost. See 
below:
 postgres=# DO $$
postgres$# DECLARE
postgres$# v_ft_file_name text;
postgres$# temp_variable text;
postgres$# v_sql text;
postgres$# log_min_time date;
postgres$# BEGIN
postgres$# 
postgres$# v_ft_file_name:='abc.csv';
postgres$# 
postgres$# v_sql:= 'CREATE FOREIGN TABLE "'||v_ft_file_name||'"(
postgres$#   user_name text,
postgres$#   database_name text,
postgres$#   connection_from   text
postgres$# ) SERVER log_server
postgres$# OPTIONS (filename 
''/opt/postgres/122/data/'||v_ft_file_name||''')';
postgres$# execute v_sql;
postgres$# v_sql:='select min(user_name) from "'||v_ft_file_name||'"';
postgres$# execute v_sql into log_min_time; <-- Querying from FT with incorrect 
data
postgres$#    
postgres$# END;
postgres$# $$ LANGUAGE 'plpgsql';
ERROR:  missing data for column "database_name"
CONTEXT:  COPY abc.csv, line 1: "aa,bb,cc"
SQL statement "select min(user_name) from "abc.csv""
PL/pgSQL function inline_code_block line 19 at EXECUTE
postgres=# postgres=# select ftrelid::regclass::text from pg_foreign_table 
postgres-# where ftrelid::regclass::text like '%abc.csv%';
 ftrelid 
-
(0 rows)


When I don't query the FT I can see the foreign table:
=
postgres=# DO $$
postgres$# DECLARE
postgres$# v_ft_file_name text;
postgres$# temp_variable text;
postgres$# v_sql text;
postgres$# log_min_time date;
postgres$# BEGIN
postgres$# 
postgres$# v_ft_file_name:='abc.csv';
postgres$# 
postgres$# v_sql:= 'CREATE FOREIGN TABLE "'||v_ft_file_name||'"(
postgres$#   user_name text,
postgres$#   database_name text,
postgres$#   connection_from   text
postgres$# ) SERVER log_server
postgres$# OPTIONS (filename 
''/opt/postgres/122/data/'||v_ft_file_name||''')';
postgres$# execute v_sql;
postgres$# 
postgres$# v_sql:='select min(user_name) from "'||v_ft_file_name||'"';
postgres$# --execute v_sql into log_min_time; <-Commented SELECT on FT
postgres$#    
postgres$# END;
postgres$# $$ LANGUAGE 'plpgsql';
DO
postgres=# 
postgres=# 
postgres=# select ftrelid::regclass::text from pg_foreign_table 
postgres-# where ftrelid::regclass::text like '%abc.csv%';
  ftrelid  
---
 "abc.csv"
(1 row)

postgres=# 


When I query the table outside the anonymous block it is still there. So I am 
thinking maybe I am missing some concept here or hitting a 
bug:
postgres=# select min(user_name) from "abc.csv";
ERROR:  missing data for column "database_name"
CONTEXT:  COPY abc.csv, line 1: "aa,bb,cc"
postgres=# 
postgres=# 
postgres=# select ftrelid::regclass::text from pg_foreign_table 
postgres-#     where ftrelid::regclass::text like '%abc.csv%';
  ftrelid  
---
 "abc.csv"
(1 row)


Regards,
Virendra Kumar



Re: File Foreign Table Doesn't Exist when in Exception

2020-04-16 Thread Adrian Klaver

On 4/16/20 3:39 PM, Virendra Kumar wrote:

Hello Everyone,

I have a weird situation with file_fdw extension when I am creating a 
foreign table in anonymous block. Here is setup:


Create extension and server:
==
postgres=# create extension file_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER log_server FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER


Here is anonymous block, when I query the foreign table (FT) created in 
block with incorrect data. I get error and the FT is lost. See below:


postgres=# DO $$
postgres$# DECLARE
postgres$# v_ft_file_name text;
postgres$# temp_variable text;
postgres$# v_sql text;
postgres$# log_min_time date;
postgres$# BEGIN
postgres$#
postgres$# v_ft_file_name:='abc.csv';
postgres$#
postgres$# v_sql:= 'CREATE FOREIGN TABLE "'||v_ft_file_name||'"(
postgres$#   user_name text,
postgres$#   database_name text,
postgres$#   connection_from   text
postgres$# ) SERVER log_server
postgres$# OPTIONS (filename 
''/opt/postgres/122/data/'||v_ft_file_name||''')';

postgres$# execute v_sql;
postgres$# v_sql:='select min(user_name) from "'||v_ft_file_name||'"';
postgres$# execute v_sql into log_min_time; <-- Querying from FT with 
incorrect data

postgres$#
postgres$# END;
postgres$# $$ LANGUAGE 'plpgsql';
ERROR:  missing data for column "database_name"
CONTEXT:  COPY abc.csv, line 1: "aa,bb,cc"
SQL statement "select min(user_name) from "abc.csv""
PL/pgSQL function inline_code_block line 19 at EXECUTE

postgres=#
postgres=# select ftrelid::regclass::text from pg_foreign_table
postgres-# where ftrelid::regclass::text like '%abc.csv%';
  ftrelid
-
(0 rows)


When I don't query the FT I can see the foreign table:
=
postgres=# DO $$
postgres$# DECLARE
postgres$# v_ft_file_name text;
postgres$# temp_variable text;
postgres$# v_sql text;
postgres$# log_min_time date;
postgres$# BEGIN
postgres$#
postgres$# v_ft_file_name:='abc.csv';
postgres$#
postgres$# v_sql:= 'CREATE FOREIGN TABLE "'||v_ft_file_name||'"(
postgres$#   user_name text,
postgres$#   database_name text,
postgres$#   connection_from   text
postgres$# ) SERVER log_server
postgres$# OPTIONS (filename 
''/opt/postgres/122/data/'||v_ft_file_name||''')';

postgres$# execute v_sql;
postgres$#
postgres$# v_sql:='select min(user_name) from "'||v_ft_file_name||'"';
postgres$# --execute v_sql into log_min_time; <-Commented SELECT on FT
postgres$#
postgres$# END;
postgres$# $$ LANGUAGE 'plpgsql';
DO
postgres=#
postgres=#
postgres=# select ftrelid::regclass::text from pg_foreign_table
postgres-# where ftrelid::regclass::text like '%abc.csv%';
   ftrelid
---
  "abc.csv"
(1 row)

postgres=#


When I query the table outside anonymous block it is still there. So I 
am thinking may be I am missing some concept here or hitting a bug:


postgres=# select min(user_name) from "abc.csv";
ERROR:  missing data for column "database_name"


To me it looks like your CSV data is either missing the column/data for 
the column database_name or the data is malformed.



CONTEXT:  COPY abc.csv, line 1: "aa,bb,cc"
postgres=#
postgres=#
postgres=# select ftrelid::regclass::text from 
pg_foreign_table 
where ftrelid::regclass::text like '%abc.csv%';

   ftrelid
---
  "abc.csv"
(1 row)


Regards,
Virendra Kumar




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: File Foreign Table Doesn't Exist when in Exception

2020-04-16 Thread Adrian Klaver

On 4/16/20 3:59 PM, Virendra Kumar wrote:
Please reply to list also.
Ccing list.

Thank you Adrian!

I know the data is malformed; I am more concerned about the behavior that 
the foreign table itself doesn't exist when it has malformed data and 
is being queried in an anonymous block.


https://www.postgresql.org/docs/12/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

"By default, any error occurring in a PL/pgSQL function aborts execution 
of the function, and indeed of the surrounding transaction as well. You 
can trap errors and recover from them by using a BEGIN block with an 
EXCEPTION clause. The syntax is an extension of the normal syntax for a 
BEGIN block: ..."
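
A minimal sketch of that, adapted from the DO block upthread (names and 
paths taken from your example): trap the error from the SELECT in a nested 
block, so the CREATE FOREIGN TABLE issued earlier in the same DO block is 
not rolled back along with it:

DO $$
DECLARE
  log_min_time date;
BEGIN
  EXECUTE 'CREATE FOREIGN TABLE "abc.csv" (
             user_name text,
             database_name text,
             connection_from text
           ) SERVER log_server
           OPTIONS (filename ''/opt/postgres/122/data/abc.csv'')';
  BEGIN
    EXECUTE 'select min(user_name) from "abc.csv"' INTO log_min_time;
  EXCEPTION WHEN others THEN
    RAISE NOTICE 'query failed: %', SQLERRM;  -- the foreign table survives
  END;
END;
$$ LANGUAGE plpgsql;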




Regards,
Virendra


On Thursday, April 16, 2020, 3:47:08 PM PDT, Adrian Klaver 
 wrote:



On 4/16/20 3:39 PM, Virendra Kumar wrote:
 > Hello Everyone,
 >
 > I have a weird situation with file_fdw extension when I am creating a
 > foreign table in anonymous block. Here is setup:
 >
 > Create extension and server:
 > ==
 > postgres=# create extension file_fdw;
 > CREATE EXTENSION
 > postgres=# CREATE SERVER log_server FOREIGN DATA WRAPPER file_fdw;
 > CREATE SERVER
 >
 >
 > Here is anonymous block, when I query the foreign table (FT) created in
 > block with incorrect data. I get error and the FT is lost. See below:
 > 
 > postgres=# DO $$
 > postgres$# DECLARE
 > postgres$# v_ft_file_name text;
 > postgres$# temp_variable text;
 > postgres$# v_sql text;
 > postgres$# log_min_time date;
 > postgres$# BEGIN
 > postgres$#
 > postgres$# v_ft_file_name:='abc.csv';
 > postgres$#
 > postgres$# v_sql:= 'CREATE FOREIGN TABLE "'||v_ft_file_name||'"(
 > postgres$#   user_name text,
 > postgres$#   database_name text,
 > postgres$#   connection_from   text
 > postgres$# ) SERVER log_server
 > postgres$# OPTIONS (filename
 > ''/opt/postgres/122/data/'||v_ft_file_name||''')';
 > postgres$# execute v_sql;
 > postgres$# v_sql:='select min(user_name) from "'||v_ft_file_name||'"';
 > postgres$# execute v_sql into log_min_time; <-- Querying from FT with
 > incorrect data
 > postgres$#
 > postgres$# END;
 > postgres$# $$ LANGUAGE 'plpgsql';
 > ERROR:  missing data for column "database_name"
 > CONTEXT:  COPY abc.csv, line 1: "aa,bb,cc"
 > SQL statement "select min(user_name) from "abc.csv""
 > PL/pgSQL function inline_code_block line 19 at EXECUTE
 >
 > postgres=#
 > postgres=# select ftrelid::regclass::text from pg_foreign_table
 > postgres-# where ftrelid::regclass::text like '%abc.csv%';
 >   ftrelid
 > -
 > (0 rows)
 >
 >
 > When I don't query the FT I can see the foreign table:
 > =
 > postgres=# DO $$
 > postgres$# DECLARE
 > postgres$# v_ft_file_name text;
 > postgres$# temp_variable text;
 > postgres$# v_sql text;
 > postgres$# log_min_time date;
 > postgres$# BEGIN
 > postgres$#
 > postgres$# v_ft_file_name:='abc.csv';
 > postgres$#
 > postgres$# v_sql:= 'CREATE FOREIGN TABLE "'||v_ft_file_name||'"(
 > postgres$#   user_name text,
 > postgres$#   database_name text,
 > postgres$#   connection_from   text
 > postgres$# ) SERVER log_server
 > postgres$# OPTIONS (filename
 > ''/opt/postgres/122/data/'||v_ft_file_name||''')';
 > postgres$# execute v_sql;
 > postgres$#
 > postgres$# v_sql:='select min(user_name) from "'||v_ft_file_name||'"';
 > postgres$# --execute v_sql into log_min_time; <-Commented SELECT on FT
 > postgres$#
 > postgres$# END;
 > postgres$# $$ LANGUAGE 'plpgsql';
 > DO
 > postgres=#
 > postgres=#
 > postgres=# select ftrelid::regclass::text from pg_foreign_table
 > postgres-# where ftrelid::regclass::text like '%abc.csv%';
 >    ftrelid
 > ---
 >   "abc.csv"
 > (1 row)
 >
 > postgres=#
 >
 >
 > When I query the table outside anonymous block it is still there. So I
 > am thinking may be I am missing some concept here or hitting a bug:
 > 
 > postgres=# select min(user_name) from "abc.csv";
 > ERROR:  missing data for column "database_name"

To me it looks like your CSV data is either missing the column/data for
the column database_name or the data is malformed.


 > CONTEXT:  COPY abc.csv, line 1: "aa,bb,cc"
 > postgres=#
 > postgres=#
 > postgres=# select ftrelid::regclass::text from
 > pg_foreign_table
 > where ftrelid::regclass::text like '%abc.csv%';
 >    ftrelid
 > ---
 >   "abc.csv"
 > (1 row)
 >
 >
 > Regards,
 > Virendra Kumar

 >


--
Adrian Klaver
adrian.kla...@aklaver.com 




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Using unlogged tables for web sessions

2020-04-16 Thread Tim Cross


Stephen Carboni  writes:

> Hello.
>
> I was wondering if anyone was using unlogged tables for website
> sessions in production. I'm interested in whether it breaks the prevailing
> opinion that you don't put sessions in PG.

This really depends on what you define as website session data and what
benefit you would see compared to the additional overhead of maintaining
that session information remotely from the client. Depending on your
application, there is often some session information which makes more
sense stored on the back-end server rather than in the client - notably,
data used by your server API to modify responses, or possibly encrypted
data to handle 'remember me' type functionality. However, you probably
don't want to store session data used by the client-side API (e.g. browser
JavaScript), as this introduces additional network overhead and latency,
extra load on the web and database servers, increased API complexity on
both, and possibly additional data privacy/security concerns you will need
to manage. That can be hard to justify when you have good client-side data
storage facilities available.

I have not come across a use case where it made sense to store ALL
session data remotely in the database. I have seen situations with a
very specialised application where having a more full featured LOCAL (to
the client) database server to record session information can be useful,
but this is rare.
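
If you do keep some server-side session state in Postgres, an unlogged
table is a reasonable fit for it, since the data is disposable: unlogged
tables skip WAL, are truncated after a crash, and are not replicated to
standbys. A minimal sketch (table and column names are only illustrative):

  CREATE UNLOGGED TABLE web_session (
      session_id text PRIMARY KEY,
      user_id    bigint,
      data       jsonb NOT NULL DEFAULT '{}',
      expires_at timestamptz NOT NULL
  );

  CREATE INDEX ON web_session (expires_at);

  -- Run periodically (cron or similar) to purge expired sessions:
  DELETE FROM web_session WHERE expires_at < now();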


--
Tim Cross




performance of first exec of prepared statement

2020-04-16 Thread Ted Toth
I've noticed that the first exec of an INSERT prepared statement takes ~5
times longer than subsequent execs (I'm using libpq in C and wrapping the
calls to time them). Is this the expected behavior, and if so, is there
anything I can do to mitigate this effect?

Ted


Re: performance of first exec of prepared statement

2020-04-16 Thread Ted Toth
On Thu, Apr 16, 2020 at 6:29 PM Ted Toth  wrote:

> I've noticed that the first exec of an INSERT prepared statement takes ~5
> time longer (I'm using libpq in C and wrapping the calls to time them) then
> subsequent exec's is this the expected behavior and if so is there any
> thing I can do to mitigate this affect?
>
> Ted
>

For example (in my environment) I'm seeing the prepare take ~10 ms, the
first exec take ~30 ms and subsequent execs take ~4 ms.


Re: performance of first exec of prepared statement

2020-04-16 Thread Adrian Klaver

On 4/16/20 4:59 PM, Ted Toth wrote:



On Thu, Apr 16, 2020 at 6:29 PM Ted Toth wrote:


I've noticed that the first exec of an INSERT prepared statement
takes ~5 time longer (I'm using libpq in C and wrapping the calls to
time them) then subsequent exec's is this the expected behavior and
if so is there any thing I can do to mitigate this affect?

Ted


For example (in my environment) I'm seeing the prepare take ~10ms, the 
first exec take ~30 ms and subsequent exec's take ~4 ms.




I don't have an answer. I believe, though, that to help those who might, 
it would be helpful to show the actual code.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Using of --data-checksums

2020-04-16 Thread Michael Paquier
On Thu, Apr 16, 2020 at 03:47:34PM -0700, Jeremy Schneider wrote:
> Data checksums are a hard requirement across the entire RDS PostgreSQL
> fleet - we do not allow it to be disabled in RDS. I've definitely seen a
> lot of hard evidence (for example, customer cases I've personally been
> involved in) that it catches problems.

Oh, that's good to know.  Thanks for the information.  I pushed hard
as well to make this a requirement on what I work on.

> I could not exaggerate how useful
> and important I think this feature is: being able to very quickly and
> easily know that a problem originated outside of the PostgreSQL code.

The worst part with checksums disabled is having to tell a customer or
a support staff that you don't actually know where the problem comes
from, what its actual origin is, and why you think the error you are
seeing in the Postgres logs is most likely linked to lower-level
corruption, as there can be many patterns: broken btree pages, toast
errors, missing attributes in catalogs, failures with FK references,
primary key duplicates, etc.  And people like to complain a lot about
the database being broken, because it is a very sensitive piece and
usually more things depend on it.  With checksums enabled, you still
cannot say exactly where the problem comes from, but you can redirect
the complaints more easily to the correct people to help find out what
the actual problem is.  Even better, if you don't see a checksum
failure you also know that the problem probably comes directly from
Postgres and some backend logic (though it could just as well be a
misdesigned HA workflow or a custom backup script, who knows, but at
least you know that something you control directly went wrong).  And
the error message provided is clear.
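
For reference, checking whether a given cluster runs with checksums is a
one-liner, and since v12 checksum failures are also tracked per database.
A small sketch (the pg_stat_database columns assume v12 or later):

  -- Is this cluster running with data checksums?
  SHOW data_checksums;

  -- Checksum failures seen so far, per database (v12+):
  SELECT datname, checksum_failures, checksum_last_failure
    FROM pg_stat_database
   WHERE checksum_failures > 0;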

> This was in part what led to that long blog article I wrote about
> checksums, and it's why enabling checksums was happiness hint #1 until I
> broke them into categories.

Reference? ;p
--
Michael




Could not resolve host name error in psycopg2

2020-04-16 Thread derwin theduck
We have been getting this error intermittently (about once a week) in
Django with channels since switching from a local database server to a
hosted one:

could not translate host name "timescaledb" to address: Name or service not
known

Connecting directly to the server with psycopg2 in the python interpreter
works ok:

>>> import psycopg2 as pg
>>> pg.__version__
'2.7.7 (dt dec pq3 ext lo64)'
>>> conn = pg.connect(host='timescaledb', database=*, user=*, password=*)
  >>> conn.server_version
110005

The resident database / network expert maintains that everything is ok on
the networking side, and that django, once it establishes a connection,
should not be attempting to reconnect.

The problem happens not on startup, but in the middle of the day after
processing transactions without any issues for several hours.

Any idea how to go about troubleshooting / fixing this? :'(

 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py",
line 276, in __iter__)
 (self._fetch_all())
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py",
line 1261, in _fetch_all)
 (self._result_cache = list(self._iterable_class(self)))
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py",
line 57, in __iter__)
 (results = compiler.execute_sql(chunked_fetch=self.chunked_fetch,
chunk_size=self.chunk_size))
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/sql/compiler.py",
line 1135, in execute_sql)
 (cursor = self.connection.cursor())
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
line 24, in inner)
 (return func(*args, **kwargs))
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
line 260, in cursor)
 (return self._cursor())
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
line 236, in _cursor)
 (self.ensure_connection())
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
line 24, in inner)
 (return func(*args, **kwargs))
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
line 220, in ensure_connection)
 (self.connect())
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/utils.py", line
90, in __exit__)
 (raise dj_exc_value.with_traceback(traceback) from exc_value)
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
line 220, in ensure_connection)
 (self.connect())
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
line 24, in inner)
 (return func(*args, **kwargs))
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
line 197, in connect)
 (self.connection = self.get_new_connection(conn_params))
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
line 24, in inner)
 (return func(*args, **kwargs))
 (  File
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/postgresql/base.py",
line 185, in get_new_connection)
 (connection = Database.connect(**conn_params))
 (  File "/usr/local/lib64/python3.7/site-packages/psycopg2/__init__.py",
line 130, in connect)
 (conn = _connect(dsn, connection_factory=connection_factory,
**kwasync))
 (  could not translate host name "timescaledb" to address: Name or service
not known)


Re: timestamp and timestamptz

2020-04-16 Thread raf
Adrian Klaver wrote:

> On 4/16/20 1:23 AM, raf wrote:
> > Steve Baldwin wrote:
> > 
> > > I'm pretty sure you are mistaken. Postgres doesn't store the 'creating'
> > > time zone in a timestamptz column.
> > > 
> > > Try doing this before re-running your test:
> > > 
> > > set timezone to 'utc';
> > > 
> > > What you are seeing in your test is an artifact of that timezone setting.
> > > 
> > > Steve
> > 
> > Thanks. You're right.
> > 
> >create table example (t timestamptz not null);
> >insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 
> > Australia/Sydney');
> >select * from  example;
> >set timezone to 'utc';
> >select * from  example;
> >drop table example;
> > 
> > Does this:
> > 
> >CREATE TABLE
> >INSERT 0 1
> >t
> >--
> > 2020-04-16 17:12:33.71768+10
> >(1 row)
> > 
> >SET
> >  t
> >--
> > 2020-04-16 07:12:33.71768+00
> >(1 row)
> > 
> >DROP TABLE
> > 
> > So it doesn't store the offset, but I've used postgres
> > for 12 years without knowing that and it hasn't been
> > a problem. Yay, postgres!
> > 
> > It doesn't store the offset but, by using timestamptz,
> > it knows that the timezone is UTC. That's what matters.
> 
> Well that is somewhat misleading. The value entered is stored as timestamp
> with an implicit tz of 'UTC'. The issue that trips people up is the format
> of the input value. If you enter an input with an offset or correct tz value
> then you have given Postgres an explicit value to work off for converting it
> to 'UTC'.

When I said "it knows that the timezone is UTC", I was only referring
to the values once stored as a timestamptz, not the input. Sorry that
wasn't clear enough. I meant to say that once a value is stored in a
timestamptz (as opposed to a timestamp), postgres knows that it is
stored in UTC and will do the right things with it.

> For what is correct see here:
> 
> https://www.postgresql.org/docs/12/datatype-datetime.html#DATATYPE-TIMEZONES
> 
> If you input a value that does not have the above, then Postgres uses the SET
> TimeZone value to implicitly set the input value's tz (as pointed out by
> David Johnston upstream). In other words Postgres does not assume an input
> value is in 'UTC'. For the OP's case that could cause issues if the
> timestamp in the CSV file does not have a proper offset/tz and the client is
> using a tz other than 'UTC' (as pointed out by others upstream). The bottom
> line is that when dealing with timestamps, explicit is better than implicit.

I couldn't agree more.
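
For instance, a quick sketch of how the session TimeZone only matters
when the input has no explicit offset or zone:

  set timezone to 'Australia/Sydney';

  -- Explicit offset in the input: the session TimeZone is irrelevant.
  select timestamptz '2020-04-16 17:12:33+10';

  -- No offset in the input: the session TimeZone supplies it.
  select timestamptz '2020-04-16 17:12:33';

  set timezone to 'utc';

  -- Same literal, different session TimeZone, different point in time.
  select timestamptz '2020-04-16 17:12:33';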

> > The fact that it knows the time zone is what makes everything
> > work. Timestamp without time zone is best avoided I think.
> > 
> > cheers,
> > raf
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com
> 




Re: Could not resolve host name error in psycopg2

2020-04-16 Thread Adrian Klaver

On 4/16/20 5:38 PM, derwin theduck wrote:
We have been getting this error intermittently (about once a week) in 
Django with channels since switching from a local database server to a 
hosted one:


could not translate host name "timescaledb" to address: Name or service 
not known


Connecting directly to the server with psycopg2 in the python 
interpreter works ok:


 >>> import psycopg2 as pg
 >>> pg.__version__
'2.7.7 (dt dec pq3 ext lo64)'
 >>> conn = pg.connect(host='timescaledb', database=*, user=*, password=*)
 >>> conn.server_version
110005


Well, I'm going to go out on a limb and say there is some sort of DNS 
resolution issue going on.




The resident database / network expert maintains that everything is ok 
on the networking side, and that django, once it establishes a 
connection, should not be attempting to reconnect.


Huh? Leaving open connections is not considered a good thing. In other 
words a connection should last for as long as it takes to get its task 
done, and then it should close.




The problem happens not on startup, but in the middle of the day after 
processing transactions without any issues for several hours.


Any idea how to go about troubleshooting / fixing this? :'(

  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py", 
line 276, in __iter__)

  (    self._fetch_all())
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py", 
line 1261, in _fetch_all)

  (    self._result_cache = list(self._iterable_class(self)))
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py", 
line 57, in __iter__)
  (    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, 
chunk_size=self.chunk_size))
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/models/sql/compiler.py", 
line 1135, in execute_sql)

  (    cursor = self.connection.cursor())
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py", line 
24, in inner)

  (    return func(*args, **kwargs))
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py", 
line 260, in cursor)

  (    return self._cursor())
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py", 
line 236, in _cursor)

  (    self.ensure_connection())
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py", line 
24, in inner)

  (    return func(*args, **kwargs))
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py", 
line 220, in ensure_connection)

  (    self.connect())
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/utils.py", 
line 90, in __exit__)

  (    raise dj_exc_value.with_traceback(traceback) from exc_value)
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py", 
line 220, in ensure_connection)

  (    self.connect())
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py", line 
24, in inner)

  (    return func(*args, **kwargs))
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py", 
line 197, in connect)

  (    self.connection = self.get_new_connection(conn_params))
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py", line 
24, in inner)

  (    return func(*args, **kwargs))
  (  File 
"/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/postgresql/base.py", 
line 185, in get_new_connection)

  (    connection = Database.connect(**conn_params))
  (  File 
"/usr/local/lib64/python3.7/site-packages/psycopg2/__init__.py", line 
130, in connect)
  (    conn = _connect(dsn, connection_factory=connection_factory, 
**kwasync))
  (  could not translate host name "timescaledb" to address: Name or 
service not known)






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: performance of first exec of prepared statement

2020-04-16 Thread Rob Sargent

On 4/16/20 6:15 PM, Adrian Klaver wrote:

On 4/16/20 4:59 PM, Ted Toth wrote:



On Thu, Apr 16, 2020 at 6:29 PM Ted Toth wrote:


    I've noticed that the first exec of an INSERT prepared statement
    takes ~5 time longer (I'm using libpq in C and wrapping the calls to
    time them) then subsequent exec's is this the expected behavior and
    if so is there any thing I can do to mitigate this affect?

    Ted


For example (in my environment) I'm seeing the prepare take ~10ms, 
the first exec take ~30 ms and subsequent exec's take ~4 ms.




I don't have an answer. I believe though that to help those that might 
it would be helpful to show the actual code.



You expect the subsequent calls to benefit from the cached query parse 
and planning.  What does your query cost without being wrapped in a 
prepared statement (preferably from a cold start)?
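
As a rough SQL-level analogue of what your libpq prepare/exec calls
(presumably PQprepare/PQexecPrepared) are doing, you can watch the same
pattern with \timing in psql; the table here is made up, and whether your
extra ~30 ms is really planning or just first-touch catalog and relation
caching is hard to say without the actual code:

  \timing on

  CREATE TABLE demo (id int, payload text);

  PREPARE ins (int, text) AS
      INSERT INTO demo VALUES ($1, $2);

  EXECUTE ins(1, 'first');   -- first execution: the statement is planned here
  EXECUTE ins(2, 'second');  -- later executions can reuse the cached plan
  EXECUTE ins(3, 'third');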






Re: Could not resolve host name error in psycopg2

2020-04-16 Thread derwin theduck
Thank you, I've since changed it to use the server's IP address, so I'll
wait and see if the error happens again.

On Fri, 17 Apr 2020 at 09:00, Adrian Klaver 
wrote:

> On 4/16/20 5:38 PM, derwin theduck wrote:
> > We have been getting this error intermittently (about once a week) in
> > Django with channels since switching from a local database server to a
> > hosted one:
> >
> > could not translate host name "timescaledb" to address: Name or service
> > not known
> >
> > Connecting directly to the server with psycopg2 in the python
> > interpreter works ok:
> >
> >  >>> import psycopg2 as pg
> >  >>> pg.__version__
> > '2.7.7 (dt dec pq3 ext lo64)'
> >  >>> conn = pg.connect(host='timescaledb', database=*, user=*,
> > password=*)>>> conn.server_version
> > 110005
>
> Well I'm go out on a limb and say there is some sort of DNS resolution
> issue going on.
>
> >
> > The resident database / network expert maintains that everything is ok
> > on the networking side, and that django, once it establishes a
> > connection, should not be attempting to reconnect.
>
> Huh? Leaving open connections is not considered a good thing. In other
> words a connection should last for as long as it takes to get it's task
> done and then it should close.
>
> >
> > The problem happens not on startup, but in the middle of the day after
> > processing transactions without any issues for several hours.
> >
> > Any idea how to go about troubleshooting / fixing this? :'(
> >
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py",
> > line 276, in __iter__)
> >   (self._fetch_all())
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py",
> > line 1261, in _fetch_all)
> >   (self._result_cache = list(self._iterable_class(self)))
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/models/query.py",
> > line 57, in __iter__)
> >   (results = compiler.execute_sql(chunked_fetch=self.chunked_fetch,
> > chunk_size=self.chunk_size))
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/models/sql/compiler.py",
>
> > line 1135, in execute_sql)
> >   (cursor = self.connection.cursor())
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
> line
> > 24, in inner)
> >   (return func(*args, **kwargs))
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
>
> > line 260, in cursor)
> >   (return self._cursor())
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
>
> > line 236, in _cursor)
> >   (self.ensure_connection())
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
> line
> > 24, in inner)
> >   (return func(*args, **kwargs))
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
>
> > line 220, in ensure_connection)
> >   (self.connect())
> >   (  File
> > "/home/dduck/.local/lib/python3.7/site-packages/django/db/utils.py",
> > line 90, in __exit__)
> >   (raise dj_exc_value.with_traceback(traceback) from exc_value)
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
>
> > line 220, in ensure_connection)
> >   (self.connect())
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
> line
> > 24, in inner)
> >   (return func(*args, **kwargs))
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/base/base.py",
>
> > line 197, in connect)
> >   (self.connection = self.get_new_connection(conn_params))
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/utils/asyncio.py",
> line
> > 24, in inner)
> >   (return func(*args, **kwargs))
> >   (  File
> >
> "/home/dduck/.local/lib/python3.7/site-packages/django/db/backends/postgresql/base.py",
>
> > line 185, in get_new_connection)
> >   (connection = Database.connect(**conn_params))
> >   (  File
> > "/usr/local/lib64/python3.7/site-packages/psycopg2/__init__.py", line
> > 130, in connect)
> >   (conn = _connect(dsn, connection_factory=connection_factory,
> > **kwasync))
> >   (  could not translate host name "timescaledb" to address: Name or
> > service not known)
> >
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Detecting renamed columns via pgouput in logical replication ?

2020-04-16 Thread Andreas Andreakis
Hello,

when using Postgres 10 or higher, it seems that pgoutput can be used as an
output plugin for logical replication.

Does this allow detecting column renames? Or is there a ticket for adding
support if the feature does not exist (assuming it is feasible to implement)?

https://www.postgresql.org/docs/10/protocol-logicalrep-message-formats.html
appears to be the format spec of pgoutput, and it does not seem to contain
sufficient information for detecting renames. I also checked the Postgres 11
and 12 versions.

What I was hoping to find is the ordinal position of columns, to use it to
infer column renames, since new columns always get a higher ordinal position
while renamed columns keep theirs. Hence, a column rename could be detected
when different column names are received for the same ordinal position, by
tracking the column name per ordinal position at the consumer. (Please let
me know if any of that is incorrect.)
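
For what it's worth, the catalog behaviour this relies on is easy to verify
on the publisher side; a small sketch with a throwaway table:

  CREATE TABLE demo (a int, b text);

  ALTER TABLE demo RENAME COLUMN b TO b_renamed;  -- keeps attnum = 2
  ALTER TABLE demo ADD COLUMN c date;             -- gets attnum = 3

  SELECT attname, attnum
    FROM pg_attribute
   WHERE attrelid = 'demo'::regclass
     AND attnum > 0
     AND NOT attisdropped
   ORDER BY attnum;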

cheers