Re: Rationale for aversion to the central database?
On 2018-04-08 14:39:49 -0700, Guyren Howe wrote:
> I am a Rails developer at a medium-large size company. I’ve mostly worked at
> smaller companies. I’ve some exposure to other web development communities.
>
> When it comes to databases, I have universally encountered the attitude that
> one should treat the database as a dumb data bucket. There is a *very* strong
> aversion to putting much of any business logic in the database. I encounter
> substantial aversion to have multiple applications access one database, or even
> the reverse: all abstraction should be at the application layer.

Like Tim I would like to make a case for finding a middle ground. The sweet
spot may be different from project to project and it is certainly different
for different teams or companies. I'll try to outline what I like to solve
within the database and what I like to put into the application, and why:

Everything which is easy to specify *declaratively* goes into the database:
Tables (of course), views, constraints, etc. These are basically type
declarations for the data which is stored in the database, so it belongs
there.

Simple stored procedures and triggers go into the database, too. They should
make using the database easier and should not surprise the user. This is
mostly optional.

Anything I would consider "application logic" (when the user does X, then the
system should do Y) goes into the application. There is often a distinction
between a backend (or data access layer) and a frontend. The backend could
theoretically be within the database and be called via stored procedures, but
I find it much easier to test and debug code outside of the database (this
may be just my lack of knowledge about available tools).

In my applications I use SQL heavily. RDBMSs are good at processing queries,
so use them for that. If all you want is a key-value store, don't use
PostgreSQL.

I'm not very fond of ORMs. I know what I want to do and can express it in
SQL. An ORM makes me translate that into a different (and usually inferior)
query language, which is then translated back into SQL. That doesn't make
things easier for me.

> My best theory is that these communities developed at a time when Windows was
> more dominant, and just generally it was *significantly* easier to use MySQL
> than Postgres for many, particularly new, developers.

I come from Oracle, not MySQL, but I have also used MySQL, and I guess the
very wide gap in capabilities between Oracle and MySQL made me cautious about
putting too much into the database. There is also the expectation that you
should be able to use a different database engine (SQL is a standard, right?)
just like you should be able to use a different C compiler, but in practice
that never works. And of course I wasn't very impressed with PL/SQL.
(PostgreSQL gives you a much wider range of languages for stored procedures
than Oracle, but PL/PerlU still isn't quite the same as Perl. And I suspect
it's the same for Python.)

        hp
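To make the "declarative" category concrete, here is a minimal sketch (all
names are invented for illustration):

    CREATE TABLE customer (
        id    serial PRIMARY KEY,
        email text NOT NULL UNIQUE
    );

    CREATE TABLE invoice (
        id       serial PRIMARY KEY,
        customer integer NOT NULL REFERENCES customer(id),
        amount   numeric(12,2) NOT NULL CHECK (amount >= 0),
        issued   timestamptz NOT NULL DEFAULT now()
    );

    -- A view as a declarative, reusable definition of "recent invoices".
    CREATE VIEW recent_invoice AS
        SELECT id, customer, amount
        FROM   invoice
        WHERE  issued > now() - interval '30 days';

Constraints like these are enforced no matter which application writes to the
tables, which is exactly why they belong in the database.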
Re: Rationale for aversion to the central database?
On 2018-04-08 19:39:43 -0400, Stephen Frost wrote:
> * Alvaro Aguayo Garcia-Rada (aagu...@opensysperu.com) wrote:
> > 1. Portability. Being tied to a single database engine is not always
> > a good idea. When you write business logic in database, you have to
> > write and maintain your store procedures for every database engine
> > you want to support. That can be really complicated, and will surely
> > take pretty much time, as programming languages for different
> > databases are very different from each other. And it's permanent:
> > Every time you make a change to a store procedure, you must make
> > that change for every supported database.
>
> The portability claim tends to be both a false one and often, when
> realized, results in a solution where you aren't using the database for
> anything complicated and you'd be better off with a much simpler data
> store. You also don't actually offer any justification for the claim
> that being tied to a single database engine is not always a good idea-
> why is that? With commercial databases it tends to be because you are
> at the behest of some very large commercial company- but that isn't an
> issue with PostgreSQL.

Many companies don't like to support multiple database engines. There is the
license issue, true, but having DBAs is even more important. If you are an
Oracle shop you have your Oracle DBAs, and they know how to make backups (and
how to restore them), how to monitor the database, how to track down
performance issues, etc. If some application needs MS-SQL or PostgreSQL or
MySQL, they won't have that expertise, so they will have to be trained, or
you need additional DBAs (maybe on a part-time or support contract basis).
So all other things being equal, companies will prefer applications which
work with databases they already use.

So if you are an application developer, it makes sense to develop your
application to work with several databases. It vastly expands your potential
clientele.

At least that was the situation 10 years ago. These days much software is
offered as a service. If the customer sees only a REST API and doesn't have
to host the database on their own servers, they won't care about the RDBMS
underneath.

        hp
Re: Rationale for aversion to the central database?
On 2018-04-27 22:52:39 +0000, g...@luxsci.net wrote:
> Perhaps I'm extreme. In my ideal world, developers might not even know table
> names! I'm kidding ,sorta...

If they don't know the table names, how can they write those stored
procedures?

        hp
Re: Rationale for aversion to the central database?
On 2018-04-28 09:54:27 -0500, Steven Lembark wrote:
> On Sat, 28 Apr 2018 08:02:21 +0200
> "Peter J. Holzer" wrote:
> > On 2018-04-27 22:52:39 +0000, g...@luxsci.net wrote:
> > > Perhaps I'm extreme. In my ideal world, developers might not even
> > > know table names! I'm kidding ,sorta...
> >
> > If they don't know the table names, how can they write those stored
> > procedures?
>
> One of the main reasons DBA's need to be pigs. Much spaghetti can be
> avoided in ORM frameworks with well-designed views.

Apparently my remark was too short (and perhaps with too little context) to
be clear.

As I understood g...@luxsci.net, they were arguing for writing lots of stored
procedures so that developers wouldn't even have to know the table names
(they would just call the procedures) [Although I now see that it was
somebody else who proposed that as the only interface].

But those stored procedures don't write themselves. Somebody has to write
them, and a person who writes code is called a developer. So there needs to
be at least one developer who knows the table names - the developer of the
stored procedures. And that developer had better be good - API design is
hard, and if you take away SQL from your fellow developers you should give
them something better (for the task at hand), not something worse.

        hp
Re: Domain based on TIMEZONE WITH TIME ZONE
On 2018-05-10 21:37:26 +0100, Ben Hood wrote:
> On 10 May 2018, at 16:33, Francisco Olarte wrote:
> > For what you want to do I think you'll have to parse the text value,
> > maybe by defining a view with a text columns and using some
> > rule/trigger magic for insert / updates.
>
> Sorry for being unclear - the solution I have in production appears to work
> with
>
> CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE
>     CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
>
> This raises an exception when an app doesn’t use UTC.

I don't understand how this can work. As Francisco demonstrated,
EXTRACT(TIMEZONE FROM ts) doesn't extract the time zone from the value ts, it
reports the offset of the client's time zone. So, if my time zone is set to
Europe/Vienna,

    extract(timezone from '2018-05-10 23:17:44+00'::timestamptz)

will still return 7200, even though I have explicitly specified a UTC
timestamp.

What your check probably does is to enforce that the client's time zone is
set to UTC.

        hp
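The behaviour is easy to reproduce (the session time zone is just an example
here):

    SET timezone = 'Europe/Vienna';
    SELECT extract(timezone FROM '2018-05-10 23:17:44+00'::timestamptz);
    -- 7200: the offset of the *session's* time zone, not of the "+00"
    -- written in the literal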
Re: Domain based on TIMEZONE WITH TIME ZONE
On 2018-05-13 12:46:42 -0700, Adrian Klaver wrote:
> Not trying to trick anyone and no magic. The difference in the represented
> values between ts_tz and ts_naive is the heart of my argument. Timestamptz
> values are stored in manner that allows you to have the output with a time
> zone offset. Timestamps w/notz are not.

I disagree. The difference isn't in how they are *stored*. We have already
established that they are stored in the same format. The difference is in
their *semantics*.

A timestamptz denotes a unique and unambiguous point in time. This point in
time can be represented in various time zones. So the point in time when
Apollo 11 launched can be represented as '1969-07-16 09:32:00-04' (local
time) or '1969-07-16 13:32:00+00' (UTC) or '1969-07-16 14:32:00+01' (CET).
These are just different ways to denote the same point in time - and in fact
all three are stored as the same timestamptz value (-1455288000, I think).
Only when displaying the value or doing certain operations on it is it
converted to YMDhmsfz format.

A timestamp without timezone does NOT denote an unambiguous point in time. It
is just a compact form of representing a date and time. But without any
additional context (the location or time zone) this doesn't tell you much.
'2018-01-01 00:00' in Kiribati was 25 hours before '2018-01-01 00:00' in
American Samoa.

> > But timestamps do not have timezone. They are points in the time line.
> > Points in earth surface have timezones, countries have timezones, but
> > nor timestamp.
>
> I don't know about you but I am living on the earths surface:). That means
> when I deal with timestamps they are with reference to a location.

But when you store a timestamp as a timestamptz, you lose that reference to a
location. All that is left is an abstract reference to a point in time. Only
when you read that value again (and do certain operations with it) is that
reference to a location added again - but the current location of the reader,
not the original location (that is lost forever, unless it was stored
elsewhere).

> I will agree that timestamptz is stored as number only. However that number
> in Postgres has an implied time zone of UTC:
>
> https://www.postgresql.org/docs/10/static/datatype-datetime.html#DATATYPE-DATETIME-INPUT
>
> "For timestamp with time zone, the internally stored value is always in UTC
> (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> GMT)"

This is not actually true. There is nothing in the storage format which
depends on UTC (well, the epoch is at midnight UTC, but if you say the epoch
is at 08:00 Beijing time it is equally correct).

        hp
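A quick way to convince yourself that these notations all denote the same
stored value:

    SELECT '1969-07-16 09:32:00-04'::timestamptz = '1969-07-16 13:32:00+00'::timestamptz,
           '1969-07-16 13:32:00+00'::timestamptz = '1969-07-16 14:32:00+01'::timestamptz;
    -- both comparisons are true: one point in time, three notations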
Re: Out of memory error with PG10.3, 10.4 but not 9.3.19
On 2018-05-15 15:02:48 +0000, ChatPristi wrote:
> I have a SELECT command (in partitionned tables) that failed with:
> psql:/tmp/query.txt:1: ERROR:  out of memory
> DETAIL:  Cannot enlarge string buffer containing 1073741818 bytes by 32 more
> bytes.

1073741818 is a bit less than 1GB and 1073741818+32 is a bit more. So you are
obviously hitting a 1GB limit here.

Given that 1GB is the maximum length of a character type value in PostgreSQL
and the error message mentions a "string buffer", I suspect that your query
tries to construct a very long string. Try to rewrite the query so that it
creates several shorter strings instead.

        hp
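If the long string comes from an aggregate, one possible workaround (a sketch
with made-up names - the actual query isn't shown here) is to build one
string per group instead of a single huge one:

    -- Instead of one string over the whole table ...
    -- SELECT string_agg(payload, ',') FROM measurements;

    -- ... keep each aggregated string well below the 1GB limit:
    SELECT device_id, string_agg(payload, ',') AS payload_list
    FROM   measurements
    GROUP  BY device_id;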
Re: Renice on Postgresql process
On 2018-05-07 11:04:31 -0700, Ben Chobot wrote:
> On May 7, 2018, at 7:46 AM, Ayappan P2 wrote:
> > We are using Postgresql in AIX. Unlike some other databases, Postgresql
> > has lot of other process running in the background along with the main
> > process.
> >
> > We do "renice" only on the Postgres main process. Is it sufficient to
> > have higher priority only for the main process or we have to do "renice"
> > for all the Postgresql related process ( like wal writer, logger,
> > checkpointer etc.,) ?
>
> What do you hope to achieve with your renicing? There is a compelling school
> of thought which holds that nice database processes take longer to
> relinquish their resources, which doesn't end up helping anything at all.

I think you misunderstood Ayappan. He doesn't want to make the database
processes nicer, he wants to make them less nice ("higher priority"). So in
theory, they should be able to complete requests faster because they aren't
interrupted by other processes so often.

Whether that is true depends on whether the processes are cpu or disk bound
and what exactly the "nice value" affects. The best way to find out is
probably to try it.

        hp
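For experimenting, something along these lines renices all processes owned by
the postgres user at once (a sketch; negative nice values need root, and the
exact pgrep/renice flags may differ on AIX):

    # Lower the nice value (raise the priority) of every postgres-owned process
    for pid in $(pgrep -u postgres); do
        renice -n -5 -p "$pid"
    done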
Re: case statement within insert
On 2018-05-25 17:04:25 +0800, tango ward wrote:
> I want to insert data from mysql into a table in postgresql. I want to check
> when the subjectcode contains PE or NSTP so I can assign True or False to
> another column in destination DB.
>
> # Source data: MySQL
> curr_msql.execute(''' SELECT code, subjectname
>                       FROM test_subj ''')
>
> # Destination
> for row in curr_msql:
>     curr_psql.execute(''' INSERT INTO subs (
>                               created, modified,
>                               subjcode, subjname,
>                               is_pe_or_nstp)
>                           VALUES (current_timestamp,
>                                   current_timestamp,
>                                   %s, %s,
>                                   CASE
>                                     WHEN code like '%%PE%%' or code like '%%NSTP%%'
>                                     THEN True
>                                     ELSE False
>                                   END) ''', (row['code'], row['subjectname']))

Others have already explained why that doesn't work and how you can fix it.

But since you are inserting one row at a time in a Python loop, I don't see
any advantage in writing the condition in SQL. Do it in Python instead:

    for row in curr_msql:
        is_pe_or_nstp = 'PE' in row['code'] or 'NSTP' in row['code']
        curr_psql.execute(''' INSERT INTO subs (
                                  created, modified,
                                  subjcode, subjname,
                                  is_pe_or_nstp)
                              VALUES (current_timestamp,
                                      current_timestamp,
                                      %s, %s,
                                      %s) ''',
                          (row['code'], row['subjectname'], is_pe_or_nstp,))

A bit more readable, IMHO.

Alternatively, import the complete table *unchanged* from MySQL (you can use
copy_from() for this which is much faster than individual inserts), and then
convert it with a single SQL statement.

        hp
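A hedged sketch of that last alternative with psycopg2's copy_from() - a
staging table plus one INSERT ... SELECT (the staging table is invented, the
other names match the example above, and the values are assumed not to
contain tabs or newlines):

    import io

    # Stage the MySQL rows unchanged via COPY (much faster than row-by-row INSERTs)
    buf = io.StringIO()
    for row in curr_msql:
        buf.write('%s\t%s\n' % (row['code'], row['subjectname']))
    buf.seek(0)

    curr_psql.execute('CREATE TEMP TABLE staging_subs (code text, subjectname text)')
    curr_psql.copy_from(buf, 'staging_subs', columns=('code', 'subjectname'))

    # Convert everything with a single SQL statement
    # (no parameters are passed, so the % signs need no escaping here)
    curr_psql.execute('''
        INSERT INTO subs (created, modified, subjcode, subjname, is_pe_or_nstp)
        SELECT current_timestamp, current_timestamp, code, subjectname,
               code LIKE '%PE%' OR code LIKE '%NSTP%'
        FROM staging_subs
    ''')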
Re: execute block like Firebird does
On 2018-05-30 15:16:56 +0200, Félix GERZAGUET wrote:
> On Wed, May 30, 2018 at 2:50 PM, PegoraroF10 wrote:
> > So, is that possible to change a DO structure is ran, to be possible to
> > return one or more values ?
> > It would be like ...
> >
> > DO returns(ID Integer, Description Text) as
> > $$
> > begin
> >   select ...
> >   insert ...
> >   select ... into ID, Description
> > end
> > $$
> >
> > Using this way would be possible to create that script on client, call it
> > just one time and have a result for that execution, exactly the way a
> > "execute block" does on Firebird.
[...]
> Since you seems to be able to construct dynamically the statement from the
> client application, I think it is already possible to do that in one SQL
> statement using CTE.
[...]
> So you can construct arbitrary complex thing using any combination of SELECT,
> UPDATE and DELETE.

But note that:

| The sub-statements in WITH are executed concurrently with each other and
| with the main query. Therefore, when using data-modifying statements in
| WITH, the order in which the specified updates actually happen is
| unpredictable. All the statements are executed with the same snapshot
| (see Chapter 13), so they cannot “see” one another's effects on the
| target tables. This alleviates the effects of the unpredictability of
| the actual order of row updates, and means that RETURNING data is the
| only way to communicate changes between different WITH sub-statements
| and the main query.
-- https://www.postgresql.org/docs/10/static/queries-with.html#QUERIES-WITH-MODIFYING

In a DO block the statements are processed sequentially and each statement
sees the results of the previous statements.

        hp
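For illustration, a minimal data-modifying CTE in which RETURNING carries the
result of one sub-statement into the next (tables and columns are invented):

    WITH new_customer AS (
        INSERT INTO customers (name)
        VALUES ('ACME Corp')
        RETURNING id
    )
    INSERT INTO orders (customer_id, amount)
    SELECT id, 99.90
    FROM   new_customer
    RETURNING id, customer_id;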
Inefficient plan on 10.4
I have this table:

                                       Table "public.relation"
       Column       |            Type             | Collation | Nullable |               Default
 -------------------+-----------------------------+-----------+----------+--------------------------------------
  parent            | integer                     |           | not null |
  child             | integer                     |           | not null |
  type              | character varying           |           |          |
  sortorder         | integer                     |           |          |
  valid_from        | timestamp without time zone |           | not null | now()
  from_job_queue_id | integer                     |           |          |
  id                | integer                     |           | not null | nextval('relation_id_seq'::regclass)
 Indexes:
     "relation_pkey" PRIMARY KEY, btree (id)
     "relation_child_idx" btree (child)
     "relation_parent_idx" btree (parent)
 Foreign-key constraints:
     "relation_child_fkey" FOREIGN KEY (child) REFERENCES concept(id) DEFERRABLE
     "relation_parent_fkey" FOREIGN KEY (parent) REFERENCES concept(id) DEFERRABLE

which has about 150 million rows:

    wdsah=> select count(*) from relation;
       count
    -----------
     147810590
    (1 row)

I'm trying to get the siblings of a node (including itself):

    select r2.parent, r2.type, r2.child
    from relation r1, relation r2
    where r1.child=643541 and r2.parent=r1.parent
    order by r2.type

This worked fine on 9.5, but on 10.4 it takes several seconds:

                                       QUERY PLAN
 ---------------------------------------------------------------------------------------------------------------
  Gather Merge  (cost=12032221.68..25195781.12 rows=112822632 width=15) (actual time=4086.255..4086.257 rows=2 loops=1)
    Workers Planned: 2
    Workers Launched: 2
    ->  Sort  (cost=12031221.66..12172249.95 rows=56411316 width=15) (actual time=4080.862..4080.862 rows=1 loops=3)
          Sort Key: r2.type
          Sort Method: quicksort  Memory: 25kB
          ->  Nested Loop  (cost=0.57..1876228.04 rows=56411316 width=15) (actual time=3036.547..4080.826 rows=1 loops=3)
                ->  Parallel Seq Scan on relation r1  (cost=0.00..1856722.83 rows=1 width=4) (actual time=3036.525..4080.802 rows=0 loops=3)
                      Filter: (child = 643541)
                      Rows Removed by Filter: 49270196
                ->  Index Scan using relation_parent_idx on relation r2  (cost=0.57..17041.69 rows=246351 width=15) (actual time=0.053..0.055 rows=2 loops=1)
                      Index Cond: (parent = r1.parent)
  Planning time: 0.418 ms
  Execution time: 4090.442 ms

If I disable sequential scans, it works fine again:

                                       QUERY PLAN
 ---------------------------------------------------------------------------------------------------------------
  Sort  (cost=25246497.92..25584965.81 rows=135387158 width=15) (actual time=0.119..0.119 rows=2 loops=1)
    Sort Key: r2.type
    Sort Method: quicksort  Memory: 25kB
    ->  Nested Loop  (cost=1.14..19513.79 rows=135387158 width=15) (actual time=0.101..0.104 rows=2 loops=1)
          ->  Index Scan using relation_child_idx on relation r1  (cost=0.57..8.59 rows=1 width=4) (actual time=0.079..0.080 rows=1 loops=1)
                Index Cond: (child = 643541)
          ->  Index Scan using relation_parent_idx on relation r2  (cost=0.57..17041.69 rows=246351 width=15) (actual time=0.018..0.020 rows=2 loops=1)
                Index Cond: (parent = r1.parent)
  Planning time: 0.446 ms
  Execution time: 0.165 ms

There are several points about these plans that I don't understand:

* Why does the nested loop expect 56E6 or even 135E6 rows? It expects 1 row
  for the outer table and then 246351 rows in the inner table for each of
  them. 1 * 246351 == 246351. So it should expect 246351 rows. (246351 itself
  is way too high, but the table has a very skewed distribution, and I've
  already set the statistics target to the maximum of 10000, so there's not
  much I can do about that)

* Why does the Parallel Seq Scan report actual rows=0? It did return 1 row
  (or is that the average per worker? That would be < 1, and probably rounded
  down to 0)

        hp

PS: The obvious workaround is to remove "order by r2.type". I can easily get
the required partial order in the application. But I'd like to understand
what the optimizer is doing here.
Re: User documentation vs Official Docs
On 2018-07-18 08:09:35 +1000, Tim Cross wrote:
> If using web widgets to author content on the wiki is the main
> impediment for contributing content, maybe we should see if the wiki
> provides alternative access methods. I've used wikis in the past which
> allowed users to upload content via xmlrpc, api etc. Perhaps something
> similar could be made available for those making significant
> contributions or to a select few 'curators' who could accept content
> from others.

There are also browser plugins like It's all text, textern, wasavi, etc.
which allow the user to use a real text editor instead of a text area.

        hp
Re: User documentation vs Official Docs
On 2018-07-19 11:43:18 -0600, Rob Sargent wrote:
> On 07/19/2018 11:04 AM, Peter J. Holzer wrote:
> > On 2018-07-18 08:09:35 +1000, Tim Cross wrote:
> > > If using web widgets to author content on the wiki is the main
> > > impediment for contributing content, maybe we should see if the wiki
> > > provides alternative access methods. I've used wikis in the past which
> > > allowed users to upload content via xmlrpc, api etc. Perhaps something
> > > similar could be made available for those making significant
> > > contributions or to a select few 'curators' who could accept content
> > > from others.
> >
> > There are also browser plugins like It's all text, textern, wasavi, etc.
> > which allow the user to use a real text editor instead of a text area.
>
> Keep in mind that Chrome broke "It's all text" compatibility, at least with
> emacs and now it's done via "Edit with Emacs". This in my experience is a
> step backwards from "It's all text".

Oh, Chrome, too? Firefox also broke compatibility with a lot of add-ons
recently ("It's all text" among them). So I switched to textern, which was a
bit more complicated to set up, but otherwise works almost the same.

But yeah, browser add-ons have a certain tendency to succumb to bit-rot, so
they are nice tools for a user but not something a service provider should
depend on.

        hp
Re: Connections on cluster not being logged
On 2018-07-24 06:46:18 -0700, Adrian Klaver wrote:
> On 07/24/2018 06:25 AM, Sandy Becker wrote:
> > There is only one set of logs since it's a hardware cluster. The two
> > nodes share the underlying database storage. Not sure why, but when the
>
> The community Postgres can't do that, have two instances share the same data
> storage, at least AFAIK. So are you using some fork of Postgres or are there
> actually two data directories?

Maybe Postgresql only runs on the active node? I.e. a classic failover
cluster.

(I'm still not sure what a "hardware cluster" is. Probably some kind of
appliance which packages two nodes, some storage and the HA software.)

        hp
Re: Duplicating data folder without tablespace, for read access
On 2018-08-14 11:57:38 -0400, Jack Cushman wrote:
> I have a large database of text, with a 600GB table and a 100GB table
> connected by a join table. They both see occasional updates throughout the
> week. Once a week I want to "cut a release," meaning I will clone just the
> 100GB table and copy it to a "release" server for read-only web access.
>
> My procedure would be:
>
> - keep the 600GB table on a separate tablespace
> - cleanly stop postgres on both servers
> - copy the data folder to the release server
> - delete pg_tblspc/* on the release server
> - start postgres on both servers

If you copy the whole database anyway before deleting the tablespace: Why
don't you just drop the 600 GB table on the release server?

        hp
Re: PG8.3->10 migration data differences
On 2018-09-10 11:57:42 +0300, Achilleas Mantzios wrote:
> On 10/09/2018 11:22, Csaba Ragasits wrote:
> > Hello,
> >
> > We would like to migrate our old databases to new postgres server,
> > with the simple backup-restore process. The process run fine
> > without errors but when we're checking the source and the
> > migrated datas as field level, we found the following differences:
[...]
> > Field type: REAL
> > pg93: 2.2
> > pg10: 2.2000005
>
> Those have to do with rounding. Precision for real is 6 decimal
> digits. Your difference is on the 8-th digit.

Nitpick: Precision is 24 binary digits which is *about* 6 decimal digits.

The number which is actually stored (on both pg93 and pg10) is actually
2.2000000476837158203125 (1.00011001100110011001101 * 2^1 in binary).
Apparently Pg9.3 rounds this to "2.2" on output while Pg10 prints the more
precise (but still not exact) "2.2000005".

(I would argue that the Pg9.3 output is better, since it represents the same
value in fewer digits, but always printing the minimum number of digits
necessary is surprisingly difficult.)

        hp
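A quick way to see more of the stored value on either server is to raise
extra_float_digits (on 10.x this prints up to 9 significant digits; releases
from 12 on print the shortest round-trip form instead):

    SET extra_float_digits = 3;
    SELECT '2.2'::real;   -- e.g. 2.20000005 on 10.x instead of 2.2
    RESET extra_float_digits;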
Re: Out of Memory
On 2018-09-28 07:23:59 +0200, Laurenz Albe wrote:
> Rob Sargent wrote:
> > > Christoph Moench-Tegeder wrote:
> > > > ## Laurenz Albe (laurenz.a...@cybertec.at):
> > > > > vm.overcommit_memory = 2
> > > > > vm_overcommit_ratio = 100
> > > > >
> > > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > > >                       ^
> > > > That should be a "+".
> > >
> > > Yes; shame on me for careless typing, and thank you for the
> > > correction.
> >
> > Are there any parentheses needed in that formula?
>
> No. It is swap space plus a certain percentage of RAM.
>
> I don't know how the Linux developers came up with that
> weird formula.

I suspect they copied it from some other Unix.

Traditionally, Unix allocated all memory in the swap space. You could say
that the swap space was the "real memory" and RAM was a cache for that (hence
the rule that swap must be at least as large as RAM and should preferably be
2 or 4 times the size of RAM). So, when Unixes started to allow allocating
more virtual memory than swap space, they were "overcommitting". But for
Linux that doesn't make much sense, since a page lived either in RAM /or/ in
swap right from the start, so the limit was always RAM+swap, not swap alone,
and you are only overcommitting if you exceed the size of the sum.

The overcommitment in Linux is of a different kind: Linux uses copy on write
wherever it can (e.g. when forking processes, but also when mallocing
memory), and a CoW page may or may not be written in the future. It only
needs additional space when it's actually written, so by counting the page
only once (hoping that there will be enough space if and when that page is
written) the kernel is overcommitting memory.

        hp
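On a Linux box the resulting limit can be inspected directly (just a
diagnostic sketch; the numbers are whatever the machine reports):

    # The two knobs: overcommit_memory=2 enforces the limit,
    # overcommit_ratio is the percentage of RAM that is added to swap
    sysctl vm.overcommit_memory vm.overcommit_ratio

    # CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
    # Committed_AS is how much is currently committed
    grep -E 'CommitLimit|Committed_AS|SwapTotal|MemTotal' /proc/meminfo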
Re: How to maintain the csv log files in pg_log directory only for past 30 days
On 2018-09-29 07:12:32 -0700, Adrian Klaver wrote:
> On 9/28/18 7:05 PM, Raghavendra Rao J S V wrote:
> > Hope you all are recommending below settings to maintain only max 30
> > days logs in *pg_log* directory. Please correct me if I am wrong.
>
> Well it would actually be 31 days as:
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
>
> %d
>     Replaced by the day of the month as a decimal number [01,31]. [ tm_mday]

Additionally, the log file for the 31st of a month is usually only
overwritten after 61 days.

> Not sure if that matters or not.

It may or may not. Personally I prefer to use find -mtime (or logrotate, or
cleandir, or keepfree, ...) to avoid the irregularities of the Gregorian
calendar.

        hp
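The find variant could look like this (the path and file pattern are
assumptions - they have to match the log_directory and log_filename settings
of the cluster):

    # Remove CSV logs that haven't been modified for more than 30 days
    find /var/lib/pgsql/data/pg_log -name '*.csv' -mtime +30 -delete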
Re: COPY threads
On 2018-10-10 17:19:50 -0400, Ravi Krishna wrote:
> > On Oct 10, 2018, at 17:18 , Andres Freund wrote:
> > On October 10, 2018 2:15:19 PM PDT, Ravi Krishna wrote:
> > > If I have a large file with say 400 million rows, can I first split it
> > > into 10 files of 40 million rows each and then fire up 10 different
> > > COPY sessions , each reading from a split file, but copying into the
> > > same table. I thought not. It will be great if we can do this.
> >
> > Yes, you can.
>
> Thank you. Let me test it and see the benefit. We have a use case for this.

You should of course test this on your own hardware with your own data, but
here are the results of a simple benchmark (import 1 million rows into a
table without indexes via different methods) I ran a few weeks ago on one of
our servers:

https://github.com/hjp/dbbench/blob/master/import_pg_comparison/results/claudrin.2018-09-22/results.png

The y axis is rows per second. The x axis shows different runs, sorted from
slowest to fastest (so 2 is the median).

As you can see it doesn't parallelize perfectly: 2 copy processes are only
about 50 % faster than 1, and 4 are about 33 % faster than 2. But there is
still quite a respectable performance boost.

        hp

PS: The script is of course in the same repo, but I didn't include the test
data because I don't think I'm allowed to include that.
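A hedged sketch of the split-and-load approach from the question (file name,
table name and format are made up; psql happily runs \copy via -c):

    # 10 chunks of 40 million lines each, as in the example above
    split -l 40000000 big_table.csv chunk_

    # One COPY session per chunk, all into the same table
    for f in chunk_*; do
        psql -d mydb -c "\copy mytable FROM '$f' WITH (FORMAT csv)" &
    done
    wait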
Re: convert real to numeric.
On 2018-10-18 10:15:40 -0400, Tom Lane wrote:
> Alessandro Aste writes:
> > I need to convert an SQL field from real to numeric, but I’m getting a
> > strange behavior.
> > select amount, amount::numeric, amount::numeric(16,4), amount::varchar::numeric
> > from mytable where id = 32560545;
> > Result:
> > 17637.75, 17637.8, 17637.8000, 17637.75
>
> You realize of course that "real" (a/k/a float4) can only be trusted
> to six decimal digits in the first place.

When I try this, I get 24 bits, actually. Using decimal digits when talking
about binary numbers is misleading.

[...]

> You could ju-jitsu the system into duplicating that behavior by casting
> to text (which invokes float4out) and then to numeric:
>
> regression=# set extra_float_digits to 1;
> SET
> regression=# select '17637.75'::real, '17637.75'::real::numeric;
>   float4  | numeric
> ----------+---------
>  17637.75 | 17637.8
> (1 row)
>
> regression=# select '17637.75'::real, '17637.75'::real::text::numeric;
>   float4  | numeric
> ----------+----------
>  17637.75 | 17637.75
> (1 row)

I suggest casting first to float8 and then to numeric. The conversion from
float4 to float8 is exact, and any rounding error introduced by the
float8->numeric conversion is certainly much smaller than the uncertainty of
the original float4 value.

> I think, however, that you ought to spend some time contemplating
> the fact that that extra digit is partially garbage.

If we assume that 17637.75 was the result of rounding a more precise value to
a float4, then the real value was somewhere between 17637.7490234375 and
17637.7509765625. Rounding to 17637.8 introduces an error almost 50 times
larger.

> I'm not really convinced that doing it like this rather than doing the
> standard conversion is a good idea. You can't manufacture precision
> where there is none

It may be that the real value of that number is only known to +/- 0.1. Or
maybe only to +/- 100. But postgresql can't know that, and gratuitously
adding additional rounding errors doesn't help.

        hp
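For completeness, the float8 route applied to the nearby value discussed in
the follow-up below; the numeric result is simply the exact value stored in
the float4:

    SELECT '17637.74'::real::float8::numeric;
    -- 17637.740234375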
Re: convert real to numeric.
On 2018-10-18 18:58:13 -0400, Tom Lane wrote:
> "Peter J. Holzer" writes:
> > On 2018-10-18 10:15:40 -0400, Tom Lane wrote:
> > > You could ju-jitsu the system into duplicating that behavior by casting
> > > to text (which invokes float4out) and then to numeric:
>
> > I suggest casting first to float8 and then to numeric. The conversion
> > from float4 to float8 is exact, and any rounding error introduced by the
> > float8->numeric conversion is certainly much smaller than the
> > uncertainty of the original float4 value.
>
> I do not think that the OP will care for the results of that. The problem
> is that now the output function will think that the result is worth
> printing to 16 digits, and the last ten or so of those will be garbage.

You are thinking about "printing" (i.e., presentation to a user), I am
thinking about "converting" (i.e. storing the value as accurately as possible
in a different representation). These are different things. You should only
think about "printing" when you actually print a value.

Depending on the inherent accuracy of the value and the purpose of the
display, the best way to display the float4 value 17637.740234375 may be
'17637.740234375' (the exact value), '17637.74' (the minimum number of
decimal digits necessary to reconstruct the exact value), '17638' (we don't
care about fractions), '17,700' (three digits ought to be enough for
anybody), '17.7k' (the same, but more compact) or anything in between.

It is for the application programmer to decide how to display a value,
because the programmer knows what it means, where it comes from and what the
user is supposed to do with that information. The runtime environment doesn't
know this. So it shouldn't throw away accuracy. Often even parts of the
application don't know this. So they shouldn't either.

> As an example, even though the cited value happens to work nicely:
>
> regression=# select '17637.75'::float4::float8;
>   float8
> ----------
>  17637.75
> (1 row)
>
> nearby ones don't:
>
> regression=# select '17637.74'::float4::float8;
>      float8
> -----------------
>  17637.740234375
> (1 row)
>
> Yeah, in some sense that's a valid representation of the stored float4,

It is indeed the precise value which is stored (9030523 / 512).

> but it likely has little to do with the originally presented value.

It is much closer to the original value than 17637.7 (the relative errors are
about 1.33e-8 and 2.27e-6, so the error introduced by PostgreSQL's default
rounding is 170 times greater).

I think using FLT_DIG and DBL_DIG for converting from binary to decimal is
wrong. They represent the "number of decimal digits, q, such that any
floating-point number with q decimal digits can be rounded into a
floating-point number with p radix b digits and back again without change to
the q decimal digits" (ISO/IEC 9899:2011), which is not the same as the
number of decimal digits sufficient to convert a binary to decimal and back
again. Unfortunately, the latter is not a constant and converting binary to
decimal isn't trivial - nevertheless the problem has been solved (at least
for IEEE-754 arithmetic), so I might take a stab at it (even with my limited
numerical skills).

        hp
Re: Why does jsonb_set() remove non-mentioned keys?
On 2019-07-05 10:59:31 +0200, Thomas Kellerer wrote:
> Gianni Ceccarelli schrieb am 05.07.2019 um 10:00:
> > > strict functions with sql null inputs yield sql null output without
> > > even executing the function
> >
> > So when the SQL-level executor sees a call to any function declared
> > strict with some NULL parameters, it doesn't call the function at
> > all. `whatever_my_function('a string',1234,NULL)` is always `NULL`
>
> Ah, I see. Thanks for the clarification
>
> Then I would question if declaring jsonb_set as "strict" makes sense

I think it does but I raise the same question for to_jsonb. It's defined on
anyelement and the documentation says:

| Returns the value as json or jsonb. Arrays and composites are converted
| (recursively) to arrays and objects; otherwise, if there is a cast from
| the type to json, the cast function will be used to perform the
| conversion; otherwise, a scalar value is produced. For any scalar type
| other than a number, a Boolean, or a null value, the text representation
| will be used, in such a fashion that it is a valid json or jsonb value.

The documentation explicitly singles out "a number, a Boolean, or a null
value", but doesn't specify how they are treated. I would expect that they
are treated equivalently, though: An SQL number is converted to a JSON
number, an SQL boolean is converted to JSON true or false and an SQL null is
converted to JSON null. Returning SQL null instead of a JSON null breaks that
expectation for no discernible reason.

It also isn't consistent, since an SQL null in an array or composite is
converted to a JSON null, as I would expect.

        hp
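The inconsistency in a nutshell (as I understand the current behaviour):

    SELECT to_jsonb(NULL::text) IS NULL;   -- true: strict, SQL null in, SQL null out
    SELECT to_jsonb(ARRAY[NULL::text]);    -- [null]: inside a composite the SQL null
                                           -- becomes a JSON null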
Re: Rearchitecting for storage
On 2019-07-19 10:41:31 -0400, Matthew Pounsett wrote:
> Okay. So I guess the short answer is no, nobody really knows how to
> judge how much space is required for an upgrade? :)

As I understand it, a pg_upgrade --link uses only negligible extra space. It
duplicates a bit of housekeeping information, but not your data tables or
indexes. Your 18 TB table will definitely not be duplicated during the
upgrade if you can use --link.

        hp
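For reference, an invocation would look roughly like this (binary and data
directory paths are made up here and have to be adjusted to the actual
installation):

    pg_upgrade --link \
        -b /usr/lib/postgresql/9.6/bin  -B /usr/lib/postgresql/11/bin \
        -d /var/lib/postgresql/9.6/main -D /var/lib/postgresql/11/main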
Re: Rearchitecting for storage
On 2019-07-19 11:37:52 -0400, Matthew Pounsett wrote:
> On Fri, 19 Jul 2019 at 11:25, Peter J. Holzer wrote:
> > On 2019-07-19 10:41:31 -0400, Matthew Pounsett wrote:
> > > Okay. So I guess the short answer is no, nobody really knows how to
> > > judge how much space is required for an upgrade? :)
> >
> > As I understand it, a pg_upgrade --link uses only negligible extra
> > space. It duplicates a bit of housekeeping information, but not your
> > data tables or indexes. Your 18 TB table will definitely not be
> > duplicated during the upgrade if you can use --link.
>
> The documentation for pg_upgrade --link says that the old copy is no longer
> usable, which means it's modifying files that are linked. If it were only
> modifying small housekeeping files, then it would be most efficient not to
> link those, which would keep both copies of the db usable.

This was discussed recently: The old database is made intentionally unusable
to prevent accidentally starting both (which would result in data
corruption).

> That seems incompatible with your suggestion that it doesn't need to
> modify the data files. Depending on how it goes about doing that, it
> could mean a significant short-term increase in storage requirements
> while the data is being converted.
>
> Going back to our recent 'reindex database' attempt, pgsql does not
> necessarily do these things in the most storage-efficient manner; it
> seems entirely likely that it would choose to use links to duplicate
> the data directory, then create copies of each data file as it
> converts them over, then link that back to the original for an atomic
> replacement. That could eat up a HUGE amount of storage during the
> conversion process without the start and end sizes being very
> different at all.

I can't really think of a scenario in which this would be the best (or even a
good) strategy to convert the database. I am quite confident that pg_upgrade
doesn't do that at present and reasonably confident that it won't do it in
the future.

> Sorry, but I can't reconcile your use of "as I understand it" with
> your use of "definitely". It sounds like you're guessing, rather than
> speaking from direct knowledge of how the internals of pg_upgrade.

I don't have direct knowledge of the internals of pg_upgrade, but I have
upgraded a database of about 1 TB at least twice with --link. Since I had
much less than 1 TB of free space and the upgrade completed very quickly, I
am very confident that no user defined tables are copied.

I have also been on this mailing list for a few years and read quite a few
discussions about the usage of pg_upgrade in that time (though I may not
always have paid much attention to them).

        hp
Re: How to run a task continuously in the background
On 2019-07-17 12:34:41 +0100, Rory Campbell-Lange wrote:
> We make extensive use of postgresql 'contacting' an external process, but
> our use case involves many databases in a cluster rather than many schemas.
> Also we don't have to deal with cancelling the external process. We chose
> this architecture to avoid many postgresql connections for LISTEN/NOTIFY.
>
> We use a pgmemcache interface trigger to update memcached with a
> notification of the database holding items to be flushed.

Memcached is a cache. It will delete old items if storage is full (or if they
expire). Is this not a problem in your case or did you make sure that this
cannot happen?

        hp
Re: Default ordering option
On 2019-07-26 09:52:52 +0200, Cyril Champier wrote:
> No, the code I pasted was an existing production bug: the last_name should
> have been unique, so the selected patient would always be the same. This
> should have been detected in tests, but since the order was "almost always
> the same", our test was green 99% of the time, so we discarded it as flaky.
>
> Fuzzy testing could be an option, but this would go too far, as for Peter
> extension suggestion. We have huge existing codebase with more than 10K
> tests, and I do not want to modify our whole testing strategy.
>
> Meanwhile, I went for an ORM patch (ActiveRecord) and forbid usages that can
> workaround it.

Another idea: How do you prepare your test data? Do you have a (possibly
large) test database or do you populate a test database with test-specific
data in a fixture? If you do the latter, you might be able to insert the data
in random order.

        hp
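For example, if the fixtures end up as SQL, shuffling the insert order is
often enough to make order-dependent queries visible (table and column names
are hypothetical):

    INSERT INTO patients (first_name, last_name)
    SELECT first_name, last_name
    FROM   staging_patients
    ORDER  BY random();   -- the physical insertion order no longer matches
                          -- any "natural" order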
Re: Why does backend send buffer size hardcoded at 8KB?
On 2019-07-27 19:10:22 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-07-27 18:34:50 -0400, Tom Lane wrote:
> > > Yeah. The existing commentary about that is basically justifying 8K
> > > as being large enough to avoid performance issues; if somebody can
> > > show that that's not true, I wouldn't have any hesitation about
> > > kicking it up.
>
> > You think that unnecessary fragmentation, which I did show, isn't good
> > enough? That does have cost on the network level, even if it possibly
> > doesn't show up that much in timing.
>
> I think it is worth doing some testing, rather than just blindly changing
> buffer size, because we don't know how much we'd have to change it to
> have any useful effect.

I did a little test with nttcp between two of our servers (1 Gbit to
different switches, switches connected by 10 Gbit).

The difference between a 1024 byte buffer and a 1460 byte buffer is small but
measurable. Anything larger doesn't make a difference. So increasing the
buffer beyond 8 kB probably doesn't improve performance on a 1 Gbit LAN.

I didn't test 10 Gbit LAN or WAN - those might be different.

        hp
Re: Tablespace column value null on select * from pg_tables
On 2019-07-16 19:20:24 +0000, Alex Williams wrote:
> 1. postgres=# ALTER DATABASE mydatabase SET TABLESPACE pg_default;
>
> ERROR: some relations of database "mydatabase" are already in tablespace
> "pg_default"
>
> HINT: You must move them back to the database's default tablespace before
> using this command.

What's the reason for this error? Wouldn't it be simpler to leave relations
alone which are already in the correct tablespace?

        hp
Re: How do I create a Backup Operator account ?
On 2019-07-31 07:48:36 +0200, Luca Ferrari wrote:
> On Wed, Jul 31, 2019 at 2:48 AM Marcos Aurelio Nobre wrote:
> > But I don't know how to implement this on Linux, nor how to write this
> > entry in the pg_hba.conf file.
>
> I would start with an entry in pg_hba.conf like the following:
>
> host  all  pg_backup_username  localhost  md5
>
> or
>
> host  all  pg_backup_username  localhost  md5
>
> The problem then comes on how to prevent the operating system user to
> run psql.

This problem can be solved by not granting anyone shell access as that user.
It is only used as a target for sudo, and sudo is configured to run only
pg_dump and pg_restore as that user (plus maybe other programs to list
available backups, review logs, remove old backups, ...)

A web interface might be used as an alternative to sudo.

> Even something like the following (untested) in /etc/sudoers will NOT
> prevent the user to access the database:
>
> User_Alias PGBACKUPUSERS = pg_backup_username
> Cmd_Alias PGBACKUP = /usr/local/bin/pg_dump, /usr/local/bin/pg_restore, ! /usr/local/bin/psql
> PGBACKUPUSERS backup_host = PGBACKUP

This is the wrong way around. It should be something like

    alice, bob = (pg_backup_username) /usr/local/bin/pg_dump

(Apologies if I didn't get the syntax right. Slogging through the sudoers
manual reminded me why I wrote xssd 15 years ago).

        hp
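A hedged sketch of such a sudoers setup (user names and paths are invented;
the point is that alice and bob may run only the listed commands, and only as
pg_backup_username):

    User_Alias   BACKUPOPS    = alice, bob
    Runas_Alias  PGBACKUPUSER = pg_backup_username
    Cmnd_Alias   PGBACKUPCMDS = /usr/local/bin/pg_dump, /usr/local/bin/pg_restore

    BACKUPOPS    ALL = (PGBACKUPUSER) PGBACKUPCMDS

They would then invoke, for example:

    sudo -u pg_backup_username /usr/local/bin/pg_dump mydb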
Re: Understanding PostgreSQL installer debug log
On 2019-08-09 16:55:13 +0530, Ramesh Maddi wrote:
> performing post-bootstrap initialization ... Failed to initialize the
> database cluster with initdb with the error "invalid byte sequence for
> encoding "UTF8"
>
> Script stderr:
>  FATAL:  invalid byte sequence for encoding "UTF8": 0xeb 0x2f 0xdb
>  child process exited with exit code 1
>  initdb: removing contents of data directory "/home/postgres/9.6/data"
>
> Error running /home/postgres/9.6/installer/server/initcluster.sh "postgres"
> "postgres" "/home/postgres/9.6" "/home/postgres/9.6/data" 5432 DEFAULT:
> FATAL: invalid byte sequence for encoding "UTF8": 0xeb 0x2f 0xdb

0xeb 0x2f 0xdb is indeed not valid UTF-8. So wherever this sequence comes
from isn't UTF-8 encoded.

In ISO-8859-1 that sequence would be "ë/Û". Does this ring a bell? Any
subdirectory with a name that ends in "ë", for example?

        hp
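A quick way to check such byte sequences against a suspected single-byte
encoding (Python used here just because it is handy):

    >>> bytes([0xeb, 0x2f, 0xdb]).decode('iso-8859-1')
    'ë/Û'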
Re: How to check if a field exists in NEW in trigger
On 2019-08-07 15:54:33 -0700, Igal @ Lucee.org wrote:
> There is no reason, however, for an email address to be not-lower-cased, so
> while in some cases (no pun intended) it makes sense to keep the original
> CaSe while performing a case insensitive comparison, when it comes to email
> addresses I rather collapse the value upon insertion/update.

You should be aware that according to the specs, the local part of an email
address is case-sensitive. So joesm...@example.com and joesm...@example.com
might be two distinct email addresses.

Admittedly I've never seen that in practice (although I've seen a lot of
other problems caused by people who made unwarranted assumptions about email
addresses).

        hp
Quoting style (was: Bulk Inserts)
On 2019-08-10 21:01:50 -0600, Rob Sargent wrote:
> On Aug 10, 2019, at 8:47 PM, Souvik Bhattacherjee wrote:
> > Hi Adrian,
> >
> > Thanks for the response.
> >
> > > Yes, but you will some code via client or function that batches the
> > > inserts for you.
> >
> > Could you please elaborate a bit on how EXP 1 could be performed such that
> > it uses bulk inserts?
> >
> > Best,
> > -SB
> >
> > On Fri, Aug 9, 2019 at 7:26 PM Adrian Klaver wrote:
> > [70 lines of full quote removed]
>
> Top-posting (i.e. putting your reply at the top is discouraged here)

He didn't really top-post. He quoted the relevant part of Adrian's posting
and then wrote his reply below that. This is the style I prefer, because it
makes it really clear what one is replying to.

After his reply, he quoted Adrian's posting again, this time completely. I
think this is unnecessary and confusing (you apparently didn't even see that
he quoted something above his reply). But it's not as bad as quoting
everything below the answer (or - as you did - quoting everything before the
answer which I think is even worse: If I don't see any original content
within the first 100 lines or so I usually skip the rest).

        hp
Re: Recomended front ends?
On 2019-08-08 10:47:47 -0700, Rich Shepard wrote:
> On Thu, 8 Aug 2019, Adrian Klaver wrote:
> > Unfortunately it does not:
> > https://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
> >
> > Given that the issue:
> > https://code.djangoproject.com/ticket/373
> > is 14 years old does not inspire confidence that it will change anytime
> > soon.
>
> Adrian,
>
> That's really interesting. I don't see how a framework cannot implement
> multi-column PKs.

You are a database guy. You have a database and want to write an application
for it. Even in a greenfield project, you probably do the database design
first. (I know I do)

The Django framework comes from the opposite direction. The programmers
aren't expected to know much about relational databases[1]. They are expected
to write their application and Django provides (among other things like
session and form handling) a persistence layer which happens to be backed by
a relational database. So Django's ORM layer isn't intended to deal with any
possible (or even any reasonable) database model - just with database models
generated by Django.

Django lets you use "unmanaged" tables, but it is quite noticeable that this
isn't the primary use case.

> Many databases I have include tables for samples (geochemical, biological,
> physical) where the PK for each row is location, date, parameter. Good thing
> I don't like browser user interfaces, eh? :-)

Django isn't the only web framework.

        hp

[1] It should be noted, however, that the Django ORM is what Joel Spolsky
calls a leaky abstraction. If you are ignorant about how RDBMSs work, your
application will probably be horrendously slow. Even if you do know this, you
often have to bend over backwards to get reasonable performance.
Re: Databases and servers
On 2019-08-20 10:33:17 +0000, Karl Martin Skoldebrand wrote:
> I just discovered that a client has done this:
>
> They have two web applications A1 and A2. They have seperate hostnames/URLs.
> Both have a production and a test database A1p and A1t/ A2p and A2t.
>
> What they've done is have both A1p and A2p on the same actual database
> server and A1t and A2t on the same server.
>
> So, I'm thinking - if a bug in application A1 crashes the application and
> database badly it will risk bringing down both services A1 and A2. The same
> risk would be evident on a successful security breach.
>
> I would prefer to A1p and A2p on seperate servers, maybe keeping A1t and A2t
> on the same. (This is what seems to be happening when the database servers
> are being replaced).

On rereading this I notice that I'm not sure what that means. If you propose
replacing the two servers with three (two production, one test) or even four
(two production and two test), I agree. If you want to keep two servers, but
rearrange them so that each server holds both the production and the test
database of one app, see below.

> What is the general thought on the current setup?

Without knowing the details I think I would side with your client here: It
seems to me that the risk of accidentally clobbering a production database
which is on the same host as the test database is higher than the risk of two
different production databases interfering with each other. Also, if you have
the test and production database on the same host, there are some procedures
which you can't safely test (e.g. an OS upgrade).

I would think about putting each database in a virtual machine or at least a
container, though.

        hp
Re: Permission for not Django app to do Write and Read
On 2019-08-24 13:22:38 +0200, Luca Ferrari wrote: > On Sat, Aug 24, 2019 at 11:53 AM Peter Wainaina wrote: > > > > Thanks much for the response. This is what I mean am a database ^ > > administrator for a production company and the product owner doesn't ^ > > want me to be able to either read or write information that will > > come from Django application. > > I personally hate this kind of setup, because it does not make any > sense to me that a developer must setup an application that must > interact with a database that the developer himself cannot interact > with. Peter is the DBA here, not the developer. I don't think you can prevent the DBA from seeing the data in the database - except by encrypting it (as you suggested below). I guess it would be possible to write a Django backend which implements an encryption layer, but getting decent performance out of it might be ... challenging. > However, keep it simple: define a django user, assign each object to > such user, revoke any permission from public. > And then let's the product owner setup a password. You can still access the tables as postgres. Of course you could set up another database instance where you don't have DBA privileges, but then the product owner has to trust the system administrator instead of the DBA. At some point you have to trust somebody or do everything yourself. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
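A minimal sketch of the "django user, revoke from public" setup suggested above. The role, schema and database names are invented; the product owner would set the real password. As noted in the message, a superuser can still read everything.

    -- Application role; the product owner sets the real password:
    CREATE ROLE django_app LOGIN PASSWORD 'changeme';

    -- Give the application its own schema, owned by that role:
    CREATE SCHEMA app AUTHORIZATION django_app;

    -- Keep everyone else (PUBLIC) out:
    REVOKE ALL ON DATABASE appdb FROM PUBLIC;
    REVOKE ALL ON SCHEMA public FROM PUBLIC;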
Re: Recomended front ends?
On 2019-08-27 16:04:02 +0100, Daniele Varrazzo wrote: > Using the Django ORM to create complex queries is a joy (especially > nesting subqueries), Not for me. I usually know what SQL I want to execute. Then I have to convert that SQL into a weird[1] and limited query language composed of method calls so that the ORM can turn it into SQL again (hopefully something close to the SQL I had in mind). It hurts my brain. hp [1] Well, SQL is weird, too. But it is weird in an "invented 40+ years ago and grown organically since" way, not a "constrained by the syntax of a different language" way. -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Recomended front ends?
On 2019-08-27 08:16:08 -0700, Adrian Klaver wrote: > Django takes Postgres as it's reference database which makes things easier, > especially when you add in > contrib.postgres(https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/) Looks nice. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: floating point output
On 2019-09-09 01:03:06 -0400, Tom Lane wrote: > Rob Sargent writes: > > Below I show a floating point column from a recent query. > > > Here's my guess at what's happening in the formatting: > > 1) scientific notation if smaller than 1.0e-04 (I’m not in the > > very-much-larger-than-one world) > > 2) 15 digits after the most signicant 0 > > 3) remove trailing zeros > > > I may be alone in this but I find the presentation messy and that's > > unusual in the PostgreSQL world. > > In released versions of Postgres, float8out basically devolves to > sprintf with "%.15g" format, and the above is what POSIX says that > should do. > > As of v12, we'll use the "Ryu" output algorithm, which adjusts > the number of displayed digits to ensure that the value will > re-load exactly by default. Cool. I looked into that some time ago, but never got around to implement it. > I'm afraid that will probably make your complaint about variable > numbers of digits worse not better. It's unlikely anybody will > entertain an argument to undo it, though, because (a) Ryu is faster > than the old code, and (b) it's hard to argue against ensuring that > output/input preserves values. I agree in general, but I wonder whether it would be worthwhile to add display formatting options to psql (something like "COLUMN foo FORMAT ..." in Oracle's sqlplus), so that the user can decide to display a specific column (or maybe all float numbers) as (for example) "%8.3f" or ".6e". This is of course already possible by using to_char in the query (e.g. to_char(f, '.999') or to_char(f, '9.99')) but that is a bit cumbersome. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
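For illustration, the to_char() workaround mentioned above might look roughly like this, using the same hypothetical table t and float column f; the format strings are just examples.

    -- Fixed-point with three decimal places:
    SELECT to_char(f, 'FM9990.999') FROM t;

    -- Scientific notation with six digits after the decimal point:
    SELECT to_char(f, '9.999999EEEE') FROM t;

A per-column display format in psql would avoid having to repeat this in every query, which is the point of the suggestion above.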
Permissions on postgresql.conf, psql and patroni
Yesterday I "apt upgrade"d patroni (to version 1.6.0-1.pgdg18.04+1 from http://apt.postgresql.org/pub/repos/apt bionic-pgdg/main). Today I noticed that I couldn't invoke psql as an unprivileged user anymore: % psql Error: Invalid data directory for cluster 11 main Further investigation showed that the read permissions on /etc/.../postgresql.conf were revoked at the time of the upgrade. Either by a post-install script or maybe by patroni itself, when it started up again. This leads me to two questions: 1) Is there a reason to restrict read access to postgresql.conf to the user postgres? AFAIK this file doesn't normally contain sensitive data (unlike pg_hba.conf and pg_ident.conf which are restricted by default). 2) Why does psql need to read postgresql.conf, and more specifically, why does it care about the location of the data directory? It shouldn't access files directly, just talk to the server via the socket. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Permissions on postgresql.conf, psql and patroni
On 2019-09-09 13:29:38 +0200, Daniel Verite wrote: > Peter J. Holzer wrote: > > 2) Why does psql need to read postgresql.conf, and more specifically, > > why does it care about the location of the data directory? It > > shouldn't access files directly, just talk to the server via the > > socket. > > It's not psql itself, it's pg_wrapper. > > $ ls -l /usr/bin/psql > lrwxrwxrwx 1 root root 37 Aug 8 12:48 /usr/bin/psql -> > ../share/postgresql-common/pg_wrapper > > See http://manpages.ubuntu.com/manpages/bionic/man1/pg_wrapper.1.html Ah, thanks. On 2019-09-09 10:03:57 -0400, Tom Lane wrote: > "Peter J. Holzer" writes: > > Yesterday I "apt upgrade"d patroni (to version 1.6.0-1.pgdg18.04+1 from > > http://apt.postgresql.org/pub/repos/apt bionic-pgdg/main). > > Today I noticed that I couldn't invoke psql as an unprivileged user > > anymore: > > % psql > > Error: Invalid data directory for cluster 11 main > > FYI, there's no such error string in the core Postgres sources. > I imagine that it's coming from some wrapper script. Yep. Daniel shoved me into the right direction. The message is from PgCommon.pm, used by pg_wrapper. It's a good thing both packages are maintained by the same team. Saves the discussion who's going to fix it :-). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Permissions on postgresql.conf, psql and patroni
On 2019-09-09 19:15:19 +0200, Peter J. Holzer wrote: > On 2019-09-09 10:03:57 -0400, Tom Lane wrote: > > "Peter J. Holzer" writes: > > > Yesterday I "apt upgrade"d patroni (to version 1.6.0-1.pgdg18.04+1 > > > from http://apt.postgresql.org/pub/repos/apt bionic-pgdg/main). > > > Today I noticed that I couldn't invoke psql as an unprivileged user > > > anymore: > > > % psql > > > Error: Invalid data directory for cluster 11 main > > > > FYI, there's no such error string in the core Postgres sources. > > I imagine that it's coming from some wrapper script. > > Yep. Daniel shoved me into the right direction. The message is from > PgCommon.pm, used by pg_wrapper. > > It's a good thing both packages are maintained by the same team. Saves > the discussion who's going to fix it :-). Since this issue is specific to the debian/ubuntu packages, I've opened a ticket in redmine: https://redmine.postgresql.org/issues/4717 (as suggested in the wiki). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Strange Behaviour with multicolumn indexes
[PostgreSQL 11.5 (Ubuntu 11.5-1.pgdg18.04+1) on x86_64-pc-linux-gnu] I have a table with many columns and many indexes (actually many tables with many columns and many indexes), but for the sake of this posting, we'll consider just three columns, which we unimaginatively call a, b, and c. There are also three indexes: t_a_idx btree (a) WHERE a IS NOT NULL t_b_idx btree (b) WHERE b IS NOT NULL t_a_b_idx btree (a, b) WHERE a IS NOT NULL AND b IS NOT NULL Now I have a query select c from t where a='A' and b='B'; This uses t_b_idx, not - as I expected - t_a_b_idx. If I create an index with the columns swapped: t_b_a_idx btree (b, a) WHERE b IS NOT NULL and a IS NOT NULL this index will be used. The distribution of values in columns a and b is quite different: a has 10 different values of similar frequency (and no null values). b has only a single non-null value, which occurs with a frequency of about 1 %. So I definitely understand why it would prefer t_b_idx to t_a_idx, but certainly t_a_b_idx should be even better? After all it would have to read only 1/1000 of the rows instead of 1/100. It would also have to scan much less of the index, so the fact that the index is a bit larger shouldn't make a difference. Explain shows that the row estimates are spot on, but the cost for using t_a_b_idx is higher than for t_b_idx (which is in turn higher than for t_b_a_idx). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Strange Behaviour with multicolumn indexes
On 2019-09-12 12:54:55 -0400, Tom Lane wrote: > "Peter J. Holzer" writes: > > we'll consider just three columns, which we unimaginatively call a, b, > > and c. There are also three indexes: > > > t_a_idx btree (a) WHERE a IS NOT NULL > > t_b_idx btree (b) WHERE b IS NOT NULL > > t_a_b_idx btree (a, b) WHERE a IS NOT NULL AND b IS NOT NULL > > > Nowe I have a query > > select c from t where a='A' and b='B'; Just in case somebody wants to play along at home: A narrow table with just these three columns and synthetic data shows the problem but doesn't use the flipped multicolumn index either. I guess you have to pad the table a bit to sufficiently change the costs. > > This uses t_b_idx, not - as I expected - t_a_b_idx. > > > The distribution of values in columns a and b is quite different: a has > > 10 different values of similar frequency (and no null values). b has > > only a single non-null value which with a frequency of about 1 %. > > > So I definitely understand why it would prefer t_b_idx to t_a_idx, but > > certainly t_a_b_idx should be even better? > > Not necessarily --- t_a_b_idx is (presumably) physically bigger than > t_b_idx, which makes it more expensive to search. The additional > selectivity gain apparently doesn't outweigh that. The index is about 20 % larger, but only 1/10 of it has to be searched, so it should take 1 - (1.2 / 10) = 88 % less time. > > If I create an index with the columns swapped: > > t_b_a_idx btree (b, a) WHERE b IS NOT NULL and a IS NOT NULL > > this index will be used. > > Hmm. Probably that has something to do with a calculation about > the selectivity of the leading index column, ie do you have to > scan 10% of the index or 1% of the index. Does it use only the selectivity of the first column? That would explain it. The total selectivity of both columns is obviously the same. > It's not taking the partial-index filter into account in that, I > suspect, which skews the results in this case --- but that would be > hard to account for accurately. Hmm. Wouldn't that be a problem for partial indexes in general? They usually cover only a small portion of the table and if the selectivity is computed relative to the whole table the result may be way off. I think it should be possible to adjust for a "WHERE column IS NOT NULL" filter, because null_frac is in the statistics. For the general case you would need an estimate of the number of rows covered by the index, which I don't think we have. > Anyway I can't get excited about optimizing for a single non-null > value. That's certainly an extreme case, but I've seen bad plans for similar queries every now and then. At least now I have an idea what causes it and maybe a solution/workaround. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
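For anyone who wants to play along, a possible synthetic setup is sketched below. The row count, the padding and the exact value distributions are guesses based on the description in the earlier message, and (as noted above) the planner may or may not reproduce the behaviour depending on version and table width.

    CREATE TABLE t (
        a   text,
        b   text,
        c   text,
        pad text    -- padding to make the rows wider than the narrow three-column case
    );

    INSERT INTO t (a, b, c, pad)
    SELECT 'A' || (i % 10),                         -- 10 values of similar frequency, no NULLs
           CASE WHEN i % 100 = 0 THEN 'B' END,      -- one non-null value, about 1 % of the rows
           'C' || i,
           repeat('x', 200)
    FROM generate_series(1, 1000000) AS g(i);

    CREATE INDEX t_a_idx   ON t (a)    WHERE a IS NOT NULL;
    CREATE INDEX t_b_idx   ON t (b)    WHERE b IS NOT NULL;
    CREATE INDEX t_a_b_idx ON t (a, b) WHERE a IS NOT NULL AND b IS NOT NULL;

    ANALYZE t;

    EXPLAIN SELECT c FROM t WHERE a = 'A0' AND b = 'B';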
Re: Strange Behaviour with multicolumn indexes
On 2019-09-12 21:04:25 +0200, Peter J. Holzer wrote: > On 2019-09-12 12:54:55 -0400, Tom Lane wrote: > > It's not taking the partial-index filter into account in that, I > > suspect, which skews the results in this case --- but that would be > > hard to account for accurately. > > Hmm. Wouldn't that be a problem for partial indexes in general? They > usually cover only a small portion of the table and if the selectivity > is computed relative to the whole table the result may be way off. > > I think it should be possible to adjust for a "WHERE column IS NOT NULL" > filter, because null_frac is in the statistics. For the general case you > would need an estimate of the number of rows covered by the index, which > I don't think we have. Looking through the source I see that the planner does estimate the number of tuples in the index. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Referncing a calculated column in a select?
On 2019-09-12 15:35:56 -0500, Ron wrote: > On 9/12/19 2:23 PM, stan wrote: > > I am creating some views, that have columns with fairly complex calculations > > in them. The I want to do further calculations using the result of this > > calculation. Right now, I am just duplicating the first calculation in the > > select fro the 2nd calculated column. There must be a batter way to do > > this, right? > > I must be misunderstanding you, because you can just do more of what you're > doing now. > > Here's an example of doing some calculations in the creation of the view, > and then more calculations when selecting from the view: I think he wants to refer to other columns in the view. Something like create or replace view v as select a, b, abs(b - a) as abserror, case when abs(a) > abs(b) then abs(a) else abs(b) end as mag, abserror / mag as relerror from t; Except that this doesn't work. What you can do is nest views: create or replace view v as select a, b, abs(b - a) as abserror, case when abs(a) > abs(b) then abs(a) else abs(b) end as mag from t; create or replace view v2 as select a, b, abserror, mag, abserror / mag as relerror from v; wds=> select * from v2; ╔══╤═╤══╤══╤═══╗ ║ a │b│ abserror │ mag │ relerror ║ ╟──┼─┼──┼──┼───╢ ║2 │ 3 │1 │3 │ 0.333 ║ ║ -2 │ 3 │5 │3 │ 1.67 ║ ║ 3.1415926536 │ 2.71828 │ 0.4233126536 │ 3.1415926536 │ 0.134744602587137 ║ ╚══╧═╧══╧══╧═══╝ (3 rows) No idea whether this is more or less efficient than writing the whole formula for each column. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Referncing a calculated column in a select?
On 2019-09-13 11:49:28 +0900, Kyotaro Horiguchi wrote: > At Thu, 12 Sep 2019 23:16:01 +0200, "Peter J. Holzer" > wrote in <20190912211601.ga3...@hjp.at> > > On 2019-09-12 15:35:56 -0500, Ron wrote: > > > On 9/12/19 2:23 PM, stan wrote: > > > > I am creating some views, that have columns with fairly complex > > > > calculations > > > > in them. The I want to do further calculations using the result of this > > > > calculation. Right now, I am just duplicating the first calculation in > > > > the > > > > select fro the 2nd calculated column. There must be a batter way to do > > > > this, right? [...] > > I think he wants to refer to other columns in the view. > .. > > What you can do is nest views: > > Doesn't subquery work? Yes, a subquery would work, too. > Or CTE (WITH clause) might look better. CTEs (which I quite like in general) are optimisation barriers, so using them in a view may lead to inefficient plans (depending on what you do with the view). So, yes, it would work, but you have to check the plans of your queries (and be prepared to rewrite your view if they aren't good enough), which is why I didn't suggest it. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
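For completeness, the subquery variant mentioned above might look like this, reusing the hypothetical table t and columns a and b from the earlier example; greatest() is used here as a shorthand for the CASE expression. A LATERAL join is another way to let later expressions refer to earlier ones.

    CREATE OR REPLACE VIEW v2 AS
    SELECT a, b, abserror, mag, abserror / mag AS relerror
    FROM (
        SELECT a, b,
               abs(b - a) AS abserror,
               greatest(abs(a), abs(b)) AS mag
        FROM t
    ) AS s;

    -- Or with LATERAL:
    CREATE OR REPLACE VIEW v3 AS
    SELECT t.a, t.b, x.abserror, x.mag, x.abserror / x.mag AS relerror
    FROM t,
         LATERAL (SELECT abs(t.b - t.a) AS abserror,
                         greatest(abs(t.a), abs(t.b)) AS mag) AS x;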
Re: Variable constants ?
On 2019-08-15 16:56:57 -0400, stan wrote: > bossiness constants On 2019-09-02 13:31:14 -0400, stan wrote: > bossiness plan > bossiness model On 2019-09-13 05:57:33 -0400, stan wrote: > bossiness work I'm starting to think that this is not a typo :-) SCNR, hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: PostgreSQL License
On 2019-09-17 14:56:30 +0300, Ashkar Dev wrote: > but it is saying (without fee) > if I create a database with it to work with Web Application if want to sell it > so the buyer must have the PostgreSQL installed in his device to work offline > right? > "Permission to use, copy, modify, and distribute this software and its > documentation for any purpose, without fee, and without a written agreement is > hereby granted, provided that the above copyright notice and this paragraph > and > the following two paragraphs appear in all copies." This means that you don't have to pay a fee or sign a written agreement to use, copy, modify, and distribute this software and its documentation for any purpose. It doesn't say that you can't charge a fee for distributing (although why anybody would pay you for something they can download themselves for free I don't know). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Securing records using linux grou permissions
On 2019-10-15 13:10:13 -0400, David Gauthier wrote: > I was hoping there was a way to integrate the user/permissions/groups in linux > with the PG permissions functionality. You can at least map the OS users to DB roles by using the peer or ident authentication schemes. This way the users won't have to enter their passwords again. But I think this works only if the client and the server are on the same host. And you still have to maintain the groups, although that should be easy to automate. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: jsonb_set() strictness considered harmful to data
On 2019-10-20 13:20:23 -0700, Steven Pousty wrote: > I would think though that raising an exception is better than a default > behavior which deletes data. > As an app dev I am quite used to all sorts of "APIs" throwing exceptions and > have learned to deal with them. > > This is my way of saying that raising an exception is an improvement over the > current situation. May not be the "best" solution but definitely an > improvement. I somewhat disagree. SQL isn't in general a language which uses exceptions a lot. It does have the value NULL to mean "unknown", and generally unknown combined with something else results in an unknown value again: % psql wds Null display is "(∅)". Line style is unicode. Border style is 2. Unicode border line style is "double". Timing is on. Expanded display is used automatically. psql (11.5 (Ubuntu 11.5-3.pgdg18.04+1)) Type "help" for help. wds=> select 4 + NULL; ╔══╗ ║ ?column? ║ ╟──╢ ║ (∅) ║ ╚══╝ (1 row) Time: 0.924 ms wds=> select replace('steven', 'e', NULL); ╔═╗ ║ replace ║ ╟─╢ ║ (∅) ║ ╚═╝ (1 row) Time: 0.918 ms Throwing an exception for a pure function seems "un-SQLy" to me. In particular, jsonb_set does something similar for json values as replace does for strings, so it should behave similarly. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: jsonb_set() strictness considered harmful to data
On 2019-10-21 09:39:13 -0700, Steven Pousty wrote: > Turning a JSON null into a SQL null and thereby "deleting" the data > is not the path of least surprises. But it doesn't do that: A JSON null is perfectly fine: wds=> select jsonb_set('{"a": 1, "b": 2}'::jsonb, '{c}', 'null'::jsonb); ╔═╗ ║ jsonb_set ║ ╟─╢ ║ {"a": 1, "b": 2, "c": null} ║ ╚═╝ (1 row) It is trying to replace a part of the JSON object with an SQL NULL (i.e. unknown) which returns SQL NULL: wds=> select jsonb_set('{"a": 1, "b": 2}'::jsonb, '{c}', NULL); ╔═══╗ ║ jsonb_set ║ ╟───╢ ║ (∅) ║ ╚═══╝ (1 row) hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
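A common defensive pattern in this situation is to decide explicitly what an SQL NULL replacement value should mean; a small sketch (not the only option):

    -- Treat an SQL NULL replacement value as JSON null:
    SELECT jsonb_set('{"a": 1}'::jsonb, '{c}',
                     coalesce(NULL::jsonb, 'null'::jsonb));
    -- result: {"a": 1, "c": null}

    -- Or keep the original document when the replacement value is SQL NULL:
    SELECT coalesce(jsonb_set('{"a": 1}'::jsonb, '{c}', NULL),
                    '{"a": 1}'::jsonb);
    -- result: {"a": 1}

In an UPDATE, the second form would fall back to the old column value instead of overwriting the row with NULL.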
Re: jsonb_set() strictness considered harmful to data
On 2019-10-22 09:16:05 +1100, raf wrote: > Steven Pousty wrote: > > In a perfect world I would agree with you. But often users do not read ALL > > the documentation before they use the function in their code OR they are > > not sure that the condition applies to them (until it does). > > I'm well aware of that, hence the statement that this > information needs to appear at the place in the > documentation where the user is first going to > encounter the function (i.e. in the table where its > examples are). I think this is a real weakness of the tabular format used for documenting functions: While it is quite compact which is nice if you just want to look up a function's name or parameters, it really discourages explanations longer than a single paragraph. Section 9.9 gets around this problem by limiting the in-table description to a few words and "see Section 9.9.x". So you basically have to read the text and not just the table. Maybe that would make sense for the json functions, too? hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: jsonb_set() strictness considered harmful to data
On 2019-10-22 18:06:39 -0700, David G. Johnston wrote: > On Tue, Oct 22, 2019 at 3:55 PM Peter J. Holzer wrote: > On 2019-10-20 13:20:23 -0700, Steven Pousty wrote: > > I would think though that raising an exception is better than a > > default behavior which deletes data. > > As an app dev I am quite used to all sorts of "APIs" throwing > > exceptions and have learned to deal with them. > > > > This is my way of saying that raising an exception is an > > improvement over the current situation. May not be the "best" > > solution but definitely an improvement. > > I somewhat disagree. SQL isn't in general a language which uses > exceptions a lot. It does have the value NULL to mean "unknown", and > generally unknown combined with something else results in an unknown > value again: > > [...] > > > Throwing an exception for a pure function seems "un-SQLy" to me. In > particular, jsonb_set does something similar for json values as replace > does for strings, so it should behave similarly. > > > Now if only the vast majority of users could have and keep this level of > understanding in mind while writing complex queries so that they remember to > always add protections to compensate for the unique design decision that SQL > has taken here... I grant that SQL NULL takes a bit to get used to. However, it is a core part of the SQL language and everyone who uses SQL must understand it (I don't remember when I first stumbled across "select * from t where c = NULL" returning 0 rows, but it was probably within the first few days of using a database). And personally I find it much easier to deal with concept which are applied consistently across the whole language than those which sometimes apply and sometimes don't seemingly at random, just because a developer thought it would be convenient for the specific use-case they had in mind. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
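The classic stumbling block mentioned above, as a quick illustration (using the same hypothetical table t and column c as in the message):

    -- Comparison with NULL yields NULL ("unknown"), so no rows match:
    SELECT * FROM t WHERE c = NULL;                     -- always returns 0 rows

    -- What is usually meant:
    SELECT * FROM t WHERE c IS NULL;

    -- Null-safe comparison, if NULLs should compare as equal:
    SELECT * FROM t WHERE c IS NOT DISTINCT FROM NULL;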
Re: Is this a bug ?
On 2019-10-23 17:20:07 +0100, Geoff Winkless wrote: > For what it's worth, I can see a value to having > > SELECT 'this is quite a long string' >'which I've joined together ' >'across multiple lines'; > > although the advantage of it vs using a concat operator is slim. True. However, concatenation of string literals by juxtaposition isn't specific to SQL. Two other well known languages where this works (even without a newline) are C and Python. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
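In SQL the concatenation of adjacent string literals only happens when the literals are separated by whitespace containing at least one newline, which is slightly stricter than C or Python:

    -- Works: the literals are separated by newlines
    SELECT 'this is quite a long string '
           'joined together '
           'across multiple lines';

    -- Does not work: on a single line this is a syntax error
    -- SELECT 'foo' 'bar';

    -- The explicit operator works everywhere:
    SELECT 'foo' || 'bar';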
Re: 11 -> 12 upgrade on Debian Ubuntu
On 2019-11-07 10:59:37 -0500, stan wrote: > On Thu, Nov 07, 2019 at 07:52:14AM -0800, Adrian Klaver wrote: > > On 11/7/19 7:45 AM, stan wrote: > > > I am in the middle of a project, and it looks like version 12 is now what > > > the Debian/Ubuntu package managers want to update to. > > > > This should be a dist-upgrade correct? > > Correct. > > > On a my Ubuntu instance that just installed the Postgres 12 version and > > started it(also auto start in start.conf). This was just the template > > databases and postgres db. No data was moved over from the 11 instance that > > is running. > > > Thanks, that is helpful to know. To migrate the data to the new database, run pg_upgradecluster. This is a wrapper around pg_upgrade which knows about details of a standard Debian/Ubuntu PostgreSQL installation (like directory layout, etc.), so it should be simpler and safer than invoking pg_upgrade yourself (and pg_upgrade is hidden in /usr/lib/postgresql/*/bin to prevent you from invoking it accidentally). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: type SERIAL in C host-struct
On 2019-11-07 20:14:47 +0100, Matthias Apitz wrote: > At the end of the day we came up with the following solution: > > strcpy(host_struct.name, "Sigrid"); > EXEC SQL select nextval('lina_lid_seq') into :host_struct.ser; > > EXEC SQL INSERT INTO lina VALUES ( :host_struct ); > > which seems to work fine. Any comments about side effects? You are performing two queries instead of one, so you have to wait for one extra round trip. Not a problem if client and server are on the same network or you insert a few host_structs per second. May be a problem if client and server are in different datacenters and you want to do hundreds of inserts per second. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
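If the extra round trip ever becomes a problem, the two statements can usually be collapsed into one. A sketch in plain SQL (ECPG specifics left out; the column name lid and the column default are guesses based on the sequence name in the message above):

    -- If the id column defaults to nextval('lina_lid_seq'),
    -- let the default do the work and send the value back:
    INSERT INTO lina (name) VALUES ('Sigrid') RETURNING lid;

    -- Or call nextval() directly inside the INSERT:
    INSERT INTO lina (lid, name)
    VALUES (nextval('lina_lid_seq'), 'Sigrid')
    RETURNING lid;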
Re: REINDEX VERBOSE unknown option
On 2019-11-18 12:24:40 +, Geoff Winkless wrote: > On Mon, 18 Nov 2019 at 11:46, Michael Paquier wrote: > > On Mon, Nov 18, 2019 at 10:27:24AM +0100, Josef Šimánek wrote: > > > This is clear once you understand what does it mean. I was aware of > > > VERBOSE > > > option of EXPLAIN and tried to use it without needed parentheses (the same > > > way EXPLAIN can understand it). In the parameter list of REINDEX, it is > > > still called VERBOSE (not "( VERBOSE )") and there's no info > > > that parentheses are needed. [...] > But if the parentheses are part of the parameter, I think putting the > parentheses in the parameter list might be a good compromise. The parentheses aren't part of the parameter. They are part of the syntax of the command. It might be useful to distinguish better between symbols which must be typed as shown ("REINDEX", "(", ")", etc. in this case) and those which describe the grammar ("[", "]", "name" in this case). Maybe print the square brackets in bold and italics like "name"? But that is just the opposite of the usual convention. How about this? * Terminals (stuff that has to be typed as shown) in bold. * Non-Terminals (stuff which has to be replaced) in italic. * Meta-characters ([, ], |, ...) in regular type. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
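For reference, the concrete difference being discussed (table name invented):

    -- Not accepted: VERBOSE without parentheses is rejected with a syntax error
    -- REINDEX VERBOSE TABLE my_table;

    -- Accepted: the option goes in parentheses
    REINDEX (VERBOSE) TABLE my_table;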
Re: mysysconf ?
On 2019-11-19 19:32:03 -0500, stan wrote: > I am presently running on a Ubuntu 18.04 instance, and as you know > Debian/Ubuntu have upgraded to version 12. I don't think so. Neither Debian nor Ubuntu is in the habit of doing major upgrades in the middle of a release and Ubuntu 18.04 was released long before PostgreSQL 12. Indeed, "apt policy" shows 10+190ubuntu0.1 to be the current version for Ubuntu 18.04. You have probably installed PostgreSQL from apt.postgresql.org, which always has the newest version (but you can also install older versions and stick with them). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: REINDEX VERBOSE unknown option
On 2019-11-19 11:37:04 +, Geoff Winkless wrote: > Even if you do that you're still requiring the user to parse syntax > according to esoteric rules. Oh, please. Those "esoteric rules" have been in wide-spread use for decades. If you look at any manual which tries to explain the syntax of a programming language, markup language or something similar in a (semi-)formal way, it probably uses something very similar. (More formal texts often use BNF (or a variant), which are more expressive, but harder to read. Personally I like syntax diagrams (sometimes called railroad diagrams, but they seem to have mostly fallen out of fashion) And of course, like almost any manual, the PostgreSQL manual explains the notation in the preface: https://www.postgresql.org/docs/current/notation.html (Paper books are at an advantage here that readers usually actually start at the beginning) hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: REINDEX VERBOSE unknown option
On 2019-11-21 09:43:26 +, Geoff Winkless wrote: > On Wed, 20 Nov 2019 at 22:48, Peter J. Holzer wrote: > > > > On 2019-11-19 11:37:04 +, Geoff Winkless wrote: > > > Even if you do that you're still requiring the user to parse syntax > > > according to esoteric rules. > > > > Oh, please. Those "esoteric rules" have been in wide-spread use for > > decades. > > It wasn't meant to be insulting, I meant "esoteric" in the strict > sense: that you need to have specific knowledge to parse them. I didn't understand it as insulting (why would I?), but don't think this convention is "requiring ... knowledge that is restricted to a small group" (Merriam-Webster). This specific convention for conveying grammar rules is in my experience by far the most common (before BNF and diagrams). Anybody who has read any documentation about any formal language (e.g., a programming language, a query language, a markup or configuration language) has very likely encountered it before. Yes, you need specific knowledge to understand the PostgreSQL documentation. For starters, you need to know English (or one the handful languages in which it has been translated). You need to know what a relational database is and why and how you would use one. You need some generic knowledge about computing (what is a "client/server model"? How do I start a command line tool?), etc. The convention for describing the grammar is probably the least concern, and besides, it is explained in the manual (unlike some concepts which are assumed to be known). > My point was that modifying the rules (by making certain things bold > or italic) wouldn't really solve the problem - if you don't know what > the rules are, you're unlikely to be any better off if someone adds to > them. Man is a pattern-matching animal. Even without an explicit explanation, humans are quite good at deriving meaning from repeated patterns. So if the parts you have to type verbatim are always printed in bold and parts which have a meta-meaning are always printed in italic and optional parts are always enclosed in (italic) square brackets, people are very likely to understand that [ ( VERBOSE ) ] means that "( VERBOSE )" must be typed as is, but is optional. Even if they can't tell you the rules. Simply because they have seen it a few dozen times before. There is a reason why almost any technical documentation uses some typographical convention and why those conventions are almost always very similar. The PostgreSQL manual unfortunately uses the same typographic convention for meta-characters ([], {}, |, ...) and terminals, which isn't as clear as it could be. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: [SPAM] Remote Connection Help
On 2019-11-21 17:27:04 +0100, Moreno Andreo wrote: > Connection refused means somthing has blocked it. If it was all OK and simply > Postgres was not listening, you should've received a "connection timed out" > (10060) message. Almost exactly the other way around. If you try to connect to a port where no server is listening, you get a connection refused message. If something is blocking the connection you may get a timeout. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: REINDEX VERBOSE unknown option
On 2019-11-21 16:48:14 +, Geoff Winkless wrote: > On Thu, 21 Nov 2019 at 15:32, Peter J. Holzer wrote: > > On 2019-11-21 09:43:26 +, Geoff Winkless wrote: > > > It wasn't meant to be insulting, I meant "esoteric" in the strict > > > sense: that you need to have specific knowledge to parse them. > > > > I didn't understand it as insulting (why would I?), > > I've absolutely no idea, but I couldn't imagine why on Earth you would > apparently take such exception to it otherwise. Maybe writing > sarcastic bombast in response to something that hasn't annoyed you is I don't think we agree on the meaning of "sarcastic" and "bombast". > > but don't think this convention is "requiring ... knowledge that is > > restricted to a small group" (Merriam-Webster). > > That's entirely the case here. I'd say the number of people able to > understand something like BNF is vanishingly small in terms of the > 7bn(?) world-population. The number of people reading the PostgreSQL manual is also vanishingly small in terms of the world-population. These two aren't independent. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Isolation of multiple databse instances provided by a single postgres server
On 2019-11-21 08:18:21 -0600, Ron wrote: > It appears to me that, within this one Postgres "instance", there are 2 > levels of "isolation", which are database, and schemas. Is this correct? [...] > If so, how does this cores pond to physical on disk storage? > > It corresponds not at all. That's not quite true. Each database has a specific directory (per tablespace) to keep its files in. Schemas on the other hand do not correspond to anything on the filesystem. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
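This on-disk correspondence can be seen from SQL; a small sketch (the table name is made up, and the base/ path applies to the default tablespace):

    -- Each database has an OID, which is also its directory name under base/:
    SELECT oid, datname FROM pg_database;

    -- The file path of a table is relative to the data directory and starts
    -- with base/<database oid>/, regardless of the table's schema:
    SELECT pg_relation_filepath('public.some_table');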
Re: Isolation of multiple databse instances provided by a single postgres server
On 2019-11-21 07:21:10 -0800, Adrian Klaver wrote: > On 11/21/19 6:51 AM, Laurenz Albe wrote: > > - The different databases in a cluster are physically located in > >the same tablespace, but they are logically strictly separated. > >You cannot connect to one database and access another database from > > there. > > dblink(https://www.postgresql.org/docs/11/dblink.html) and > FDW(https://www.postgresql.org/docs/11/sql-createforeigntable.html) would > beg to differ. I was thinking of FDW. But I think Laurenz had the SQL syntax in mind which does include the database in the fully qualified form of a tablename. Something like db2=# select * from db2.public.t3; ╔╤═══╗ ║ id │ t ║ ╟┼───╢ ║ 1 │ this is db2.public.t3 ║ ╚╧═══╝ (1 row) works. But when you try to access a table in a different database you get an error: db2=# select * from db1.public.t1; ERROR: cross-database references are not implemented: "db1.public.t1" LINE 1: select * from db1.public.t1; hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Logging [RESOLVED]
On 2019-12-04 16:03:24 -0800, Rich Shepard wrote: > On Wed, 4 Dec 2019, Stephen Eilert wrote: > > Usually, this is done by logrotate or a similar mechanism in your system. > > You’ll likely find that other logs in your system follow a similar > > pattern, not just Postgresql. > > I just checked /etc/logrotate.d/postgres and it was set at daily with rotate > 7. I changed rotate to 4 but the dates are 3-4 days apart, not sequential. Your log files are extremely small. At only a few hundred bytes every 3 or 4 days it is very likely that nothing is logged on most days. If your log file is empty, logrotate won't rotate it if the option "notifempty" is set (which is probably the case). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play'
scores > 90. This is basically a full index scan since that index is ordered by length(letters). I'm surprised that PostgreSQL can even do that :-). This is a separate index scan than the one for length(letters) = 7, so separating the indexes should be no worse and probably a lot better. Create an index on score (possibly conditional on action = 'play'). 3. A lot of the conditions is fixed. So you might want to move them into the condition of a partial index: create index on words_moves(played) where action = 'play' and LENGTH(hand) = 7 and (LENGTH(letters) = 7 OR score > 90); Then the planner should recognize that it can use this index and the index contains only interesting moves - no logical operations needed at query time at all, only an index scan to find recent moves. Be warned though that creating such ultra-specific indexes comes at a cost: You may end up with a lot of them if you have many different queries and maintaining them may make inserts noticably slower. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: server will not start (Debian)
On 2019-12-10 12:03:30 -0500, stan wrote: > OK, here is the latets. If I do sudo systemctl start postgresql@12-main > > the instance starts and works fine. > > However, if I reboot the machine, the instance is not started. Check whether the service is enabled in systemd: # systemctl status postgresql@12-main.service ● postgresql@12-main.service - PostgreSQL Cluster 12-main Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled) ^^^ If it isn't, enable it: # systemctl enable postgresql@12-main.service hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Need support on tuning at the time of index creation
On 2020-01-27 11:10:36 +, Sandip Pradhan wrote: > One of our ericsson product used backend db as postgresql 9. We are facing > following performance issues where we need some support from your side. > We are having 10 tables and we are inserting around 150 million to 250 million > records on each of those tables. After that we need to create 29 indexes > includung primary key index and other types of indexes. Currently it is taking > almost 3.5 to 4 hrs. It is hard to give advice on what you could do differently if you don't tell us what you do. How are you inserting those rows? How are you creating the indexes? Especially: Are you doing things serially or in parallel? Also performance depends a lot on hardware, so faster CPUs (or more cores) and faster disks/ssds might help, too. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Performance Issue after upgrade from 9 to 11
On 2020-01-29 09:39:03 -0800, Perumal Raj wrote: > We have recently upgraded postgres from 9.2 to 11.6 and started seeing > performance issue immediately and able to fix the performance issue after > disabling parameter: enable_seqscan. How did you upgrade? If your upgrade involved a dump and restore, you should invoke ANALYZE for each database (I think autovacuum will analyze all tables eventually, but takes its time). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
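For example, run in each database (or, from the shell, something like vacuumdb --all --analyze-in-stages):

    -- Rebuild planner statistics after a dump/restore or pg_upgrade:
    ANALYZE VERBOSE;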
Re: Need support on tuning at the time of index creation
On 2020-01-31 10:13:20 +, Sandip Pradhan wrote: > Thanks for your time. > > Out Postgresql version: 9.5.9.14. > > We are using COPY command to insert rows into the tables. While running COPY > command, all indexes dropped. > So I take it that the copy is fast, but the building of indexes is slow (actually I'm not sure whether less than 50 seconds per index is slow for tables of this size, but it's too slow for your application) > After successfully inserting all the rows, we are trying to create all those > indexes. > > Example of index creation script: > CREATE INDEX cwt_sr_assocact_entityref > ON cwt_sr_associationact > USING btree > (entityapplicationcontext COLLATE pg_catalog."default", entitytype COLLATE > pg_catalog."default", entitydn COLLATE pg_catalog."default"); > > CREATE INDEX ix_sr_assoc_customerrelateddn > ON cwt_sr_associationact > USING btree > (customerrelateddn COLLATE pg_catalog."default"); > > Running indexes in parallel. > > Please find the hardware detail of the server: > OS: Red Hat Enterprise Linux 7.4 > RAM: 125 GB > CPU Core: 36 > > Set maintenance_work_mem to 25GB Ideally, you would want to use all the cores and each process should use as much RAM as possible. However, if each process uses 25 GB, you can only fit 4 of them (not 5 because other stuff needs memory, too), in RAM at the same time, leaving 32 of your 36 cores idle. OTOH, for 36 processes, each would have to be restricted to about 3 GB, which doesn't sound ideal, either. So there's probably a sweet spot somewhere in the middle. What I would do: To save time, I would test only with a single table (there are 29 indexes on each table and that's almost certainly more than the optimal degree of parallelism, so we won't get anything from processing several tables at once). To establish a baseline, I'd look for the optimal parallelism with your current parameters: Build the indexes with 1, 2, 4, 8, 16, 29 processes in parallel. Stop if it gets noticably slower, especially if it starts to swap heavily. Then bisect between the two best ones. Note the results. Then reduce maintenance_work_mem to 50 % and repeat the experiment. Does it get better? If it gets better (or at least not much worse), reduce maintenance_work_mem again and repeat. Eventually you should find the optimal combination. To reach your goal of 40 minutes for all 10 tables, your optimal run for a single table must be about 4 minutes. If you don't get close to that you will probably have to look for different solutions. As already mentioned, the speed of I/O makes a lot of difference. If you don't use SSDs yet, you should. If you do use SSDs, maybe you can get faster ones? You might also investigate putting pgsql_tmp on a RAM disk. You could also try changing synchronous_commit and/or fsync/full_page_writes (the latter only if you can tolerate losing your DB in a crash). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
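A sketch of what one of the parallel sessions might run during such an experiment; the memory value is a placeholder to tune, and each session would build a different subset of the 29 indexes (index definitions taken from the message above, minus the redundant default COLLATE clauses):

    -- Per-session memory budget for index builds; tune while testing:
    SET maintenance_work_mem = '6GB';

    CREATE INDEX cwt_sr_assocact_entityref
        ON cwt_sr_associationact
        USING btree (entityapplicationcontext, entitytype, entitydn);

    CREATE INDEX ix_sr_assoc_customerrelateddn
        ON cwt_sr_associationact
        USING btree (customerrelateddn);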
Re: Restore is failing
On 2020-02-05 13:54:56 +0530, Sonam Sharma wrote: > I am copying schema data to qa box from prod.its failing and giving below > error > : > > > WARNING: terminating connection because of crash of another server process > > DETAIL: The postmaster has commanded this server process to roll back the > current transaction and exit, because another server process exited abnormally > and possibly corrupted shared memory. As the message states, the problem is with another server process. You need to identify that process and why it crashed. Check the postgres server log. It should contain an error message. You might also want to check the syslog (on Linux or other Unixes) or equivalent for system errors (e.g. out of memory, disk errors, etc.) hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Cannot connect to postgresql-11 from another machine after boot
On 2020-02-13 21:03:48 -0800, Adrian Klaver wrote: > On 2/13/20 9:02 PM, Adrian Klaver wrote: > > On 2/13/20 7:54 PM, Jason Swails wrote: > > > The problem is that after my machine boots, I'm unable to connect to > > > the server from anywhere except localhost. Running a simple > > > "systemctl restart postgresql" fixes the problem and allows me to > > > connect from anywhere on my LAN. Here is an example of this > > > behavior: [...] > > > > > > So the first connection attempt fails. But when I restart the > > > service and try again (doing nothing else in between), the > > > connection attempt succeeds. My workaround has been to simply > > > restart the service every time my machine reboots, but I'd really > > > like to have a more reliable startup. > > > > > > Any ideas how to start hunting down the root cause? I think this > > > started happening after I moved the data directory to another drive. > > > > I would start by looking in the system log to see what it records when > > the service tries to start on reboot. > > Hit send to soon. At a guess the Postgres service is starting before the > drive is mounted. I don't think this has anything to do with the drive. If the drive wasn't mounted he couldn't connect from localhost either. What is probably happening is that postgresql is configured to listen on localhost and the IP address of the ethernet interface and is starting before the ethernet interface is ready. So it is listening only on localhost (there should be an error message regarding the other address in the log). When he restarts postgresql some time later, the interface is ready. It should be possible to solve this by adding the right dependencies to systemd. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Cannot connect to postgresql-11 from another machine after boot
On 2020-02-17 10:17:41 -0500, Jason Swails wrote: > I then added "network.target", "networking.service", and > "network-online.target" to the After line of the postgresql.service systemd > file, but it still didn't fix the problem. After=network-online.target should be correct. However, see https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ for an explanation why "the network is online" is not as simple as it looks and how to ensure that. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: DB running out of memory issues after upgrade
On 2020-02-18 18:10:08 +, Nagaraj Raj wrote: > Below are the same configurations ins .conf file before and after updagrade > > show max_connections; = 1743 [...] > show work_mem = "4MB" This is an interesting combination: So you expect a large number of connections but each one should use very little RAM? [...] > here is some sys logs, > > 2020-02-16 21:01:17 UTC [-]The database process was killed by the OS > due to excessive memory consumption. > 2020-02-16 13:41:16 UTC [-]The database process was killed by the OS > due to excessive memory consumption. The oom-killer produces a huge block of messages which you can find with dmesg or in your syslog. It looks something like this: Feb 19 19:06:53 akran kernel: [3026711.344817] platzangst invoked oom-killer: gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=0 Feb 19 19:06:53 akran kernel: [3026711.344819] platzangst cpuset=/ mems_allowed=0-1 Feb 19 19:06:53 akran kernel: [3026711.344825] CPU: 7 PID: 2012 Comm: platzangst Tainted: G OE4.15.0-74-generic #84-Ubuntu Feb 19 19:06:53 akran kernel: [3026711.344826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.7 06/16/2016 Feb 19 19:06:53 akran kernel: [3026711.344827] Call Trace: Feb 19 19:06:53 akran kernel: [3026711.344835] dump_stack+0x6d/0x8e Feb 19 19:06:53 akran kernel: [3026711.344839] dump_header+0x71/0x285 ... Feb 19 19:06:53 akran kernel: [3026711.344893] RIP: 0033:0x7f292d076b1c Feb 19 19:06:53 akran kernel: [3026711.344894] RSP: 002b:7fff187ef240 EFLAGS: 0246 ORIG_RAX: 0038 Feb 19 19:06:53 akran kernel: [3026711.344895] RAX: ffda RBX: 7fff187ef240 RCX: 7f292d076b1c Feb 19 19:06:53 akran kernel: [3026711.344896] RDX: RSI: RDI: 01200011 Feb 19 19:06:53 akran kernel: [3026711.344897] RBP: 7fff187ef2b0 R08: 7f292d596740 R09: 009d43a0 Feb 19 19:06:53 akran kernel: [3026711.344897] R10: 7f292d596a10 R11: 0246 R12: Feb 19 19:06:53 akran kernel: [3026711.344898] R13: 0020 R14: R15: Feb 19 19:06:53 akran kernel: [3026711.344899] Mem-Info: Feb 19 19:06:53 akran kernel: [3026711.344905] active_anon:14862589 inactive_anon:1133875 isolated_anon:0 Feb 19 19:06:53 akran kernel: [3026711.344905] active_file:467 inactive_file:371 isolated_file:0 Feb 19 19:06:53 akran kernel: [3026711.344905] unevictable:0 dirty:3 writeback:0 unstable:0 ... Feb 19 19:06:53 akran kernel: [3026711.344985] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name Feb 19 19:06:53 akran kernel: [3026711.344997] [ 823] 0 82344909 0 106496 121 0 lvmetad Feb 19 19:06:53 akran kernel: [3026711.344999] [ 1354] 0 135411901 3 135168 112 0 rpcbind Feb 19 19:06:53 akran kernel: [3026711.345000] [ 1485] 0 148569911 99 180224 159 0 accounts-daemon ... Feb 19 19:06:53 akran kernel: [3026711.345345] Out of memory: Kill process 25591 (postgres) score 697 or sacrifice child Feb 19 19:06:53 akran kernel: [3026711.346563] Killed process 25591 (postgres) total-vm:71116948kB, anon-rss:52727552kB, file-rss:0kB, shmem-rss:3023196kB The most interesting lines are usually the last two: In this case they tell us that the process killed was a postgres process and it occupied about 71 GB of virtual memory at that time. That was clearly the right choice since the machine has only 64 GB of RAM. Sometimes it is less clear and then you might want to scroll through the (usually long) list of processes to see if there are other processes which need suspicious amounts of RAM or maybe if there are just more of them than you would expect. 
> I identified one simple select which consuming more memory and here is the > query plan, > > > > "Result (cost=0.00..94891854.11 rows=3160784900 width=288)" > " -> Append (cost=0.00..47480080.61 rows=3160784900 width=288)" > "-> Seq Scan on msghist (cost=0.00..15682777.12 rows=312949 > width > =288)" > " Filter: (((data -> 'info'::text) ->> 'status'::text) = > 'CLOSE'::text)" > "-> Seq Scan on msghist msghist_1 (cost=0.00..189454.50 > rows=31294900 > width=288)" > " Filter: (((data -> 'info'::text) ->> 'status'::text) = > 'CLOSE'::text)" So: How much memory does that use? It produces a huge number of rows (more than 3 billion) but it doesn't do much with them, so I wouldn't expect the postgres process itself to use much memory. Are you sure its the postgres process and not the application which uses a lot of memory?
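As a rough sanity check, the worst-case sort memory implied by the two settings quoted above can be computed from within the database (pg_size_bytes() needs 9.6 or later; this ignores shared_buffers, per-connection overhead, and the fact that a single complex query can allocate several work_mem-sized buffers):

    SELECT current_setting('max_connections')::int
           * pg_size_bytes(current_setting('work_mem'))     AS worst_case_bytes,
           pg_size_pretty(current_setting('max_connections')::int
           * pg_size_bytes(current_setting('work_mem')))    AS worst_case;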
Re: Can I trigger an action from a coalesce ?
On 2020-02-22 16:02:06 -0500, stan wrote: > I have a case where if a value does not exist, I am going to use a default, > which is easy with coalesce. But I would like to warn the user that a > default has been supplied. The default value is reasonable, and could > actually come from the source table, so I can't just check the value. > I'd like to do a raise NOTICE, if the default portion of the coalesce fires. Might I suggest a different approach? Instead of raising a notice, add an additional column. Something like this: wds=> create table mytable (id serial, value int); CREATE TABLE Time: 127.124 ms wds=> insert into mytable (value) values (2), (23), (null), (42), (78); INSERT 0 5 Time: 48.223 ms wds=> select * from mytable; ╔╤═══╗ ║ id │ value ║ ╟┼───╢ ║ 1 │ 2 ║ ║ 2 │23 ║ ║ 3 │ (∅) ║ ║ 4 │42 ║ ║ 5 │78 ║ ╚╧═══╝ (5 rows) Time: 0.657 ms wds=> select id, coalesce(value, 42) as value, value is null as value_was_null wds-> from mytable; ╔╤═══╤╗ ║ id │ value │ value_was_null ║ ╟┼───┼╢ ║ 1 │ 2 │ f ║ ║ 2 │23 │ f ║ ║ 3 │42 │ t ║ ║ 4 │42 │ f ║ ║ 5 │78 │ f ║ ╚╧═══╧╝ (5 rows) Time: 0.247 ms This lets the user distinguish the real 42 with id 4 from the substituted 42 with id 3. I think this would be awkward with a notice. At most you could tell the user "some values were substituted", but not which ones (at least not if the query can return a large number of rows). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Is is safe to use SPI in multiple threads?
On 2020-02-18 19:07:44 +, Tom Mercha wrote: > On 23/12/2016 13:41, Peter J. Holzer wrote: > > On 2016-12-09 16:52:05 +0800, Qiu Xiafei wrote: > >> I'm new to PG and want to implement my domain-specific system based on PG. > >> I > >> wish to arrange my data as several tables in database and translate my DSL > >> into > >> SQL statements for query. Since one DSL statement may be mapped to several > >> SQL > >> statements, it's better to push the DSL server as close to the PG server as > >> possible. I found PG's backgroud worker meet my needs. I can setup a > >> background > >> worker bounded to PG server and listen to a port for network requests. > >> > >> But I encounter a problem that the Server Programing Interfaces are not > >> THREAD > >> SAFE. There are some global variables defined like: SPI_processed, > >> SPI_tuptable, etc. This limit to my DSL server to work in single thread > >> mode > >> which is quite inefficient. > > > > I had a similar requirement. I solved it by moving the application logic > > out of the stored procedures. All the stored procedure does is an RPC > > call (I use ØMQ for that) to a server process and send the result back > > to the client. The server process converts the request into multiple SQL > > queries which can be processed in parallel. > > > > The downside is of course that the communication overhead is much > > higher (A minimum of 4 network messages per request). That's not a > > problem in my case, but you mileage may vary. > > > > The advantages in my opinion are: > > > > * A standalone server process is easier to test and debug than a bunch > >of stored procedures. > > * I can easily scale out if necessary: Currently my database and server > >process run on the same machine, but I could distribute them over > >several machines with (almost) no change in logic. > > Sorry to revive such an old topic. I am facing a similar requirement > where I am running multiple queries concurrently. Like Qiu Xiafei, I am > looking at SPI, and dynamic background workers. In particular, I am > using SPI_execq(...) on each dynamic background workers I spawn. What I > am experiencing is that I am not seeing a speedup, and I am beginning to > wonder if I have done something wrong, if the overheads are too big, or > if there are some limitations I am not aware of. > > As I see that none of the comments here make much of a reference to > performance/speedup, would you be so kind as to tell me how satisfied > you were with performance? Any insights would be greatly appreciated. The speedup depends very much on how evenly you can divide up your workload between worker processes. Let's assume that we have a query which takes 10 seconds and 8 worker processes. If you can't split that query, there is no speedup. 1 worker will be busy for 10 seconds, the others will be idle. If you can split it into 20 queries which will run for 0.5 seconds each, we will be finished in 1.5 seconds (run 8 queries in the first 0.5 seconds, another 8 in the second, and then the last 4). If you can split the query, but the runtimes of the subqueries will be very different (e.g. they will run for 10, 5, 2, 1, 0.5 and 15 times 0.1 seconds), then the total run time will be close to the longest-running subquery - in this case 10 seconds. All the workers will start busy but at some point they run out of work and have to wait for that single slow subquery. So the speedup really depends a lot on how smartly you can divide up your workload. 
And this is unfortunately something which varies a lot: Firstly, our data is skewed. We have a lot more data about Austria than Afghanistan, so simply partitioning by country doesn't give us equal cost queries. Secondly, even with SSDs, access to permanent storage is still a lot slower than access to RAM, so your access times may change unpredictably depending on whether the data you access is hot or cold. Which brings me to another blessing and/or curse of this approach: Having a central server process outside of postgresql makes it easy to cache (partial) results. So very often we already have the data in our application cache and can generate a response in milliseconds. Which is good. However, when we don't have the data and have to ask the database, it is quite likely that it isn't in the cache of the database or even the OS either, so we have to hit the disks - which is slow. So in conclusion: We are seeing a substantial speedup. But it is definitely not linear in the number of worker processes, and the effect of caching is much greater than that of parallelization. And we pay for that by a large variation in query times (between milliseconds and tens of seconds). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
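To make the "divide up your workload" point concrete: instead of partitioning by a skewed attribute like country, the work can be split into roughly equal buckets by hashing the key. A minimal sketch, with invented table and column names, using PostgreSQL's built-in hashtext() function (your splitting criterion may of course look quite different):

-- worker n of 8 runs this with its own remainder n (0 .. 7):
SELECT place_id, sum(amount)
FROM measurements
WHERE (hashtext(place_id::text) & 7) = 0   -- this is worker 0
GROUP BY place_id;

Each bucket should then contain roughly the same number of rows, although hot versus cold data can still make the runtimes uneven.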
Re: Can I trigger an action from a coalesce ?
On 2020-02-24 05:20:49 +, sivapostg...@yahoo.com wrote: > It could also be done by putting those values in square bracket, if > substituted > with default values. > eg. [0] >4 >45 > [100] > Values within square brackets are default values. This would also work in many cases (especially if the values only have to be displayed and not processed further). In this case the OP wrote that "the default value is reasonable and could actually come from the source table". I assumed that he had a reason for this choice and wanted to preserve it. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Invoice increment
On 2020-02-26 09:38:57 +0100, Søren Frisk wrote: > I'm trying to find a way to increment an invoice number. And i think it would > be pretty straight forward just using a SERIAL int. Be careful: Sequences may have gaps (e.g. because of rolled back transactions, but maybe also for other reasons). In many jurisdictions, invoice numbers must not have gaps. You may still be able to use sequences, but you have to carefully consider when to increment them. > But as we're selling across > multiple countries, i need the invoice increment to be reset by a country_id. By "reset by a country_id" you mean that the invoice number has to increment independently for each country? So if you sell for example, to DK, DK, CN, DK, CN, US, DK, US, the invoice numbers should be DK-1 DK-2 CN-1 DK-3 CN-2 US-1 DK-4 US-2 ? You can do that by having one sequence per country. The parameter to nextval is just a string, so you can construct that on the fly: select nextval('invoice_seq_' || country_code); However, that means that you have to add a new sequence whenever you start business in a new country. Maybe not a big deal, but easy to forget since you don't do that every day. The other way is of course to have a table with all the current invoice numbers for each country. Basically sequences implemented in the application. This has a slightly different set of problems, so you have to be careful, too. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
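A minimal sketch of the "sequences implemented in the application" variant mentioned above (table and column names are invented). The INSERT ... ON CONFLICT also takes care of the first invoice for a new country; the row lock it takes serializes invoice creation per country until the transaction commits, and since the increment rolls back together with the invoice, no gaps can appear:

create table invoice_counters (
    country_code text primary key,
    last_number  bigint not null
);

-- inside the transaction which creates the invoice:
insert into invoice_counters (country_code, last_number)
values ('DK', 1)
on conflict (country_code)
do update set last_number = invoice_counters.last_number + 1
returning last_number;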
Re: How to install check_postgres on CentOS 8?
On 2020-02-26 15:11:38 +0100, rai...@ultra-secure.de wrote: > I'm trying to install the check_postgres RPM from the official > postgresql.org repository onto CentOS 8.1 > > It says: > > Error: > Problem: cannot install the best candidate for the job > - nothing provides perl-DateTime-Format-DateParse needed by > check_postgres-2.25.0-1.rhel8.noarch So check_postgres depends on a package which isn't in any of your yum repositories. It's been some time since I last used RHEL (RHEL 6 was the last I used), but there were some optional repositories (actually, I think they were called "channels") with "application specific" packages. Among them was a developer repo with lots of additional Perl modules (and other stuff, but the Perl modules are what is relevant here). If RHEL 8 (and by extension, CentOS 8) still has that structure, you may need to configure those repos. There is also EPEL ("Extra Packages for Enterprise Linux"), which contains packages from Fedora. Maybe perl-DateTime-Format-DateParse is in EPEL? hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Need to find the no. of connections for a database
On 2020-02-27 07:41:36 +, sivapostg...@yahoo.com wrote: > Before taking a few reports, we need to ensure that only one connection is > made > to the database and all other computers need to close the connection to that > database. This is to prevent any additional entry(ies) that could be made > during the course of the report taking. Do you have control over those reports or are they generated by a third-party tool? If the former, the best way is probably to just run them all in a single REPEATABLE READ transaction. Then they will all reflect the state of the database at the start of the transaction, regardless of what other clients are doing in the meantime. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
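A sketch of what that could look like from the reporting client (the report queries and table names are placeholders): all statements between BEGIN and COMMIT see the same snapshot, regardless of what other sessions write in the meantime:

begin isolation level repeatable read read only;
select count(*) from orders;            -- report 1
select sum(amount) from order_items;    -- report 2
commit;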
Re: trouble making PG use my Perl
On 2020-02-28 07:13:12 -0600, Steven Lembark wrote: > If you use the centos pre-compiled glob then you'll get their > pre-compiled paths to their pre-compiled Perl which, among > other things, is compiled with all optimization turned off, > with 5.00503 compatibility turned *on*, and a host of other > pure idiocies that make their perl unsuitable for human use. I don't have access to a current CentOS, but both claims are wrong for even the ancient RHEL 6 systems we still have around for some reason. I find it hard to believe that someone would find it necessary to turn on bincompat5005 in 2019 when they already considered that obsolete in 2010. Same for optimization. There are some optimizations that a general purpose binary distribution like RHEL can't do (like optimizing for a specific processor type), but frankly I doubt that perl's Configure turns them on. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: trouble making PG use my Perl
On 2020-02-28 07:17:27 -0600, Steven Lembark wrote: > RH and Debian distros are distriuted by heavy Python users who > go out of their way to hamstring Perl at every step. That's just complete bullshit, at least for Debian (I don't have current experience with Redhat). The maintainer system makes that very unlikely. > Unless things have changed massively, the Perl they distribute is not > only out of date it's nearly broken out of the box. I have been programming in Perl for 25 years, much of that time on Redhat and Debian systems. Perl was sometimes a bit out of date (usually because the OS was a bit out of date - that's the price you pay for stability), but never as much as on commercial Unixes like HP-UX or Solaris, and it was never broken. I have occasionally compiled perl myself, usually because I wanted to try out some new features, sometimes because I tried to make it faster. But for production use I've almost always used the system-supplied perl. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Restrict user to create only one db with a specific name
On 2020-03-06 16:39:14 -0700, David G. Johnston wrote: > On Fri, Mar 6, 2020 at 4:28 PM Tiffany Thang wrote: > Is there a way in PostgreSQL 12 to restrict user to creating a database > with a specific database name? [...] > Why does userA need create database privileges? Not speaking for the OP, but: Some test frameworks (e.g. the one included in Django) like to create their own test database to ensure that it only contains the test data. So if you are using one of these as intended, the user running the tests needs to be able to create and drop databases. If you are running tests from multiple projects against the same cluster, it might be a good idea to ensure that each job can only create (and drop) its own test database and not those of other jobs (or - worse - the production database). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Advice request : simultaneous function/data updates on many databases
On 2020-03-04 14:42:01 -0800, Guyren Howe wrote: > On Mar 4, 2020, at 14:33 , Rory Campbell-Lange > wrote: > Essentially we wish to reduce the window where the frontend and backend > aren't synchronised. > > If we have (for example) 200 databases which each take 2 seconds to > update, a client could be on the wrong frontend code for over 6 minutes. > > > Send each of the servers a PL/PGSQL method that executes all the things in a > transaction and then waits until the same clock time to commit. Last time I looked, some DDL commands (especially "drop table") took an exclusive lock on the affected table. So you may want to keep transactions which execute such commands very short to prevent them from blocking other transactions for a noticeable amount of time. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
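In addition to keeping such a transaction short, a lock_timeout can prevent the DDL from queueing behind a long-running transaction and blocking everything queued after it. A sketch (the timeout value and the table name are arbitrary): if the lock cannot be obtained in time, the statement fails quickly and can be retried instead of sitting in the lock queue:

begin;
set local lock_timeout = '2s';
drop table if exists old_stuff;
-- other schema changes ...
commit;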
Re: Determining the type of an obkect in plperl
On 2020-03-05 05:59:10 -0500, stan wrote: > On Wed, Mar 04, 2020 at 05:09:19PM -0700, David G. Johnston wrote: > > On Wed, Mar 4, 2020 at 4:21 PM stan wrote: > > > I am in the process of writing a plperl function. In this function > > > I need to compare the data in the NEW versus OLD structures. I am > > > writing this as a generic subroutine, so I am looping through and > > > comparing the 2 to see what is different. Problem is, that I need > > > to know whether to use a != or a ne comparison. > > > > > > how can I determine what the data type of the value element is? > > > > > > > Not up to speed on Perl but you basically want everything to be done using > > string equality - can't you just use "ne" everywhere and not worry about > > comparing numbers using string comparison logic? Might want to disabled > > warnings... [...] > > Since I am just looking for differences, this may work. It should work. In fact it is probably the most robust method. Perl scalars don't have a type like "int" or "string". They can be both (or neither) at the same time. Also, a numeric comparison may not be the right thing even for values which look like numbers: Consider:

#!/usr/bin/perl
use v5.12;
use warnings;
use Scalar::Util qw(looks_like_number);

my $x = "123456789012345678901";
my $y = "123456789012345678902";

say "eq(1): ", $x eq $y;
if (looks_like_number($x) && looks_like_number($y)) {
    say "==(2): ", $x == $y;
} else {
    say "eq(2): ", $x eq $y;
}
say "eq(3): ", $x eq $y;

This will print

eq(1): 
==(2): 1
eq(3): 

on my system: The string comparisons correctly determine $x and $y to be not equal, but the numeric comparison says they are equal (the two numbers are too large to fit into a 64 bit unsigned int, so perl will use a 64 bit float for the comparison which will cause both to be rounded to 123456789012345683968). (I included the second string comparison to check that the numeric comparison didn't affect the values, which I wasn't sure of.)

> Presently I am getting some warnings, so I think I need to deal with the > types. I already dealt with the columns that return NULL, these are > undefined in the Perl hash, so I have to test for their existence before > attempting the compare. What do you mean by "undefined in the hash"?

* The keys exist but the values are undef
* The keys don't exist

Those are not the same thing, and you use different functions to test for them (defined() and exists() respectively). (I think it should be the former, but I'm too lazy to test it myself.) hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Real application clustering in postgres.
On 2020-03-06 15:55:27 +0100, Laurenz Albe wrote: > On Fri, 2020-03-06 at 03:25 -0600, Ron wrote: > > > RAC is not really a high availability solution: because of the shared > > > storage, it has a sibgle point of failure. > > > > This is utter nonsense. Dual redundant storage controllers > > connected to disks in RAID-10 configurations have been around for at > > least 25 years. > > > > Oracle got it's clustering technology from DEC, and I know > > that works. Cluster members, storage controllers and disks have all > > gone down, while the database and application keep on humming along. > > I am not saying that it is buggy, it is limited by design. > > If you have mirrored disks, and you write junk (e.g, because of > a flaw in a fibre channel cable, something I have witnessed), > then you have two perfectly fine copies of the junk. I have certainly seen enterprise SAN boxes go down (or deliver corrupted data) because of controller or firmware problems or just because a second disk in a RAID-5 failed before the spare could be brought online. But to be fair, a master/slave setup a la patroni isn't immune to "writing junk" either: Not on the hardware level (either of the nodes may have faulty hardware, and you may not notice it until too late), and more importantly, not on the software level. An erroneous DML statement (because of a bug in the application, or because the user/admin made a mistake) will cause the same wrong data to be distributed to all nodes (of course this also applies to RAC). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Table with many NULLS for indexed column yields strange query plan
On 2020-03-05 18:08:53 -0700, greigwise wrote:

> I have a query like this:
>
> SELECT "table1".* FROM "table1"
> INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
> INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id"
> WHERE "table3"."number" = ''
> AND ("table2"."type") IN ('Standard') ;
>
> table2 has a large number of NULLS in the column table3_id. There is an
> index on this column. Here is the result of explain analyze:
>
> Merge Join (cost=1001.20..4076.67 rows=17278 width=167) (actual time=284.918..300.167 rows=2244 loops=1)
>   Merge Cond: (table2.table3_id = table3.id)
>   -> Gather Merge (cost=1000.93..787825.78 rows=621995 width=175) (actual time=5.786..283.269 rows=64397 loops=1)
>        Workers Planned: 4
>        Workers Launched: 4
>        -> Nested Loop (cost=0.87..712740.12 rows=155499 width=175) (actual time=0.091..102.708 rows=13107 loops=5)
>             -> Parallel Index Scan using index_table2_on_table3_id on table2 (cost=0.43..489653.08 rows=155499 width=16) (actual time=0.027..22.327 rows=13107 loops=5)
>                  Filter: ((type)::text = 'Standard'::text)

It looks like postgresql is scanning the index here to get the entries in the right order for the merge join. It's strange that it thinks this is a good strategy even though it has to visit every row in the table (no index cond). What is the selectivity of "type"? Would an index on that column help? hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
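If 'Standard' is reasonably selective, a partial index (or a composite index with type as the leading column) might give the planner a cheaper way in than scanning the whole index on table3_id. A sketch, using the column names from the plan above; whether the planner actually picks it depends on the data:

create index on table2 (table3_id) where type = 'Standard';
-- or, if several type values are queried:
create index on table2 (type, table3_id);
analyze table2;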
Re: Real application clustering in postgres.
On 2020-03-09 09:57:37 +0100, Laurenz Albe wrote: > On Sun, 2020-03-08 at 21:13 +0100, Peter J. Holzer wrote: > > But to be fair, a master/slave setup a la patroni isn't immune to > > "writing junk" either: Not on the hardware level (either of the nodes > > may have faulty hardware, and you may not notice it until too late), and > > more importantly, not on the software level. An erroneous DML statement > > (because of a bug in the application, or because the user/admin made a > > mistake) will cause the same wrong data to be distributed to all nodes > > (of course this also applies to RAC). > > Of course, nobody debates that. > > A high-availability solution only protects you from certain, well-defined > kinds of problems, usually related to hardware. Right. And enterprise class SAN storage does this: It protects you from failure of a single disk, a single cable, a single controller. Very often you can physically spread out the components so that loss of a whole rack (or server room) wouldn't affect availability. There are of course limits: When a message sent over a single cable is corrupted in a way that the checksum doesn't catch, corrupted data may be stored. But then, if data in RAM is corrupted in a way that ECC doesn't catch, the same will happen. A Patroni-based cluster isn't free of single points of failure either. So I don't buy the argument "X isn't a high availability solution because it uses shared storage". While I have seen expensive SAN boxes fail, I've also managed to wreck Patroni clusters. I'm not at all convinced that the availability of a Patroni cluster is higher than that of a failover cluster using shared storage. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: strange locks on PG 11 with Golang programs
On 2020-03-09 11:02:37 +0200, Achilleas Mantzios wrote: > Fully review your programs for connection / xaction leaks. Do you use a > connection pool? Go's sql package encourages the use of connection pools (type DB) over single connections (type Conn): | Prefer running queries from DB unless there is a specific need for a | continuous single database connection. All tutorials I've seen follow this recommendation, so a Go programmer might not even be aware that connections exist. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
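When hunting for the connection or transaction leaks mentioned above, it can help to look at what the pooled connections are doing on the server side. A sketch (column selection is just an example); sessions that sit "idle in transaction" for a long time are the ones holding locks:

select pid, usename, state, xact_start, left(query, 60) as last_query
from pg_stat_activity
where state = 'idle in transaction'
order by xact_start;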
Re: PostgreSQL10.x client Vs. PostgreSQL 11.x server
On 2020-03-13 08:47:59 +0000, Jimmy Angelakos wrote: > By using an older client, you will be missing out on the additional > features that Postgres 11 (and its corresponding client) supports. This is not correct. You can use all the additional features which only affect the server. None of the major enhancements mentioned at the start of the release notes (partitioning, parallelism, stored procedures, JIT, ...) should depend on the client. I am talking about low level client libraries here, like libpq or an ODBC, JDBC or DBD driver. An ORM or similar framework may well depend on server-side enhancements. Client programs (e.g. psql) may also have some enhancements. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Formatting output (was: Order by and timestamp)
On 2020-03-15 16:48:35 -0500, Steven Lembark wrote: > On Sun, 15 Mar 2020 22:33:35 +0100:wq > Björn Lundin wrote: > > > And to my surprise i get a result like this (note the order of > > column STARTTS) > > (1) Suggest using "pastebin.com" for this kind of data. It may not > look very pretty -- or readable at all -- on the viewer's end > depending on their settings (see below for example). [...] > > marketid |marketname| startts | > > eventid …. > > …. > > > > 1.127253880 | To Be Placed | 2016-09-29 16:10:00 | > > 27951325 | 1.127275624 | 1m4f Hcap| 2016-09-30 > > 16:20:00 | 27953169 | 1.127275625 | To Be Placed | This is weird. The output is correctly formatted in the text/plain part and the HTML part looks reasonable, too: Every line is in a div of its own, so it shouldn't be jumbled together like that (Out of curiosity: What mail program did you use to read this?). The only obvious problem I see is the use of the "Menlo" font. Non-Mac users won't have that and won't know that they should substitute a monospace font, so the alignment will be off. As far as pasted output on this list goes, Björn's message actually looked quite good. Some problems I see frequently and find annoying:

* Wide output (especially explain plans) with wrapped lines. I find these almost impossible to read, so I have to save the mail to a file and manually undo the line breaks to read it. I rarely bother to do that.
* ASCII graphics which only line up in a certain proportional font
* text/plain messages with very long lines which really should be paragraphs.

hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-19 16:48:19 -0700, David G. Johnston wrote: > First, it sounds like you care about there being no gaps in the records you end > up saving. If that is the case then sequences will not work for you. I think (but I would love to be proven wrong) that *nothing* will work reliably, if

1) you need gapless numbers which are strictly allocated in sequence
2) you have transactions
3) you don't want to block

Rationale: Regardless of how you get the next number, the following scenario is always possible:

Session1: get next number
Session2: get next number
Session1: rollback
Session2: commit

At this point you have a gap. If you can afford to block, I think a simple approach like

create table s(id int, counter int);
...
begin;
...
update s set counter = counter + 1 where id = $whatever returning counter;
-- use counter
commit;

should work. But that effectively serializes your transactions and may cause some to be aborted to prevent deadlocks. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-20 17:11:42 -0600, Rob Sargent wrote: > On Mar 20, 2020, at 4:59 PM, Peter J. Holzer wrote: > > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote: > >> First, it sounds like you care about there being no gaps in the records > >> you end > >> up saving. If that is the case then sequences will not work for you. > > > > I think (but I would love to be proven wrong), that *nothing* will work > > reliably, if > > > > 1) you need gapless numbers which are strictly allocated in sequence > > 2) you have transactions > > 3) you don't want to block > > > > Rationale: > > > > Regardless of how you get the next number, the following scenario is > > always possible: [...] > > At this point you have a gap. > > > > If you can afford to block, I think a simple approach like [...] > > should work. But that effectively serializes your transactions and may > > cause some to be aborted to prevent deadlocks. > > OP has said small gaps are ok. Yes. This wasn't a response to the OP's requirements, but to David's (rather knee-jerk, IMHO) "don't use sequences" response. Very often the requirements which would preclude sequences also preclude any other solution. (In the case of the OP's problem, I'd agree that sequences are probably a bad idea for the reasons he anticipates) > To me that says the requirement Which requirement? The OP's or the one I posed here? > is capricious but we haven’t heard the rationale for the requirement > yet (or I missed it) The OP gave a rationale: He has to fit the counter into an 8-digit field, and a global counter would overflow that. So he needs per-element counters. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-20 17:53:11 -0700, Adrian Klaver wrote: > On 3/20/20 4:29 PM, Peter J. Holzer wrote: > > On 2020-03-20 17:11:42 -0600, Rob Sargent wrote: > > > On Mar 20, 2020, at 4:59 PM, Peter J. Holzer wrote: > > > > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote: > > > > > First, it sounds like you care about there being no gaps in the > > > > > records you end > > > > > up saving. If that is the case then sequences will not work for you. > > > > > > > > I think (but I would love to be proven wrong), that *nothing* will work > > > > reliably, if > > > > > > > > 1) you need gapless numbers which are strictly allocated in sequence > > > > 2) you have transactions > > > > 3) you don't want to block [...] > > Yes. This wasn't a response to the OP's requirements, but to David's > > (rather knee-jerk, IMHO) "don't use sequences" response. Very often the > > requirements which would preclude sequences also preclude any other > > solution. > > I don't see a knee-jerk reaction in this: > > https://www.postgresql.org/message-id/CAKFQuwZ%3D%3Dri5_m2geFA-GPOdfnVggmJRu3zEi%2B1EwJdJA%3D9AeQ%40mail.gmail.com > > The response was if you cared about gaps(not something the OP had specified > at that point) then a sequence would not work. And I think that "care about gaps -> sequence doesn't work" is a knee-jerk reaction. It's similar to "can't parse HTML with regexps". True in the general case, and therefore people tend to blurt it out every time the topic comes up. But not necessarily true in specific cases. As I wrote above, there is no perfect solution - so you have to think about the actual requirements and the consequences of various solutions - and maybe using a sequence is the best (or least bad) solution. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-21 12:55:33 -0600, Rob Sargent wrote: > To me the description of the ID smacks of database-in-the-name folly. I > recognize that OP is likely unable to take another path. I’ll not push this > any > further. Identifiers often have internal structure. In Austria for example, the social security number contains the birth date. Invoice numbers, project numbers or court case numbers often contain the year. That's because they are used by *people*, and people like their identifiers to make some kind of sense. The computer doesn't care. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-21 14:51:35 -0600, Rob Sargent wrote: > > On Mar 21, 2020, at 1:13 PM, Peter J. Holzer wrote: > > > > On 2020-03-21 12:55:33 -0600, Rob Sargent wrote: > >> To me the description of the ID smacks of database-in-the-name folly. I > >> recognize that OP is likely unable to take another path. I’ll not push > >> this any > >> further. > > > > Identifiers often have internal structure. In Austria for example, the > > social security number contains the birth date. Invoice numbers, project > > numbers or court case numbers often contain the year. > > > > That's because they are used by *people*, and people like their > > identifiers to make some kind of sense. The computer doesn't care. > > Since OP said this was digital not paper, I see this as a presentation > problem bleeding into database design (assuming your response was an > invitation to push further). Well, that's the old natural primary key vs. surrogate primary key debate. (Trigger warning: You probably already heard that a gazillion times.) A primary key is an attribute (or a combination of attributes) which unambiguously identifies each entity. If the entity in question already has such an attribute (e.g. an SSN, invoice number, username, ...) and you use that as the primary key, it is called a natural primary key[1]. If you generate a new attribute not related to existing attributes, it is called a surrogate (primary) key. If the entity already has an attribute suitable as a primary key, why would you want to use another one?

* The key candidate may be unwieldy: It might be a combination of several attributes, it might be a very long string, etc.
* You may suspect that the key candidate is not in fact a key. There have been cases of duplicate SSNs in several countries, and of course not every person has an SSN.

Some people say you should always use surrogate keys, never natural keys. Mostly because of the second reason, I think: When you design a system you have limited knowledge, so you can never be absolutely sure that what appears to be a key candidate actually is one. OTOH, surrogate keys have downsides, too: The attributes which would have been a natural key are still there and must be stored, input, output, maybe generated, used in business rules, etc. So that's a (usually modest) increase in application complexity. More importantly, everywhere you would have had a (to a domain expert) meaningful key you now have a meaningless key - which makes it harder to understand the data and makes extra joins necessary. So to inspect the contents of a table, instead of a simple "select * from tablename" you need a join over three or four tables. To get back to the OP's problem: As I understand it he needs to generate that compound identifier anyway, because it is used outside of the application. He could use a surrogate key in addition to that, but I don't see any advantage to that. Most importantly, it doesn't solve the problem he wants to discuss in this thread[2]. hp [1] But note that these attributes are almost always already artificial: Humans aren't born with an SSN - that is assigned by an authority to uniquely identify their clients; courts have sat for centuries without the need to number their cases; etc. [2] I admit I haven't been very helpful in this regard, either, going off on tangents at every opportunity. -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
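To make the trade-off concrete, a small sketch with an invented schema: the same table once with the natural key and once with a surrogate key (two alternative definitions, not meant to coexist). Note that in the second variant the natural candidate is still there and still has to be unique; it just isn't used for references:

-- natural primary key:
create table invoice (
    invoice_number text primary key,      -- e.g. 'DK-2020-00042'
    customer_id    bigint not null,
    amount         numeric not null
);

-- the same table with a surrogate primary key instead:
create table invoice (
    id             bigint generated always as identity primary key,
    invoice_number text not null unique,
    customer_id    bigint not null,
    amount         numeric not null
);

With the surrogate key, a referencing table stores the meaningless id, so displaying the invoice number next to, say, a payment needs an extra join.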
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-21 13:45:54 -0700, pabloa98 wrote: > On Sat, Mar 21, 2020 at 12:08 PM Peter J. Holzer wrote: > And I think that "care about gaps -> sequence doesn't work" is a > knee-jerk reaction. It's similar to "can't parse HTML with regexps". > True in the general case, and therefore people tend to blurt it out > every time the topic comes up. But not necessarily true in specific > cases. As I wrote above, there is no perfect solution - so you have to > think about the actual requirements and the consequences of various > solutions - and maybe using a sequence is the best (or least bad) > solution. > > OK. In that case, I will proceed with the millions of sequences. We will see. Nonononononononono! (channelling Jean-Claude Laprie[1]) I was trying to say that in each case one should think about the pros and cons of different approaches and be wary of dogmatic rules. I wasn't recommending the use of sequences in your case. I would actually be surprised if they work well for you. But having said that - by all means try it. Maybe it does work, and you'll give a talk about "The 50 million sequences database" at a conference next year :-). Or maybe it doesn't, and then you'll have learned something about where the actual limits are. hp [1] I don't expect you to know him :-) -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Loading 500m json files to database
On 2020-03-23 17:18:45 -0700, pinker wrote: > Christopher Browne-3 wrote > > Well, you're paying for a lot of overhead in that, as you're > > establishing a psql command, connecting to a database, spawning a > > backend process, starting a transactions, committing a transaction, > > closing the backend process, disconnecting from the database, and > > cleaning up after the launching of the psql command. And you're > > doing that 500 million times. > > > > The one thing I left off that was the loading of a single tuple into > > json_parts. [...] > > Reducing that overhead is the single most important thing you can do. > > Yes, I was thinking about that but no idea now how to do it right now. Do you know any programming language (Python, Perl, ...)? You'll probably get a huge gain from writing a script that just opens the connection once and then inserts each file. Copy usually is even faster by a fair amount, but since you have to read the data for each row from a different file (and - if I understood you correctly - a remote one at that), the additional speedup is probably not that great in this case. Splitting the work into batches and executing several batches in parallel probably helps. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: core. files inside base directory
On 2020-03-26 10:42:46 +0000, Daulat Ram wrote: > Hello, I have the core. Files inside the base directory of my postgres 10 setup > . Would you please let me know the importance of these files . When they > generate . Can we delete them ? Core files are created by the kernel when a process crashes. Unless you already know which program crashed and why, it is probably worth investigating. As a first step you could just use the file utility to find out which program crashed.

file core.*

should print something like

core.26166: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'sleep 120', real uid: ... execfn: '/bin/sleep' ...

for each file. (Of course the program won't be "sleep" in your case.) To analyze the coredumps further you would have to use a debugger (e.g. gdb). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Idle sessions keep deleted files open
I noticed this with PostgreSQL 11.7 on Linux: I did a «vacuum full» on a table which had become quite bloated (size before the vacuum: 392 GB, after: 291 GB). I expected the disk usage to increase as the new table was written and then drop sharply after the vacuum was finished. However, that didn't happen. While usage increased by about 300 GB during the vacuum, it stayed the same afterwards (as reported by «df»). However, «du» reported about 400 GB less than «df». So apparently, files had been unlinked but were still in use by some process. «lsof» confirmed this: There were some idle sessions which were keeping the files open. As soon as I got those sessions to execute some request, they closed the files, causing the disk space to be freed. I think I noticed that before when dropping tables, but didn't draw the right conclusion at the time. So, if you are reorganizing large tables, keep an eye on idle sessions - they may keep deleted files around for quite some time. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
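A sketch of how to spot such sessions from within the database (the threshold is arbitrary). If it is acceptable, pg_terminate_backend(pid) gets rid of them; simply making them execute another query also releases the files, as described above:

select pid, usename, state, state_change,
       now() - state_change as idle_for,
       left(query, 60) as last_query
from pg_stat_activity
where state = 'idle'
  and state_change < now() - interval '1 hour'
order by state_change;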