Re: Rationale for aversion to the central database?
On 2018-04-08 14:39:49 -0700, Guyren Howe wrote:
> I am a Rails developer at a medium-large size company. I’ve mostly worked at
> smaller companies. I’ve some exposure to other web development communities.
>
> When it comes to databases, I have universally encountered the attitude that
> one should treat the database as a dumb data bucket. There is a *very* strong
> aversion to putting much of any business logic in the database. I encounter
> substantial aversion to have multiple applications access one database, or even
> the reverse: all abstraction should be at the application layer.

Like Tim I would like to make a case for finding a middle ground. The sweet
spot may be different from project to project and it is certainly different
for different teams or companies. I'll try to outline what I like to solve
within the database and what I like to put into the application, and why:

Everything which is easy to specify *declaratively* goes into the database:
Tables (of course), views, constraints, etc. These are basically type
declarations for the data which is stored in the database, so it belongs
there.

Simple stored procedures and triggers go into the database, too. They should
make using the database easier and should not surprise the user. This is
mostly optional.

Anything I would consider "application logic" (when the user does X, then the
system should do Y) goes into the application. There is often a distinction
between a backend (or data access layer) and a frontend. The backend could
theoretically be within the database and be called via stored procedures, but
I find it much easier to test and debug code outside of the database (this
may be just my lack of knowledge about available tools).

In my applications I use SQL heavily. RDBMSs are good at processing queries,
so use them for that. If all you want is a key-value store, don't use
PostgreSQL.

I'm not very fond of ORMs. I know what I want to do and can express it in
SQL. An ORM makes me translate that into a different (and usually inferior)
query language, which is then translated back into SQL. That doesn't make
things easier for me.

> My best theory is that these communities developed at a time when Windows was
> more dominant, and just generally it was *significantly* easier to use MySQL
> than Postgres for many, particularly new, developers.

I come from Oracle, not MySQL, but I have also used MySQL, and I guess the
very wide gap in capabilities between Oracle and MySQL made me cautious about
putting too much into the database. There is also the expectation that you
should be able to use a different database engine (SQL is a standard, right?)
just like you should be able to use a different C compiler, but in practice
that never works. And of course I wasn't very impressed with PL/SQL.
(PostgreSQL gives you a much wider range of languages for stored procedures
than Oracle, but PL/PerlU still isn't quite the same as Perl. And I suspect
it's the same for Python.)

        hp
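To make the "declarative" category concrete, here is a minimal sketch (all
names are invented for illustration):

    CREATE TABLE customer (
        id    serial PRIMARY KEY,
        email text NOT NULL UNIQUE
    );

    CREATE TABLE invoice (
        id       serial PRIMARY KEY,
        customer integer NOT NULL REFERENCES customer(id),
        amount   numeric(12,2) NOT NULL CHECK (amount >= 0),
        issued   timestamptz NOT NULL DEFAULT now()
    );

    -- A view as a declarative, reusable definition of "recent invoices".
    CREATE VIEW recent_invoice AS
        SELECT id, customer, amount
        FROM   invoice
        WHERE  issued > now() - interval '30 days';

Constraints like these are enforced no matter which application writes to the
tables, which is exactly why they belong in the database.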
Re: Rationale for aversion to the central database?
On 2018-04-08 19:39:43 -0400, Stephen Frost wrote:
> * Alvaro Aguayo Garcia-Rada (aagu...@opensysperu.com) wrote:
> > 1. Portability. Being tied to a single database engine is not always
> > a good idea. When you write business logic in database, you have to
> > write and maintain your store procedures for every database engine
> > you want to support. That can be really complicated, and will surely
> > take pretty much time, as programming languages for different
> > databases are very different from each other. And it's permanent:
> > Every time you make a change to a store procedure, you must make
> > that change for every supported database.
>
> The portability claim tends to be both a false one and often, when
> realized, results in a solution where you aren't using the database for
> anything complicated and you'd be better off with a much simpler data
> store. You also don't actually offer any justification for the claim
> that being tied to a single database engine is not always a good idea-
> why is that? With commercial databases it tends to be because you are
> at the behest of some very large commercial company- but that isn't an
> issue with PostgreSQL.

Many companies don't like to support multiple database engines. There is the
license issue, true, but having DBAs is even more important. If you are an
Oracle shop you have your Oracle DBAs, and they know how to make backups (and
how to restore them), how to monitor the database, how to track down
performance issues, etc. If some application needs MS-SQL or PostgreSQL or
MySQL, they won't have that expertise, so they will have to be trained, or
you need additional DBAs (maybe on a part-time or support contract basis).
So all other things being equal, companies will prefer applications which
work with databases they already use.

So if you are an application developer, it makes sense to develop your
application to work with several databases. It vastly expands your potential
clientele.

At least that was the situation 10 years ago. These days much software is
offered as a service. If the customer sees only a REST API and doesn't have
to host the database on their own servers, they won't care about the RDBMS
underneath.

        hp
Re: Rationale for aversion to the central database?
On 2018-04-27 22:52:39 +0000, g...@luxsci.net wrote:
> Perhaps I'm extreme. In my ideal world, developers might not even know table
> names! I'm kidding ,sorta...

If they don't know the table names, how can they write those stored
procedures?

        hp
Re: Rationale for aversion to the central database?
On 2018-04-28 09:54:27 -0500, Steven Lembark wrote:
> On Sat, 28 Apr 2018 08:02:21 +0200
> "Peter J. Holzer" wrote:
> > On 2018-04-27 22:52:39 +0000, g...@luxsci.net wrote:
> > > Perhaps I'm extreme. In my ideal world, developers might not even
> > > know table names! I'm kidding ,sorta...
> >
> > If they don't know the table names, how can they write those stored
> > procedures?
>
> One of the main reasons DBA's need to be pigs. Much spaghetti can be
> avoided in ORM frameworks with well-designed views.

Apparently my remark was too short (and perhaps with too little context) to
be clear.

As I understood g...@luxsci.net, they were arguing for writing lots of stored
procedures so that developers wouldn't even have to know the table names
(they would just call the procedures) [Although I now see that it was
somebody else who proposed that as the only interface].

But those stored procedures don't write themselves. Somebody has to write
them, and a person who writes code is called a developer. So there needs to
be at least one developer who knows the table names - the developer of the
stored procedures. And that developer had better be good - API design is
hard, and if you take away SQL from your fellow developers you should give
them something better (for the task at hand), not something worse.

        hp
Re: Domain based on TIMEZONE WITH TIME ZONE
On 2018-05-10 21:37:26 +0100, Ben Hood wrote:
> On 10 May 2018, at 16:33, Francisco Olarte wrote:
> > For what you want to do I think you'll have to parse the text value,
> > maybe by defining a view with a text columns and using some
> > rule/trigger magic for insert / updates.
>
> Sorry for being unclear - the solution I have in production appears to work
> with
>
> CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE
>     CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
>
> This raises an exception when an app doesn’t use UTC.

I don't understand how this can work. As Francisco demonstrated,
EXTRACT(TIMEZONE FROM ts) doesn't extract the time zone from the value ts, it
reports the offset of the client's time zone. So, if my time zone is set to
Europe/Vienna,

    extract(timezone from '2018-05-10 23:17:44+00'::timestamptz)

will still return 7200, even though I have explicitly specified a UTC
timestamp.

What your check probably does is to enforce that the client's time zone is
set to UTC.

        hp
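The behaviour is easy to reproduce (the session time zone is just an example
here):

    SET timezone = 'Europe/Vienna';
    SELECT extract(timezone FROM '2018-05-10 23:17:44+00'::timestamptz);
    -- 7200: the offset of the *session's* time zone, not of the "+00"
    -- written in the literal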
Re: Domain based on TIMEZONE WITH TIME ZONE
On 2018-05-13 12:46:42 -0700, Adrian Klaver wrote:
> Not trying to trick anyone and no magic. The difference in the represented
> values between ts_tz and ts_naive is the heart of my argument. Timestamptz
> values are stored in manner that allows you to have the output with a time
> zone offset. Timestamps w/notz are not.

I disagree. The difference isn't in how they are *stored*. We have already
established that they are stored in the same format. The difference is in
their *semantics*.

A timestamptz denotes a unique and unambiguous point in time. This point in
time can be represented in various time zones. So the point in time when
Apollo 11 launched can be represented as '1969-07-16 09:32:00-04' (local
time) or '1969-07-16 13:32:00+00' (UTC) or '1969-07-16 14:32:00+01' (CET).
These are just different ways to denote the same point in time - and in fact
all three are stored as the same timestamptz value (-1455288000, I think).
Only when displaying the value or doing certain operations on it is it
converted to YMDhmsfz format.

A timestamp without timezone does NOT denote an unambiguous point in time. It
is just a compact form of representing a date and time. But without any
additional context (the location or time zone) this doesn't tell you much.
'2018-01-01 00:00' in Kiribati was 25 hours before '2018-01-01 00:00' in
American Samoa.

> > But timestamps do not have timezone. They are points in the time line.
> > Points in earth surface have timezones, countries have timezones, but
> > nor timestamp.
>
> I don't know about you but I am living on the earths surface:). That means
> when I deal with timestamps they are with reference to a location.

But when you store a timestamp as a timestamptz, you lose that reference to a
location. All that is left is an abstract reference to a point in time. Only
when you read that value again (and do certain operations with it) is that
reference to a location added again - but the current location of the reader,
not the original location (that is lost forever, unless it was stored
elsewhere).

> I will agree that timestamptz is stored as number only. However that number
> in Postgres has an implied time zone of UTC:
>
> https://www.postgresql.org/docs/10/static/datatype-datetime.html#DATATYPE-DATETIME-INPUT
>
> "For timestamp with time zone, the internally stored value is always in UTC
> (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> GMT)"

This is not actually true. There is nothing in the storage format which
depends on UTC (well, the epoch is at midnight UTC, but if you say the epoch
is at 08:00 Beijing time it is equally correct).

        hp
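A quick way to convince yourself that these notations all denote the same
stored value:

    SELECT '1969-07-16 09:32:00-04'::timestamptz = '1969-07-16 13:32:00+00'::timestamptz,
           '1969-07-16 13:32:00+00'::timestamptz = '1969-07-16 14:32:00+01'::timestamptz;
    -- both comparisons are true: one point in time, three notations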
Re: Out of memory error with PG10.3, 10.4 but not 9.3.19
On 2018-05-15 15:02:48 +0000, ChatPristi wrote:
> I have a SELECT command (in partitionned tables) that failed with:
> psql:/tmp/query.txt:1: ERROR:  out of memory
> DETAIL:  Cannot enlarge string buffer containing 1073741818 bytes by 32 more
> bytes.

1073741818 is a bit less than 1GB and 1073741818+32 is a bit more. So you are
obviously hitting a 1GB limit here.

Given that 1GB is the maximum length of a character type value in PostgreSQL
and the error message mentions a "string buffer", I suspect that your query
tries to construct a very long string. Try to rewrite the query so that it
creates several shorter strings instead.

        hp
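If the long string comes from an aggregate, one possible workaround (a sketch
with made-up names - the actual query isn't shown here) is to build one
string per group instead of a single huge one:

    -- Instead of one string over the whole table ...
    -- SELECT string_agg(payload, ',') FROM measurements;

    -- ... keep each aggregated string well below the 1GB limit:
    SELECT device_id, string_agg(payload, ',') AS payload_list
    FROM   measurements
    GROUP  BY device_id;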
Re: Renice on Postgresql process
On 2018-05-07 11:04:31 -0700, Ben Chobot wrote:
> On May 7, 2018, at 7:46 AM, Ayappan P2 wrote:
> > We are using Postgresql in AIX. Unlike some other databases, Postgresql
> > has lot of other process running in the background along with the main
> > process.
> >
> > We do "renice" only on the Postgres main process. Is it sufficient to
> > have higher priority only for the main process or we have to do "renice"
> > for all the Postgresql related process ( like wal writer, logger,
> > checkpointer etc.,) ?
>
> What do you hope to achieve with your renicing? There is a compelling school
> of thought which holds that nice database processes take longer to
> relinquish their resources, which doesn't end up helping anything at all.

I think you misunderstood Ayappan. He doesn't want to make the database
processes nicer, he wants to make them less nice ("higher priority"). So in
theory, they should be able to complete requests faster because they aren't
interrupted by other processes so often.

Whether that is true depends on whether the processes are cpu or disk bound
and what exactly the "nice value" affects. The best way to find out is
probably to try it.

        hp
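For experimenting, something along these lines renices all processes owned by
the postgres user at once (a sketch; negative nice values need root, and the
exact pgrep/renice flags may differ on AIX):

    # Lower the nice value (raise the priority) of every postgres-owned process
    for pid in $(pgrep -u postgres); do
        renice -n -5 -p "$pid"
    done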
Re: case statement within insert
On 2018-05-25 17:04:25 +0800, tango ward wrote:
> I want to insert data from mysql into a table in postgresql. I want to check
> when the subjectcode contains PE or NSTP so I can assign True or False to
> another column in destination DB.
>
> # Source data: MySQL
> curr_msql.execute(''' SELECT code, subjectname
>                       FROM test_subj ''')
>
> # Destination
> for row in curr_msql:
>     curr_psql.execute(''' INSERT INTO subs (
>                               created, modified,
>                               subjcode, subjname,
>                               is_pe_or_nstp)
>                           VALUES (current_timestamp,
>                                   current_timestamp,
>                                   %s, %s,
>                                   CASE
>                                     WHEN code like '%%PE%%' or code like '%%NSTP%%'
>                                     THEN True
>                                     ELSE False
>                                   END) ''', (row['code'], row['subjectname']))

Others have already explained why that doesn't work and how you can fix it.

But since you are inserting one row at a time in a Python loop, I don't see
any advantage in writing the condition in SQL. Do it in Python instead:

    for row in curr_msql:
        is_pe_or_nstp = 'PE' in row['code'] or 'NSTP' in row['code']
        curr_psql.execute(''' INSERT INTO subs (
                                  created, modified,
                                  subjcode, subjname,
                                  is_pe_or_nstp)
                              VALUES (current_timestamp,
                                      current_timestamp,
                                      %s, %s,
                                      %s) ''',
                          (row['code'], row['subjectname'], is_pe_or_nstp,))

A bit more readable, IMHO.

Alternatively, import the complete table *unchanged* from MySQL (you can use
copy_from() for this which is much faster than individual inserts), and then
convert it with a single SQL statement.

        hp
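A hedged sketch of that last alternative with psycopg2's copy_from() - a
staging table plus one INSERT ... SELECT (the staging table is invented, the
other names match the example above, and the values are assumed not to
contain tabs or newlines):

    import io

    # Stage the MySQL rows unchanged via COPY (much faster than row-by-row INSERTs)
    buf = io.StringIO()
    for row in curr_msql:
        buf.write('%s\t%s\n' % (row['code'], row['subjectname']))
    buf.seek(0)

    curr_psql.execute('CREATE TEMP TABLE staging_subs (code text, subjectname text)')
    curr_psql.copy_from(buf, 'staging_subs', columns=('code', 'subjectname'))

    # Convert everything with a single SQL statement
    # (no parameters are passed, so the % signs need no escaping here)
    curr_psql.execute('''
        INSERT INTO subs (created, modified, subjcode, subjname, is_pe_or_nstp)
        SELECT current_timestamp, current_timestamp, code, subjectname,
               code LIKE '%PE%' OR code LIKE '%NSTP%'
        FROM staging_subs
    ''')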
Re: execute block like Firebird does
On 2018-05-30 15:16:56 +0200, Félix GERZAGUET wrote:
> On Wed, May 30, 2018 at 2:50 PM, PegoraroF10 wrote:
> > So, is that possible to change a DO structure is ran, to be possible to
> > return one or more values ?
> > It would be like ...
> >
> > DO returns(ID Integer, Description Text) as
> > $$
> > begin
> >   select ...
> >   insert ...
> >   select ... into ID, Description
> > end
> > $$
> >
> > Using this way would be possible to create that script on client, call it
> > just one time and have a result for that execution, exactly the way a
> > "execute block" does on Firebird.
[...]
> Since you seems to be able to construct dynamically the statement from the
> client application, I think it is already possible to do that in one SQL
> statement using CTE.
[...]
> So you can construct arbitrary complex thing using any combination of SELECT,
> UPDATE and DELETE.

But note that:

| The sub-statements in WITH are executed concurrently with each other and
| with the main query. Therefore, when using data-modifying statements in
| WITH, the order in which the specified updates actually happen is
| unpredictable. All the statements are executed with the same snapshot
| (see Chapter 13), so they cannot “see” one another's effects on the
| target tables. This alleviates the effects of the unpredictability of
| the actual order of row updates, and means that RETURNING data is the
| only way to communicate changes between different WITH sub-statements
| and the main query.
-- https://www.postgresql.org/docs/10/static/queries-with.html#QUERIES-WITH-MODIFYING

In a DO block the statements are processed sequentially and each statement
sees the results of the previous statements.

        hp
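For illustration, a minimal data-modifying CTE in which RETURNING carries the
result of one sub-statement into the next (tables and columns are invented):

    WITH new_customer AS (
        INSERT INTO customers (name)
        VALUES ('ACME Corp')
        RETURNING id
    )
    INSERT INTO orders (customer_id, amount)
    SELECT id, 99.90
    FROM   new_customer
    RETURNING id, customer_id;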
Inefficient plan on 10.4
I have this table:

                                       Table "public.relation"
       Column       |            Type             | Collation | Nullable |               Default
 -------------------+-----------------------------+-----------+----------+--------------------------------------
  parent            | integer                     |           | not null |
  child             | integer                     |           | not null |
  type              | character varying           |           |          |
  sortorder         | integer                     |           |          |
  valid_from        | timestamp without time zone |           | not null | now()
  from_job_queue_id | integer                     |           |          |
  id                | integer                     |           | not null | nextval('relation_id_seq'::regclass)
 Indexes:
     "relation_pkey" PRIMARY KEY, btree (id)
     "relation_child_idx" btree (child)
     "relation_parent_idx" btree (parent)
 Foreign-key constraints:
     "relation_child_fkey" FOREIGN KEY (child) REFERENCES concept(id) DEFERRABLE
     "relation_parent_fkey" FOREIGN KEY (parent) REFERENCES concept(id) DEFERRABLE

which has about 150 million rows:

    wdsah=> select count(*) from relation;
       count
    -----------
     147810590
    (1 row)

I'm trying to get the siblings of a node (including itself):

    select r2.parent, r2.type, r2.child
    from relation r1, relation r2
    where r1.child=643541 and r2.parent=r1.parent
    order by r2.type

This worked fine on 9.5, but on 10.4 it takes several seconds:

                                       QUERY PLAN
 ---------------------------------------------------------------------------------------------------------------
  Gather Merge  (cost=12032221.68..25195781.12 rows=112822632 width=15) (actual time=4086.255..4086.257 rows=2 loops=1)
    Workers Planned: 2
    Workers Launched: 2
    ->  Sort  (cost=12031221.66..12172249.95 rows=56411316 width=15) (actual time=4080.862..4080.862 rows=1 loops=3)
          Sort Key: r2.type
          Sort Method: quicksort  Memory: 25kB
          ->  Nested Loop  (cost=0.57..1876228.04 rows=56411316 width=15) (actual time=3036.547..4080.826 rows=1 loops=3)
                ->  Parallel Seq Scan on relation r1  (cost=0.00..1856722.83 rows=1 width=4) (actual time=3036.525..4080.802 rows=0 loops=3)
                      Filter: (child = 643541)
                      Rows Removed by Filter: 49270196
                ->  Index Scan using relation_parent_idx on relation r2  (cost=0.57..17041.69 rows=246351 width=15) (actual time=0.053..0.055 rows=2 loops=1)
                      Index Cond: (parent = r1.parent)
  Planning time: 0.418 ms
  Execution time: 4090.442 ms

If I disable sequential scans, it works fine again:

                                       QUERY PLAN
 ---------------------------------------------------------------------------------------------------------------
  Sort  (cost=25246497.92..25584965.81 rows=135387158 width=15) (actual time=0.119..0.119 rows=2 loops=1)
    Sort Key: r2.type
    Sort Method: quicksort  Memory: 25kB
    ->  Nested Loop  (cost=1.14..19513.79 rows=135387158 width=15) (actual time=0.101..0.104 rows=2 loops=1)
          ->  Index Scan using relation_child_idx on relation r1  (cost=0.57..8.59 rows=1 width=4) (actual time=0.079..0.080 rows=1 loops=1)
                Index Cond: (child = 643541)
          ->  Index Scan using relation_parent_idx on relation r2  (cost=0.57..17041.69 rows=246351 width=15) (actual time=0.018..0.020 rows=2 loops=1)
                Index Cond: (parent = r1.parent)
  Planning time: 0.446 ms
  Execution time: 0.165 ms

There are several points about these plans that I don't understand:

* Why does the nested loop expect 56E6 or even 135E6 rows? It expects 1 row
  for the outer table and then 246351 rows in the inner table for each of
  them. 1 * 246351 == 246351. So it should expect 246351 rows. (246351 itself
  is way too high, but the table has a very skewed distribution, and I've
  already set the statistics target to the maximum of 10000, so there's not
  much I can do about that)

* Why does the Parallel Seq Scan report actual rows=0? It did return 1 row
  (or is that the average per worker? That would be < 1, and probably rounded
  down to 0)

        hp

PS: The obvious workaround is to remove "order by r2.type". I can easily get
the required partial order in the application. But I'd like to understand
what the optimizer is doing here.
Re: User documentation vs Official Docs
On 2018-07-18 08:09:35 +1000, Tim Cross wrote:
> If using web widgets to author content on the wiki is the main
> impediment for contributing content, maybe we should see if the wiki
> provides alternative access methods. I've used wikis in the past which
> allowed users to upload content via xmlrpc, api etc. Perhaps something
> similar could be made available for those making significant
> contributions or to a select few 'curators' who could accept content
> from others.

There are also browser plugins like It's all text, textern, wasavi, etc.
which allow the user to use a real text editor instead of a text area.

        hp
Re: User documentation vs Official Docs
On 2018-07-19 11:43:18 -0600, Rob Sargent wrote:
> On 07/19/2018 11:04 AM, Peter J. Holzer wrote:
> > On 2018-07-18 08:09:35 +1000, Tim Cross wrote:
> > > If using web widgets to author content on the wiki is the main
> > > impediment for contributing content, maybe we should see if the wiki
> > > provides alternative access methods. I've used wikis in the past which
> > > allowed users to upload content via xmlrpc, api etc. Perhaps something
> > > similar could be made available for those making significant
> > > contributions or to a select few 'curators' who could accept content
> > > from others.
> >
> > There are also browser plugins like It's all text, textern, wasavi, etc.
> > which allow the user to use a real text editor instead of a text area.
>
> Keep in mind that Chrome broke "It's all text" compatibility, at least with
> emacs and now it's done via "Edit with Emacs". This in my experience is a
> step backwards from "It's all text".

Oh, Chrome, too? Firefox also broke compatibility with a lot of add-ons
recently ("It's all text" among them). So I switched to textern, which was a
bit more complicated to set up, but otherwise works almost the same.

But yeah, browser add-ons have a certain tendency to succumb to bit-rot, so
they are nice tools for a user but not something a service provider should
depend on.

        hp
Re: Connections on cluster not being logged
On 2018-07-24 06:46:18 -0700, Adrian Klaver wrote:
> On 07/24/2018 06:25 AM, Sandy Becker wrote:
> > There is only one set of logs since it's a hardware cluster. The two
> > nodes share the underlying database storage. Not sure why, but when the
>
> The community Postgres can't do that, have two instances share the same data
> storage, at least AFAIK. So are you using some fork of Postgres or are there
> actually two data directories?

Maybe Postgresql only runs on the active node? I.e. a classic failover
cluster.

(I'm still not sure what a "hardware cluster" is. Probably some kind of
appliance which packages two nodes, some storage and the HA software.)

        hp
Re: Duplicating data folder without tablespace, for read access
On 2018-08-14 11:57:38 -0400, Jack Cushman wrote:
> I have a large database of text, with a 600GB table and a 100GB table
> connected by a join table. They both see occasional updates throughout the
> week. Once a week I want to "cut a release," meaning I will clone just the
> 100GB table and copy it to a "release" server for read-only web access.
>
> My procedure would be:
>
> - keep the 600GB table on a separate tablespace
> - cleanly stop postgres on both servers
> - copy the data folder to the release server
> - delete pg_tblspc/* on the release server
> - start postgres on both servers

If you copy the whole database anyway before deleting the tablespace: Why
don't you just drop the 600 GB table on the release server?

        hp
Re: PG8.3->10 migration data differences
On 2018-09-10 11:57:42 +0300, Achilleas Mantzios wrote:
> On 10/09/2018 11:22, Csaba Ragasits wrote:
> > Hello,
> >
> > We would like to migrate our old databases to new postgres server,
> > with the simple backup-restore process. The process run fine
> > without errors but when we're checking the source and the
> > migrated datas as field level, we found the following differences:
[...]
> > Field type: REAL
> > pg93: 2.2
> > pg10: 2.2000005
>
> Those have to do with rounding. Precision for real is 6 decimal
> digits. Your difference is on the 8-th digit.

Nitpick: Precision is 24 binary digits which is *about* 6 decimal digits.

The number which is actually stored (on both pg93 and pg10) is actually
2.2000000476837158203125 (1.00011001100110011001101 * 2^1 in binary).
Apparently Pg9.3 rounds this to "2.2" on output while Pg10 prints the more
precise (but still not exact) "2.2000005".

(I would argue that the Pg9.3 output is better, since it represents the same
value in fewer digits, but always printing the minimum number of digits
necessary is surprisingly difficult.)

        hp
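A quick way to see more of the stored value on either server is to raise
extra_float_digits (on 10.x this prints up to 9 significant digits; releases
from 12 on print the shortest round-trip form instead):

    SET extra_float_digits = 3;
    SELECT '2.2'::real;   -- e.g. 2.20000005 on 10.x instead of 2.2
    RESET extra_float_digits;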
Re: Out of Memory
On 2018-09-28 07:23:59 +0200, Laurenz Albe wrote:
> Rob Sargent wrote:
> > > Christoph Moench-Tegeder wrote:
> > > > ## Laurenz Albe (laurenz.a...@cybertec.at):
> > > > > vm.overcommit_memory = 2
> > > > > vm_overcommit_ratio = 100
> > > > >
> > > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > > >                       ^
> > > > That should be a "+".
> > >
> > > Yes; shame on me for careless typing, and thank you for the
> > > correction.
> >
> > Are there any parentheses needed in that formula?
>
> No. It is swap space plus a certain percentage of RAM.
>
> I don't know how the Linux developers came up with that
> weird formula.

I suspect they copied it from some other Unix.

Traditionally, Unix allocated all memory in the swap space. You could say
that the swap space was the "real memory" and RAM was a cache for that (hence
the rule that swap must be at least as large as RAM and should preferably be
2 or 4 times the size of RAM). So, when Unixes started to allow allocating
more virtual memory than swap space, they were "overcommitting". But for
Linux that doesn't make much sense, since a page lived either in RAM /or/ in
swap right from the start, so the limit was always RAM+swap, not swap alone,
and you are only overcommitting if you exceed the size of the sum.

The overcommitment in Linux is of a different kind: Linux uses copy on write
wherever it can (e.g. when forking processes, but also when mallocing
memory), and a CoW page may or may not be written in the future. It only
needs additional space when it's actually written, so by counting the page
only once (hoping that there will be enough space if and when that page is
written) the kernel is overcommitting memory.

        hp
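On a Linux box the resulting limit can be inspected directly (just a
diagnostic sketch; the numbers are whatever the machine reports):

    # The two knobs: overcommit_memory=2 enforces the limit,
    # overcommit_ratio is the percentage of RAM that is added to swap
    sysctl vm.overcommit_memory vm.overcommit_ratio

    # CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
    # Committed_AS is how much is currently committed
    grep -E 'CommitLimit|Committed_AS|SwapTotal|MemTotal' /proc/meminfo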
Re: How to maintain the csv log files in pg_log directory only for past 30 days
On 2018-09-29 07:12:32 -0700, Adrian Klaver wrote:
> On 9/28/18 7:05 PM, Raghavendra Rao J S V wrote:
> > Hope you all are recommending below settings to maintain only max 30
> > days logs in *pg_log* directory. Please correct me if I am wrong.
>
> Well it would actually be 31 days as:
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
>
> %d
>     Replaced by the day of the month as a decimal number [01,31]. [ tm_mday]

Additionally, the log file for the 31st of a month is usually only
overwritten after 61 days.

> Not sure if that matters or not.

It may or may not. Personally I prefer to use find -mtime (or logrotate, or
cleandir, or keepfree, ...) to avoid the irregularities of the Gregorian
calendar.

        hp
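The find variant could look like this (the path and file pattern are
assumptions - they have to match the log_directory and log_filename settings
of the cluster):

    # Remove CSV logs that haven't been modified for more than 30 days
    find /var/lib/pgsql/data/pg_log -name '*.csv' -mtime +30 -delete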
Re: COPY threads
On 2018-10-10 17:19:50 -0400, Ravi Krishna wrote:
> > On Oct 10, 2018, at 17:18 , Andres Freund wrote:
> > On October 10, 2018 2:15:19 PM PDT, Ravi Krishna wrote:
> > > If I have a large file with say 400 million rows, can I first split it
> > > into 10 files of 40 million rows each and then fire up 10 different
> > > COPY sessions , each reading from a split file, but copying into the
> > > same table. I thought not. It will be great if we can do this.
> >
> > Yes, you can.
>
> Thank you. Let me test it and see the benefit. We have a use case for this.

You should of course test this on your own hardware with your own data, but
here are the results of a simple benchmark (import 1 million rows into a
table without indexes via different methods) I ran a few weeks ago on one of
our servers:

https://github.com/hjp/dbbench/blob/master/import_pg_comparison/results/claudrin.2018-09-22/results.png

The y axis is rows per second. The x axis shows different runs, sorted from
slowest to fastest (so 2 is the median).

As you can see it doesn't parallelize perfectly: 2 copy processes are only
about 50 % faster than 1, and 4 are about 33 % faster than 2. But there is
still quite a respectable performance boost.

        hp

PS: The script is of course in the same repo, but I didn't include the test
data because I don't think I'm allowed to include that.
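A hedged sketch of the split-and-load approach from the question (file name,
table name and format are made up; psql happily runs \copy via -c):

    # 10 chunks of 40 million lines each, as in the example above
    split -l 40000000 big_table.csv chunk_

    # One COPY session per chunk, all into the same table
    for f in chunk_*; do
        psql -d mydb -c "\copy mytable FROM '$f' WITH (FORMAT csv)" &
    done
    wait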
Re: convert real to numeric.
On 2018-10-18 10:15:40 -0400, Tom Lane wrote:
> Alessandro Aste writes:
> > I need to convert an SQL field from real to numeric, but I’m getting a
> > strange behavior.
> > select amount, amount::numeric, amount::numeric(16,4), amount::varchar::numeric
> > from mytable where id = 32560545;
> > Result:
> > 17637.75, 17637.8, 17637.8000, 17637.75
>
> You realize of course that "real" (a/k/a float4) can only be trusted
> to six decimal digits in the first place.

When I try this, I get 24 bits, actually. Using decimal digits when talking
about binary numbers is misleading.

[...]

> You could ju-jitsu the system into duplicating that behavior by casting
> to text (which invokes float4out) and then to numeric:
>
> regression=# set extra_float_digits to 1;
> SET
> regression=# select '17637.75'::real, '17637.75'::real::numeric;
>   float4  | numeric
> ----------+---------
>  17637.75 | 17637.8
> (1 row)
>
> regression=# select '17637.75'::real, '17637.75'::real::text::numeric;
>   float4  | numeric
> ----------+----------
>  17637.75 | 17637.75
> (1 row)

I suggest casting first to float8 and then to numeric. The conversion from
float4 to float8 is exact, and any rounding error introduced by the
float8->numeric conversion is certainly much smaller than the uncertainty of
the original float4 value.

> I think, however, that you ought to spend some time contemplating
> the fact that that extra digit is partially garbage.

If we assume that 17637.75 was the result of rounding a more precise value to
a float4, then the real value was somewhere between 17637.7490234375 and
17637.7509765625. Rounding to 17637.8 introduces an error almost 50 times
larger.

> I'm not really convinced that doing it like this rather than doing the
> standard conversion is a good idea. You can't manufacture precision
> where there is none

It may be that the real value of that number is only known to +/- 0.1. Or
maybe only to +/- 100. But postgresql can't know that, and gratuitously
adding additional rounding errors doesn't help.

        hp
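For completeness, the float8 route applied to the nearby value discussed in
the follow-up below; the numeric result is simply the exact value stored in
the float4:

    SELECT '17637.74'::real::float8::numeric;
    -- 17637.740234375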
Re: convert real to numeric.
On 2018-10-18 18:58:13 -0400, Tom Lane wrote:
> "Peter J. Holzer" writes:
> > On 2018-10-18 10:15:40 -0400, Tom Lane wrote:
> > > You could ju-jitsu the system into duplicating that behavior by casting
> > > to text (which invokes float4out) and then to numeric:
>
> > I suggest casting first to float8 and then to numeric. The conversion
> > from float4 to float8 is exact, and any rounding error introduced by the
> > float8->numeric conversion is certainly much smaller than the
> > uncertainty of the original float4 value.
>
> I do not think that the OP will care for the results of that. The problem
> is that now the output function will think that the result is worth
> printing to 16 digits, and the last ten or so of those will be garbage.

You are thinking about "printing" (i.e., presentation to a user), I am
thinking about "converting" (i.e. storing the value as accurately as possible
in a different representation). These are different things. You should only
think about "printing" when you actually print a value.

Depending on the inherent accuracy of the value and the purpose of the
display, the best way to display the float4 value 17637.740234375 may be
'17637.740234375' (the exact value), '17637.74' (the minimum number of
decimal digits necessary to reconstruct the exact value), '17638' (we don't
care about fractions), '17,700' (three digits ought to be enough for
anybody), '17.7k' (the same, but more compact) or anything in between.

It is for the application programmer to decide how to display a value,
because the programmer knows what it means, where it comes from and what the
user is supposed to do with that information. The runtime environment doesn't
know this. So it shouldn't throw away accuracy. Often even parts of the
application don't know this. So they shouldn't either.

> As an example, even though the cited value happens to work nicely:
>
> regression=# select '17637.75'::float4::float8;
>   float8
> ----------
>  17637.75
> (1 row)
>
> nearby ones don't:
>
> regression=# select '17637.74'::float4::float8;
>      float8
> -----------------
>  17637.740234375
> (1 row)
>
> Yeah, in some sense that's a valid representation of the stored float4,

It is indeed the precise value which is stored (9030523 / 512).

> but it likely has little to do with the originally presented value.

It is much closer to the original value than 17637.7 (the relative errors are
about 1.33e-8 and 2.27e-6, so the error introduced by PostgreSQL's default
rounding is 170 times greater).

I think using FLT_DIG and DBL_DIG for converting from binary to decimal is
wrong. They represent the "number of decimal digits, q, such that any
floating-point number with q decimal digits can be rounded into a
floating-point number with p radix b digits and back again without change to
the q decimal digits" (ISO/IEC 9899:2011), which is not the same as the
number of decimal digits sufficient to convert a binary to decimal and back
again. Unfortunately, the latter is not a constant and converting binary to
decimal isn't trivial - nevertheless the problem has been solved (at least
for IEEE-754 arithmetic), so I might take a stab at it (even with my limited
numerical skills).

        hp
Re: Why does jsonb_set() remove non-mentioned keys?
On 2019-07-05 10:59:31 +0200, Thomas Kellerer wrote:
> Gianni Ceccarelli schrieb am 05.07.2019 um 10:00:
> > > strict functions with sql null inputs yield sql null output without
> > > even executing the function
> >
> > So when the SQL-level executor sees a call to any function declared
> > strict with some NULL parameters, it doesn't call the function at
> > all. `whatever_my_function('a string',1234,NULL)` is always `NULL`
>
> Ah, I see. Thanks for the clarification
>
> Then I would question if declaring jsonb_set as "strict" makes sense

I think it does but I raise the same question for to_jsonb. It's defined on
anyelement and the documentation says:

| Returns the value as json or jsonb. Arrays and composites are converted
| (recursively) to arrays and objects; otherwise, if there is a cast from
| the type to json, the cast function will be used to perform the
| conversion; otherwise, a scalar value is produced. For any scalar type
| other than a number, a Boolean, or a null value, the text representation
| will be used, in such a fashion that it is a valid json or jsonb value.

The documentation explicitly singles out "a number, a Boolean, or a null
value", but doesn't specify how they are treated. I would expect that they
are treated equivalently, though: An SQL number is converted to a JSON
number, an SQL boolean is converted to JSON true or false and an SQL null is
converted to JSON null. Returning SQL null instead of a JSON null breaks that
expectation for no discernible reason.

It also isn't consistent, since an SQL null in an array or composite is
converted to a JSON null, as I would expect.

        hp
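The inconsistency in a nutshell (as I understand the current behaviour):

    SELECT to_jsonb(NULL::text) IS NULL;   -- true: strict, SQL null in, SQL null out
    SELECT to_jsonb(ARRAY[NULL::text]);    -- [null]: inside a composite the SQL null
                                           -- becomes a JSON null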
Re: Rearchitecting for storage
On 2019-07-19 10:41:31 -0400, Matthew Pounsett wrote:
> Okay. So I guess the short answer is no, nobody really knows how to
> judge how much space is required for an upgrade? :)

As I understand it, a pg_upgrade --link uses only negligible extra space. It
duplicates a bit of housekeeping information, but not your data tables or
indexes. Your 18 TB table will definitely not be duplicated during the
upgrade if you can use --link.

        hp
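For reference, an invocation would look roughly like this (binary and data
directory paths are made up here and have to be adjusted to the actual
installation):

    pg_upgrade --link \
        -b /usr/lib/postgresql/9.6/bin  -B /usr/lib/postgresql/11/bin \
        -d /var/lib/postgresql/9.6/main -D /var/lib/postgresql/11/main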
Re: Rearchitecting for storage
On 2019-07-19 11:37:52 -0400, Matthew Pounsett wrote:
> On Fri, 19 Jul 2019 at 11:25, Peter J. Holzer wrote:
> > On 2019-07-19 10:41:31 -0400, Matthew Pounsett wrote:
> > > Okay. So I guess the short answer is no, nobody really knows how to
> > > judge how much space is required for an upgrade? :)
> >
> > As I understand it, a pg_upgrade --link uses only negligible extra
> > space. It duplicates a bit of housekeeping information, but not your
> > data tables or indexes. Your 18 TB table will definitely not be
> > duplicated during the upgrade if you can use --link.
>
> The documentation for pg_upgrade --link says that the old copy is no longer
> usable, which means it's modifying files that are linked. If it were only
> modifying small housekeeping files, then it would be most efficient not to
> link those, which would keep both copies of the db usable.

This was discussed recently: The old database is made intentionally unusable
to prevent accidentally starting both (which would result in data
corruption).

> That seems incompatible with your suggestion that it doesn't need to
> modify the data files. Depending on how it goes about doing that, it
> could mean a significant short-term increase in storage requirements
> while the data is being converted.
>
> Going back to our recent 'reindex database' attempt, pgsql does not
> necessarily do these things in the most storage-efficient manner; it
> seems entirely likely that it would choose to use links to duplicate
> the data directory, then create copies of each data file as it
> converts them over, then link that back to the original for an atomic
> replacement. That could eat up a HUGE amount of storage during the
> conversion process without the start and end sizes being very
> different at all.

I can't really think of a scenario in which this would be the best (or even a
good) strategy to convert the database. I am quite confident that pg_upgrade
doesn't do that at present and reasonably confident that it won't do it in
the future.

> Sorry, but I can't reconcile your use of "as I understand it" with
> your use of "definitely". It sounds like you're guessing, rather than
> speaking from direct knowledge of how the internals of pg_upgrade.

I don't have direct knowledge of the internals of pg_upgrade, but I have
upgraded a database of about 1 TB at least twice with --link. Since I had
much less than 1 TB of free space and the upgrade completed very quickly, I
am very confident that no user defined tables are copied.

I have also been on this mailing list for a few years and read quite a few
discussions about the usage of pg_upgrade in that time (though I may not
always have paid much attention to them).

        hp
Re: How to run a task continuously in the background
On 2019-07-17 12:34:41 +0100, Rory Campbell-Lange wrote:
> We make extensive use of postgresql 'contacting' an external process, but
> our use case involves many databases in a cluster rather than many schemas.
> Also we don't have to deal with cancelling the external process. We chose
> this architecture to avoid many postgresql connections for LISTEN/NOTIFY.
>
> We use a pgmemcache interface trigger to update memcached with a
> notification of the database holding items to be flushed.

Memcached is a cache. It will delete old items if storage is full (or if they
expire). Is this not a problem in your case or did you make sure that this
cannot happen?

        hp
Re: Default ordering option
On 2019-07-26 09:52:52 +0200, Cyril Champier wrote:
> No, the code I pasted was an existing production bug: the last_name should
> have been unique, so the selected patient would always be the same. This
> should have been detected in tests, but since the order was "almost always
> the same", our test was green 99% of the time, so we discarded it as flaky.
>
> Fuzzy testing could be an option, but this would go too far, as for Peter
> extension suggestion. We have huge existing codebase with more than 10K
> tests, and I do not want to modify our whole testing strategy.
>
> Meanwhile, I went for an ORM patch (ActiveRecord) and forbid usages that can
> workaround it.

Another idea: How do you prepare your test data? Do you have a (possibly
large) test database or do you populate a test database with test-specific
data in a fixture? If you do the latter, you might be able to insert the data
in random order.

        hp
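For example, if the fixtures end up as SQL, shuffling the insert order is
often enough to make order-dependent queries visible (table and column names
are hypothetical):

    INSERT INTO patients (first_name, last_name)
    SELECT first_name, last_name
    FROM   staging_patients
    ORDER  BY random();   -- the physical insertion order no longer matches
                          -- any "natural" order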
Re: Why does backend send buffer size hardcoded at 8KB?
On 2019-07-27 19:10:22 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-07-27 18:34:50 -0400, Tom Lane wrote:
> > > Yeah. The existing commentary about that is basically justifying 8K
> > > as being large enough to avoid performance issues; if somebody can
> > > show that that's not true, I wouldn't have any hesitation about
> > > kicking it up.
>
> > You think that unnecessary fragmentation, which I did show, isn't good
> > enough? That does have cost on the network level, even if it possibly
> > doesn't show up that much in timing.
>
> I think it is worth doing some testing, rather than just blindly changing
> buffer size, because we don't know how much we'd have to change it to
> have any useful effect.

I did a little test with nttcp between two of our servers (1 Gbit to
different switches, switches connected by 10 Gbit).

The difference between a 1024 byte buffer and a 1460 byte buffer is small but
measurable. Anything larger doesn't make a difference. So increasing the
buffer beyond 8 kB probably doesn't improve performance on a 1 Gbit LAN.

I didn't test 10 Gbit LAN or WAN - those might be different.

        hp
Re: Tablespace column value null on select * from pg_tables
On 2019-07-16 19:20:24 +0000, Alex Williams wrote:
> 1. postgres=# ALTER DATABASE mydatabase SET TABLESPACE pg_default;
>
> ERROR: some relations of database "mydatabase" are already in tablespace
> "pg_default"
>
> HINT: You must move them back to the database's default tablespace before
> using this command.

What's the reason for this error? Wouldn't it be simpler to leave relations
alone which are already in the correct tablespace?

        hp
Re: How do I create a Backup Operator account ?
On 2019-07-31 07:48:36 +0200, Luca Ferrari wrote:
> On Wed, Jul 31, 2019 at 2:48 AM Marcos Aurelio Nobre wrote:
> > But I don't know how to implement this on Linux, nor how to write this
> > entry in the pg_hba.conf file.
>
> I would start with an entry in pg_hba.conf like the following:
>
> host  all  pg_backup_username  localhost  md5
>
> or
>
> host  all  pg_backup_username  localhost  md5
>
> The problem then comes on how to prevent the operating system user to
> run psql.

This problem can be solved by not granting anyone shell access as that user.
It is only used as a target for sudo, and sudo is configured to run only
pg_dump and pg_restore as that user (plus maybe other programs to list
available backups, review logs, remove old backups, ...)

A web interface might be used as an alternative to sudo.

> Even something like the following (untested) in /etc/sudoers will NOT
> prevent the user to access the database:
>
> User_Alias PGBACKUPUSERS = pg_backup_username
> Cmd_Alias PGBACKUP = /usr/local/bin/pg_dump, /usr/local/bin/pg_restore, ! /usr/local/bin/psql
> PGBACKUPUSERS backup_host = PGBACKUP

This is the wrong way around. It should be something like

    alice, bob = (pg_backup_username) /usr/local/bin/pg_dump

(Apologies if I didn't get the syntax right. Slogging through the sudoers
manual reminded me why I wrote xssd 15 years ago).

        hp
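A hedged sketch of such a sudoers setup (user names and paths are invented;
the point is that alice and bob may run only the listed commands, and only as
pg_backup_username):

    User_Alias   BACKUPOPS    = alice, bob
    Runas_Alias  PGBACKUPUSER = pg_backup_username
    Cmnd_Alias   PGBACKUPCMDS = /usr/local/bin/pg_dump, /usr/local/bin/pg_restore

    BACKUPOPS    ALL = (PGBACKUPUSER) PGBACKUPCMDS

They would then invoke, for example:

    sudo -u pg_backup_username /usr/local/bin/pg_dump mydb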
Re: Understanding PostgreSQL installer debug log
On 2019-08-09 16:55:13 +0530, Ramesh Maddi wrote:
> performing post-bootstrap initialization ... Failed to initialize the
> database cluster with initdb with the error "invalid byte sequence for
> encoding "UTF8"
>
> Script stderr:
>  FATAL:  invalid byte sequence for encoding "UTF8": 0xeb 0x2f 0xdb
>  child process exited with exit code 1
>  initdb: removing contents of data directory "/home/postgres/9.6/data"
>
> Error running /home/postgres/9.6/installer/server/initcluster.sh "postgres"
> "postgres" "/home/postgres/9.6" "/home/postgres/9.6/data" 5432 DEFAULT:
> FATAL: invalid byte sequence for encoding "UTF8": 0xeb 0x2f 0xdb

0xeb 0x2f 0xdb is indeed not valid UTF-8. So wherever this sequence comes
from isn't UTF-8 encoded.

In ISO-8859-1 that sequence would be "ë/Û". Does this ring a bell? Any
subdirectory with a name that ends in "ë", for example?

        hp
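A quick way to check such byte sequences against a suspected single-byte
encoding (Python used here just because it is handy):

    >>> bytes([0xeb, 0x2f, 0xdb]).decode('iso-8859-1')
    'ë/Û'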
Re: How to check if a field exists in NEW in trigger
On 2019-08-07 15:54:33 -0700, Igal @ Lucee.org wrote:
> There is no reason, however, for an email address to be not-lower-cased, so
> while in some cases (no pun intended) it makes sense to keep the original
> CaSe while performing a case insensitive comparison, when it comes to email
> addresses I rather collapse the value upon insertion/update.

You should be aware that according to the specs, the local part of an email
address is case-sensitive. So joesm...@example.com and joesm...@example.com
might be two distinct email addresses.

Admittedly I've never seen that in practice (although I've seen a lot of
other problems caused by people who made unwarranted assumptions about email
addresses).

        hp
Quoting style (was: Bulk Inserts)
On 2019-08-10 21:01:50 -0600, Rob Sargent wrote:
> On Aug 10, 2019, at 8:47 PM, Souvik Bhattacherjee wrote:
> > Hi Adrian,
> >
> > Thanks for the response.
> >
> > > Yes, but you will some code via client or function that batches the
> > > inserts for you.
> >
> > Could you please elaborate a bit on how EXP 1 could be performed such that
> > it uses bulk inserts?
> >
> > Best,
> > -SB
> >
> > On Fri, Aug 9, 2019 at 7:26 PM Adrian Klaver wrote:
> > [70 lines of full quote removed]
>
> Top-posting (i.e. putting your reply at the top is discouraged here)

He didn't really top-post. He quoted the relevant part of Adrian's posting
and then wrote his reply below that. This is the style I prefer, because it
makes it really clear what one is replying to.

After his reply, he quoted Adrian's posting again, this time completely. I
think this is unnecessary and confusing (you apparently didn't even see that
he quoted something above his reply). But it's not as bad as quoting
everything below the answer (or - as you did - quoting everything before the
answer which I think is even worse: If I don't see any original content
within the first 100 lines or so I usually skip the rest).

        hp
Re: Recomended front ends?
On 2019-08-08 10:47:47 -0700, Rich Shepard wrote:
> On Thu, 8 Aug 2019, Adrian Klaver wrote:
> > Unfortunately it does not:
> > https://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys
> >
> > Given that the issue:
> > https://code.djangoproject.com/ticket/373
> > is 14 years old does not inspire confidence that it will change anytime
> > soon.
>
> Adrian,
>
> That's really interesting. I don't see how a framework cannot implement
> multi-column PKs.

You are a database guy. You have a database and want to write an application
for it. Even in a greenfield project, you probably do the database design
first. (I know I do)

The Django framework comes from the opposite direction. The programmers
aren't expected to know much about relational databases[1]. They are expected
to write their application and Django provides (among other things like
session and form handling) a persistence layer which happens to be backed by
a relational database. So Django's ORM layer isn't intended to deal with any
possible (or even any reasonable) database model - just with database models
generated by Django.

Django lets you use "unmanaged" tables, but it is quite noticeable that this
isn't the primary use case.

> Many databases I have include tables for samples (geochemical, biological,
> physical) where the PK for each row is location, date, parameter. Good thing
> I don't like browser user interfaces, eh? :-)

Django isn't the only web framework.

        hp

[1] It should be noted, however, that the Django ORM is what Joel Spolsky
calls a leaky abstraction. If you are ignorant about how RDBMSs work, your
application will probably be horrendously slow. Even if you do know this, you
often have to bend over backwards to get reasonable performance.
Re: Databases and servers
On 2019-08-20 10:33:17 +0000, Karl Martin Skoldebrand wrote:
> I just discovered that a client has done this:
>
> They have two web applications A1 and A2. They have seperate hostnames/URLs.
> Both have a production and a test database A1p and A1t/ A2p and A2t.
>
> What they've done is have both A1p and A2p on the same actual database
> server and A1t and A2t on the same server.
>
> So, I'm thinking - if a bug in application A1 crashes the application and
> database badly it will risk bringing down both services A1 and A2. The same
> risk would be evident on a successful security breach.
>
> I would prefer to A1p and A2p on seperate servers, maybe keeping A1t and A2t
> on the same. (This is what seems to be happening when the database servers
> are being replaced).

On rereading this I notice that I'm not sure what that means. If you propose
replacing the two servers with three (two production, one test) or even four
(two production and two test), I agree. If you want to keep two servers, but
rearrange them so that each server holds both the production and the test
database of one app, see below.

> What is the general thought on the current setup?

Without knowing the details I think I would side with your client here: It
seems to me that the risk of accidentally clobbering a production database
which is on the same host as the test database is higher than the risk of two
different production databases interfering with each other. Also, if you have
the test and production database on the same host, there are some procedures
which you can't safely test (e.g. an OS upgrade).

I would think about putting each database in a virtual machine or at least a
container, though.

        hp
Re: Permission for not Django app to do Write and Read
On 2019-08-24 13:22:38 +0200, Luca Ferrari wrote: > On Sat, Aug 24, 2019 at 11:53 AM Peter Wainaina wrote: > > > > Thanks much for the response. This is what I mean am a database ^ > > administrator for a production company and the product owner doesn't ^ > > want me to be able to either read or write information that will > > come from Django application. > > I personally hate this kind of setup, because it does not make any > sense to me that a developer must setup an application that must > interact with a database that the developer himself cannot interact > with. Peter is the DBA here, not the developer. I don't think you can prevent the DBA from seeing the data in the database - except by encrypting it (as you suggested below). I guess it would be possible to write a Django backend which implements an encryption layer, but getting decent performance out of it might be ... challenging. > However, keep it simple: define a django user, assign each object to > such user, revoke any permission from public. > And then let's the product owner setup a password. You can still access the tables as postgres. Of course you could set up another database instance where you don't have DBA privileges, but then the product owner has to trust the system administrator instead of the DBA. At some point you have to trust somebody or do everything yourself. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
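A minimal sketch of the "django user, revoke from public" setup suggested above. The role, schema and database names are invented; the product owner would set the real password. As noted in the message, a superuser can still read everything.

    -- Application role; the product owner sets the real password:
    CREATE ROLE django_app LOGIN PASSWORD 'changeme';

    -- Give the application its own schema, owned by that role:
    CREATE SCHEMA app AUTHORIZATION django_app;

    -- Keep everyone else (PUBLIC) out:
    REVOKE ALL ON DATABASE appdb FROM PUBLIC;
    REVOKE ALL ON SCHEMA public FROM PUBLIC;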
Re: Recomended front ends?
On 2019-08-27 16:04:02 +0100, Daniele Varrazzo wrote: > Using the Django ORM to create complex queries is a joy (especially > nesting subqueries), Not for me. I usually know what SQL I want to execute. Then I have to convert that SQL into a weird[1] and limited query language composed of method calls so that the ORM can turn it into SQL again (hopefully something close to the SQL I had in mind). It hurts my brain. hp [1] Well, SQL is weird, too. But it is weird in an "invented 40+ years ago and grown organically since" way, not a "constrained by the syntax of a different language" way. -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Recomended front ends?
On 2019-08-27 08:16:08 -0700, Adrian Klaver wrote: > Django takes Postgres as it's reference database which makes things easier, > especially when you add in > contrib.postgres(https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/) Looks nice. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: floating point output
On 2019-09-09 01:03:06 -0400, Tom Lane wrote: > Rob Sargent writes: > > Below I show a floating point column from a recent query. > > > Here's my guess at what's happening in the formatting: > > 1) scientific notation if smaller than 1.0e-04 (I’m not in the > > very-much-larger-than-one world) > > 2) 15 digits after the most signicant 0 > > 3) remove trailing zeros > > > I may be alone in this but I find the presentation messy and that's > > unusual in the PostgreSQL world. > > In released versions of Postgres, float8out basically devolves to > sprintf with "%.15g" format, and the above is what POSIX says that > should do. > > As of v12, we'll use the "Ryu" output algorithm, which adjusts > the number of displayed digits to ensure that the value will > re-load exactly by default. Cool. I looked into that some time ago, but never got around to implement it. > I'm afraid that will probably make your complaint about variable > numbers of digits worse not better. It's unlikely anybody will > entertain an argument to undo it, though, because (a) Ryu is faster > than the old code, and (b) it's hard to argue against ensuring that > output/input preserves values. I agree in general, but I wonder whether it would be worthwhile to add display formatting options to psql (something like "COLUMN foo FORMAT ..." in Oracle's sqlplus), so that the user can decide to display a specific column (or maybe all float numbers) as (for example) "%8.3f" or ".6e". This is of course already possible by using to_char in the query (e.g. to_char(f, '.999') or to_char(f, '9.99')) but that is a bit cumbersome. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
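For illustration, the to_char() workaround mentioned above might look roughly like this, using the same hypothetical table t and float column f; the format strings are just examples.

    -- Fixed-point with three decimal places:
    SELECT to_char(f, 'FM9990.999') FROM t;

    -- Scientific notation with six digits after the decimal point:
    SELECT to_char(f, '9.999999EEEE') FROM t;

A per-column display format in psql would avoid having to repeat this in every query, which is the point of the suggestion above.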
Permissions on postgresql.conf, psql and patroni
Yesterday I "apt upgrade"d patroni (to version 1.6.0-1.pgdg18.04+1 from http://apt.postgresql.org/pub/repos/apt bionic-pgdg/main). Today I noticed that I couldn't invoke psql as an unprivileged user anymore: % psql Error: Invalid data directory for cluster 11 main Further investigation showed that the read permissions on /etc/.../postgresql.conf were revoked at the time of the upgrade. Either by a post-install script or maybe by patroni itself, when it started up again. This leads me to two questions: 1) Is there a reason to restrict read access to postgresql.conf to the user postgres? AFAIK this file doesn't normally contain sensitive data (unlike pg_hba.conf and pg_ident.conf which are restricted by default). 2) Why does psql need to read postgresql.conf, and more specifically, why does it care about the location of the data directory? It shouldn't access files directly, just talk to the server via the socket. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Permissions on postgresql.conf, psql and patroni
On 2019-09-09 13:29:38 +0200, Daniel Verite wrote: > Peter J. Holzer wrote: > > 2) Why does psql need to read postgresql.conf, and more specifically, > > why does it care about the location of the data directory? It > > shouldn't access files directly, just talk to the server via the > > socket. > > It's not psql itself, it's pg_wrapper. > > $ ls -l /usr/bin/psql > lrwxrwxrwx 1 root root 37 Aug 8 12:48 /usr/bin/psql -> > ../share/postgresql-common/pg_wrapper > > See http://manpages.ubuntu.com/manpages/bionic/man1/pg_wrapper.1.html Ah, thanks. On 2019-09-09 10:03:57 -0400, Tom Lane wrote: > "Peter J. Holzer" writes: > > Yesterday I "apt upgrade"d patroni (to version 1.6.0-1.pgdg18.04+1 from > > http://apt.postgresql.org/pub/repos/apt bionic-pgdg/main). > > Today I noticed that I couldn't invoke psql as an unprivileged user > > anymore: > > % psql > > Error: Invalid data directory for cluster 11 main > > FYI, there's no such error string in the core Postgres sources. > I imagine that it's coming from some wrapper script. Yep. Daniel shoved me into the right direction. The message is from PgCommon.pm, used by pg_wrapper. It's a good thing both packages are maintained by the same team. Saves the discussion who's going to fix it :-). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Permissions on postgresql.conf, psql and patroni
On 2019-09-09 19:15:19 +0200, Peter J. Holzer wrote: > On 2019-09-09 10:03:57 -0400, Tom Lane wrote: > > "Peter J. Holzer" writes: > > > Yesterday I "apt upgrade"d patroni (to version 1.6.0-1.pgdg18.04+1 > > > from http://apt.postgresql.org/pub/repos/apt bionic-pgdg/main). > > > Today I noticed that I couldn't invoke psql as an unprivileged user > > > anymore: > > > % psql > > > Error: Invalid data directory for cluster 11 main > > > > FYI, there's no such error string in the core Postgres sources. > > I imagine that it's coming from some wrapper script. > > Yep. Daniel shoved me into the right direction. The message is from > PgCommon.pm, used by pg_wrapper. > > It's a good thing both packages are maintained by the same team. Saves > the discussion who's going to fix it :-). Since this issue is specific to the debian/ubuntu packages, I've opened a ticket in redmine: https://redmine.postgresql.org/issues/4717 (as suggested in the wiki). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Strange Behaviour with multicolumn indexes
[PostgreSQL 11.5 (Ubuntu 11.5-1.pgdg18.04+1) on x86_64-pc-linux-gnu] I have a table with many columns and many indexes (actually many tables with many columns and many indexes), but for the sake of this posting, we'll consider just three columns, which we unimaginatively call a, b, and c. There are also three indexes: t_a_idx btree (a) WHERE a IS NOT NULL t_b_idx btree (b) WHERE b IS NOT NULL t_a_b_idx btree (a, b) WHERE a IS NOT NULL AND b IS NOT NULL Now I have a query select c from t where a='A' and b='B'; This uses t_b_idx, not - as I expected - t_a_b_idx. If I create an index with the columns swapped: t_b_a_idx btree (b, a) WHERE b IS NOT NULL and a IS NOT NULL this index will be used. The distribution of values in columns a and b is quite different: a has 10 different values of similar frequency (and no null values). b has only a single non-null value, which occurs with a frequency of about 1 %. So I definitely understand why it would prefer t_b_idx to t_a_idx, but certainly t_a_b_idx should be even better? After all it would have to read only 1/1000 of the rows instead of 1/100. It would also have to scan much less of the index, so the fact that the index is a bit larger shouldn't make a difference. Explain shows that the row estimates are spot on, but the cost for using t_a_b_idx is higher than for t_b_idx (which is in turn higher than for t_b_a_idx). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Strange Behaviour with multicolumn indexes
On 2019-09-12 12:54:55 -0400, Tom Lane wrote: > "Peter J. Holzer" writes: > > we'll consider just three columns, which we unimaginatively call a, b, > > and c. There are also three indexes: > > > t_a_idx btree (a) WHERE a IS NOT NULL > > t_b_idx btree (b) WHERE b IS NOT NULL > > t_a_b_idx btree (a, b) WHERE a IS NOT NULL AND b IS NOT NULL > > > Nowe I have a query > > select c from t where a='A' and b='B'; Just in case somebody wants to play along at home: A narrow table with just these three columns and synthetic data shows the problem but doesn't use the flipped multicolumn index either. I guess you have to pad the table a bit to sufficiently change the costs. > > This uses t_b_idx, not - as I expected - t_a_b_idx. > > > The distribution of values in columns a and b is quite different: a has > > 10 different values of similar frequency (and no null values). b has > > only a single non-null value which with a frequency of about 1 %. > > > So I definitely understand why it would prefer t_b_idx to t_a_idx, but > > certainly t_a_b_idx should be even better? > > Not necessarily --- t_a_b_idx is (presumably) physically bigger than > t_b_idx, which makes it more expensive to search. The additional > selectivity gain apparently doesn't outweigh that. The index is about 20 % larger, but only 1/10 of it has to be searched, so it should take 1 - (1.2 / 10) = 88 % less time. > > If I create an index with the columns swapped: > > t_b_a_idx btree (b, a) WHERE b IS NOT NULL and a IS NOT NULL > > this index will be used. > > Hmm. Probably that has something to do with a calculation about > the selectivity of the leading index column, ie do you have to > scan 10% of the index or 1% of the index. Does it use only the selectivity of the first column? That would explain it. The total selectivity of both columns is obviously the same. > It's not taking the partial-index filter into account in that, I > suspect, which skews the results in this case --- but that would be > hard to account for accurately. Hmm. Wouldn't that be a problem for partial indexes in general? They usually cover only a small portion of the table and if the selectivity is computed relative to the whole table the result may be way off. I think it should be possible to adjust for a "WHERE column IS NOT NULL" filter, because null_frac is in the statistics. For the general case you would need an estimate of the number of rows covered by the index, which I don't think we have. > Anyway I can't get excited about optimizing for a single non-null > value. That's certainly an extreme case, but I've seen bad plans for similar queries every now and then. At least now I have an idea what causes it and maybe a solution/workaround. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
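For anyone who wants to play along, a possible synthetic setup is sketched below. The row count, the padding and the exact value distributions are guesses based on the description in the earlier message, and (as noted above) the planner may or may not reproduce the behaviour depending on version and table width.

    CREATE TABLE t (
        a   text,
        b   text,
        c   text,
        pad text    -- padding to make the rows wider than the narrow three-column case
    );

    INSERT INTO t (a, b, c, pad)
    SELECT 'A' || (i % 10),                         -- 10 values of similar frequency, no NULLs
           CASE WHEN i % 100 = 0 THEN 'B' END,      -- one non-null value, about 1 % of the rows
           'C' || i,
           repeat('x', 200)
    FROM generate_series(1, 1000000) AS g(i);

    CREATE INDEX t_a_idx   ON t (a)    WHERE a IS NOT NULL;
    CREATE INDEX t_b_idx   ON t (b)    WHERE b IS NOT NULL;
    CREATE INDEX t_a_b_idx ON t (a, b) WHERE a IS NOT NULL AND b IS NOT NULL;

    ANALYZE t;

    EXPLAIN SELECT c FROM t WHERE a = 'A0' AND b = 'B';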
Re: Strange Behaviour with multicolumn indexes
On 2019-09-12 21:04:25 +0200, Peter J. Holzer wrote: > On 2019-09-12 12:54:55 -0400, Tom Lane wrote: > > It's not taking the partial-index filter into account in that, I > > suspect, which skews the results in this case --- but that would be > > hard to account for accurately. > > Hmm. Wouldn't that be a problem for partial indexes in general? They > usually cover only a small portion of the table and if the selectivity > is computed relative to the whole table the result may be way off. > > I think it should be possible to adjust for a "WHERE column IS NOT NULL" > filter, because null_frac is in the statistics. For the general case you > would need an estimate of the number of rows covered by the index, which > I don't think we have. Looking through the source I see that the planner does estimate the number of tuples in the index. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Referncing a calculated column in a select?
On 2019-09-12 15:35:56 -0500, Ron wrote: > On 9/12/19 2:23 PM, stan wrote: > > I am creating some views, that have columns with fairly complex calculations > > in them. The I want to do further calculations using the result of this > > calculation. Right now, I am just duplicating the first calculation in the > > select fro the 2nd calculated column. There must be a batter way to do > > this, right? > > I must be misunderstanding you, because you can just do more of what you're > doing now. > > Here's an example of doing some calculations in the creation of the view, > and then more calculations when selecting from the view: I think he wants to refer to other columns in the view. Something like create or replace view v as select a, b, abs(b - a) as abserror, case when abs(a) > abs(b) then abs(a) else abs(b) end as mag, abserror / mag as relerror from t; Except that this doesn't work. What you can do is nest views: create or replace view v as select a, b, abs(b - a) as abserror, case when abs(a) > abs(b) then abs(a) else abs(b) end as mag from t; create or replace view v2 as select a, b, abserror, mag, abserror / mag as relerror from v; wds=> select * from v2; ╔══╤═╤══╤══╤═══╗ ║ a │b│ abserror │ mag │ relerror ║ ╟──┼─┼──┼──┼───╢ ║2 │ 3 │1 │3 │ 0.333 ║ ║ -2 │ 3 │5 │3 │ 1.67 ║ ║ 3.1415926536 │ 2.71828 │ 0.4233126536 │ 3.1415926536 │ 0.134744602587137 ║ ╚══╧═╧══╧══╧═══╝ (3 rows) No idea whether this is more or less efficient than writing the whole formula for each column. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Referncing a calculated column in a select?
On 2019-09-13 11:49:28 +0900, Kyotaro Horiguchi wrote: > At Thu, 12 Sep 2019 23:16:01 +0200, "Peter J. Holzer" > wrote in <20190912211601.ga3...@hjp.at> > > On 2019-09-12 15:35:56 -0500, Ron wrote: > > > On 9/12/19 2:23 PM, stan wrote: > > > > I am creating some views, that have columns with fairly complex > > > > calculations > > > > in them. The I want to do further calculations using the result of this > > > > calculation. Right now, I am just duplicating the first calculation in > > > > the > > > > select fro the 2nd calculated column. There must be a batter way to do > > > > this, right? [...] > > I think he wants to refer to other columns in the view. > .. > > What you can do is nest views: > > Doesn't subquery work? Yes, a subquery would work, too. > Or CTE (WITH clause) might look better. CTEs (which I quite like in general) are optimisation barriers, so using them in a view may lead to inefficient plans (depending on what you do with the view). So, yes, it would work, but you have to check the plans of your queries (and be prepared to rewrite your view if they aren't good enough), which is why I didn't suggest it. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
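For completeness, the subquery variant mentioned above might look like this, reusing the hypothetical table t and columns a and b from the earlier example; greatest() is used here as a shorthand for the CASE expression. A LATERAL join is another way to let later expressions refer to earlier ones.

    CREATE OR REPLACE VIEW v2 AS
    SELECT a, b, abserror, mag, abserror / mag AS relerror
    FROM (
        SELECT a, b,
               abs(b - a) AS abserror,
               greatest(abs(a), abs(b)) AS mag
        FROM t
    ) AS s;

    -- Or with LATERAL:
    CREATE OR REPLACE VIEW v3 AS
    SELECT t.a, t.b, x.abserror, x.mag, x.abserror / x.mag AS relerror
    FROM t,
         LATERAL (SELECT abs(t.b - t.a) AS abserror,
                         greatest(abs(t.a), abs(t.b)) AS mag) AS x;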
Re: Variable constants ?
On 2019-08-15 16:56:57 -0400, stan wrote: > bossiness constants On 2019-09-02 13:31:14 -0400, stan wrote: > bossiness plan > bossiness model On 2019-09-13 05:57:33 -0400, stan wrote: > bossiness work I'm starting to think that this is not a typo :-) SCNR, hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: PostgreSQL License
On 2019-09-17 14:56:30 +0300, Ashkar Dev wrote: > but it is saying (without fee) > if I create a database with it to work with Web Application if want to sell it > so the buyer must have the PostgreSQL installed in his device to work offline > right? > "Permission to use, copy, modify, and distribute this software and its > documentation for any purpose, without fee, and without a written agreement is > hereby granted, provided that the above copyright notice and this paragraph > and > the following two paragraphs appear in all copies." This means that you don't have to pay a fee or sign a written agreement to use, copy, modify, and distribute this software and its documentation for any purpose. It doesn't say that you can't charge a fee for distributing (although why anybody would pay you for something they can download themselves for free I don't know). hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: Securing records using linux grou permissions
On 2019-10-15 13:10:13 -0400, David Gauthier wrote: > I was hoping there was a way to integrate the user/permissions/groups in linux > with the PG permissions functionality. You can at least map the OS users to DB roles by using the peer or ident authentication schemes. This way the users won't have to enter their passwords again. But I think this works only if the client and the server are on the same host. And you still have to maintain the groups, although that should be easy to automate. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: jsonb_set() strictness considered harmful to data
On 2019-10-20 13:20:23 -0700, Steven Pousty wrote: > I would think though that raising an exception is better than a default > behavior which deletes data. > As an app dev I am quite used to all sorts of "APIs" throwing exceptions and > have learned to deal with them. > > This is my way of saying that raising an exception is an improvement over the > current situation. May not be the "best" solution but definitely an > improvement. I somewhat disagree. SQL isn't in general a language which uses exceptions a lot. It does have the value NULL to mean "unknown", and generally unknown combined with something else results in an unknown value again: % psql wds Null display is "(∅)". Line style is unicode. Border style is 2. Unicode border line style is "double". Timing is on. Expanded display is used automatically. psql (11.5 (Ubuntu 11.5-3.pgdg18.04+1)) Type "help" for help. wds=> select 4 + NULL; ╔══╗ ║ ?column? ║ ╟──╢ ║ (∅) ║ ╚══╝ (1 row) Time: 0.924 ms wds=> select replace('steven', 'e', NULL); ╔═╗ ║ replace ║ ╟─╢ ║ (∅) ║ ╚═╝ (1 row) Time: 0.918 ms Throwing an exception for a pure function seems "un-SQLy" to me. In particular, jsonb_set does something similar for json values as replace does for strings, so it should behave similarly. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: jsonb_set() strictness considered harmful to data
On 2019-10-21 09:39:13 -0700, Steven Pousty wrote: > Turning a JSON null into a SQL null and thereby "deleting" the data > is not the path of least surprises. But it doesn't do that: A JSON null is perfectly fine: wds=> select jsonb_set('{"a": 1, "b": 2}'::jsonb, '{c}', 'null'::jsonb); ╔═╗ ║ jsonb_set ║ ╟─╢ ║ {"a": 1, "b": 2, "c": null} ║ ╚═╝ (1 row) It is trying to replace a part of the JSON object with an SQL NULL (i.e. unknown) which returns SQL NULL: wds=> select jsonb_set('{"a": 1, "b": 2}'::jsonb, '{c}', NULL); ╔═══╗ ║ jsonb_set ║ ╟───╢ ║ (∅) ║ ╚═══╝ (1 row) hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
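A common defensive pattern in this situation is to decide explicitly what an SQL NULL replacement value should mean; a small sketch (not the only option):

    -- Treat an SQL NULL replacement value as JSON null:
    SELECT jsonb_set('{"a": 1}'::jsonb, '{c}',
                     coalesce(NULL::jsonb, 'null'::jsonb));
    -- result: {"a": 1, "c": null}

    -- Or keep the original document when the replacement value is SQL NULL:
    SELECT coalesce(jsonb_set('{"a": 1}'::jsonb, '{c}', NULL),
                    '{"a": 1}'::jsonb);
    -- result: {"a": 1}

In an UPDATE, the second form would fall back to the old column value instead of overwriting the row with NULL.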
Re: jsonb_set() strictness considered harmful to data
On 2019-10-22 09:16:05 +1100, raf wrote: > Steven Pousty wrote: > > In a perfect world I would agree with you. But often users do not read ALL > > the documentation before they use the function in their code OR they are > > not sure that the condition applies to them (until it does). > > I'm well aware of that, hence the statement that this > information needs to appear at the place in the > documentation where the user is first going to > encounter the function (i.e. in the table where its > examples are). I think this is a real weakness of the tabular format used for documenting functions: While it is quite compact which is nice if you just want to look up a function's name or parameters, it really discourages explanations longer than a single paragraph. Section 9.9 gets around this problem by limiting the in-table description to a few words and "see Section 9.9.x". So you basically have to read the text and not just the table. Maybe that would make sense for the json functions, too? hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
Re: jsonb_set() strictness considered harmful to data
On 2019-10-22 18:06:39 -0700, David G. Johnston wrote: > On Tue, Oct 22, 2019 at 3:55 PM Peter J. Holzer wrote: > On 2019-10-20 13:20:23 -0700, Steven Pousty wrote: > > I would think though that raising an exception is better than a > > default behavior which deletes data. > > As an app dev I am quite used to all sorts of "APIs" throwing > > exceptions and have learned to deal with them. > > > > This is my way of saying that raising an exception is an > > improvement over the current situation. May not be the "best" > > solution but definitely an improvement. > > I somewhat disagree. SQL isn't in general a language which uses > exceptions a lot. It does have the value NULL to mean "unknown", and > generally unknown combined with something else results in an unknown > value again: > > [...] > > > Throwing an exception for a pure function seems "un-SQLy" to me. In > particular, jsonb_set does something similar for json values as replace > does for strings, so it should behave similarly. > > > Now if only the vast majority of users could have and keep this level of > understanding in mind while writing complex queries so that they remember to > always add protections to compensate for the unique design decision that SQL > has taken here... I grant that SQL NULL takes a bit to get used to. However, it is a core part of the SQL language and everyone who uses SQL must understand it (I don't remember when I first stumbled across "select * from t where c = NULL" returning 0 rows, but it was probably within the first few days of using a database). And personally I find it much easier to deal with concept which are applied consistently across the whole language than those which sometimes apply and sometimes don't seemingly at random, just because a developer thought it would be convenient for the specific use-case they had in mind. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
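The classic stumbling block mentioned above, as a quick illustration (using the same hypothetical table t and column c as in the message):

    -- Comparison with NULL yields NULL ("unknown"), so no rows match:
    SELECT * FROM t WHERE c = NULL;                     -- always returns 0 rows

    -- What is usually meant:
    SELECT * FROM t WHERE c IS NULL;

    -- Null-safe comparison, if NULLs should compare as equal:
    SELECT * FROM t WHERE c IS NOT DISTINCT FROM NULL;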
Re: Is this a bug ?
On 2019-10-23 17:20:07 +0100, Geoff Winkless wrote: > For what it's worth, I can see a value to having > > SELECT 'this is quite a long string' >'which I've joined together ' >'across multiple lines'; > > although the advantage of it vs using a concat operator is slim. True. However, concatenation of string literals by juxtaposition isn't specific to SQL. Two other well known languages where this works (even without a newline) are C and Python. hp -- _ | Peter J. Holzer| we build much bigger, better disasters now |_|_) || because we have much more sophisticated | | | h...@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> signature.asc Description: PGP signature
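In SQL the concatenation of adjacent string literals only happens when the literals are separated by whitespace containing at least one newline, which is slightly stricter than C or Python:

    -- Works: the literals are separated by newlines
    SELECT 'this is quite a long string '
           'joined together '
           'across multiple lines';

    -- Does not work: on a single line this is a syntax error
    -- SELECT 'foo' 'bar';

    -- The explicit operator works everywhere:
    SELECT 'foo' || 'bar';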
Re: 11 -> 12 upgrade on Debian Ubuntu
On 2019-11-07 10:59:37 -0500, stan wrote: > On Thu, Nov 07, 2019 at 07:52:14AM -0800, Adrian Klaver wrote: > > On 11/7/19 7:45 AM, stan wrote: > > > I am in the middle of a project, and it looks like version 12 is now what > > > the Debian/Ubuntu package managers want to update to. > > > > This should be a dist-upgrade correct? > > Correct. > > > On a my Ubuntu instance that just installed the Postgres 12 version and > > started it(also auto start in start.conf). This was just the template > > databases and postgres db. No data was moved over from the 11 instance that > > is running. > > > Thanks, that is helpful to know. To migrate the data to the new database, run pg_upgradecluster. This is a wrapper around pg_upgrade which knows about details of a standard Debian/Ubuntu PostgreSQL installation (like directory layout, etc.), so it should be simpler and safer than invoking pg_upgrade yourself (and pg_upgrade is hidden in /usr/lib/postgresql/*/bin to prevent you from invoking it accidentally). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: type SERIAL in C host-struct
On 2019-11-07 20:14:47 +0100, Matthias Apitz wrote: > At the end of the day we came up with the following solution: > > strcpy(host_struct.name, "Sigrid"); > EXEC SQL select nextval('lina_lid_seq') into :host_struct.ser; > > EXEC SQL INSERT INTO lina VALUES ( :host_struct ); > > which seems to work fine. Any comments about side effects? You are performing two queries instead of one, so you have to wait for one extra round trip. Not a problem if client and server are on the same network or you insert a few host_structs per second. May be a problem if client and server are in different datacenters and you want to do hundreds of inserts per second. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
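If the extra round trip ever becomes a problem, the two statements can usually be collapsed into one. A sketch in plain SQL (ECPG specifics left out; the column name lid and the column default are guesses based on the sequence name in the message above):

    -- If the id column defaults to nextval('lina_lid_seq'),
    -- let the default do the work and send the value back:
    INSERT INTO lina (name) VALUES ('Sigrid') RETURNING lid;

    -- Or call nextval() directly inside the INSERT:
    INSERT INTO lina (lid, name)
    VALUES (nextval('lina_lid_seq'), 'Sigrid')
    RETURNING lid;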
Re: REINDEX VERBOSE unknown option
On 2019-11-18 12:24:40 +, Geoff Winkless wrote: > On Mon, 18 Nov 2019 at 11:46, Michael Paquier wrote: > > On Mon, Nov 18, 2019 at 10:27:24AM +0100, Josef Šimánek wrote: > > > This is clear once you understand what does it mean. I was aware of > > > VERBOSE > > > option of EXPLAIN and tried to use it without needed parentheses (the same > > > way EXPLAIN can understand it). In the parameter list of REINDEX, it is > > > still called VERBOSE (not "( VERBOSE )") and there's no info > > > that parentheses are needed. [...] > But if the parentheses are part of the parameter, I think putting the > parentheses in the parameter list might be a good compromise. The parentheses aren't part of the parameter. They are part of the syntax of the command. It might be useful to distinguish better between symbols which must be typed as shown ("REINDEX", "(", ")", etc. in this case) and those which describe the grammar ("[", "]", "name" in this case). Maybe print the square brackets in bold and italics like "name"? But that is just the opposite of the usual convention. How about this? * Terminals (stuff that has to be typed as shown) in bold. * Non-Terminals (stuff which has to be replaced) in italic. * Meta-characters ([, ], |, ...) in regular type. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
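For reference, the concrete difference being discussed (table name invented):

    -- Not accepted: VERBOSE without parentheses is rejected with a syntax error
    -- REINDEX VERBOSE TABLE my_table;

    -- Accepted: the option goes in parentheses
    REINDEX (VERBOSE) TABLE my_table;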
Re: mysysconf ?
On 2019-11-19 19:32:03 -0500, stan wrote: > I am presently running on a Ubuntu 18.04 instance, and as you know > Debian/Ubuntu have upgraded to version 12. I don't think so. Neither Debian nor Ubuntu is in the habit of doing major upgrades in the middle of a release and Ubuntu 18.04 was released long before PostgreSQL 12. Indeed, "apt policy" shows 10+190ubuntu0.1 to be the current version for Ubuntu 18.04. You have probably installed PostgreSQL from apt.postgresql.org, which always has the newest version (but you can also install older versions and stick with them). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: REINDEX VERBOSE unknown option
On 2019-11-19 11:37:04 +, Geoff Winkless wrote: > Even if you do that you're still requiring the user to parse syntax > according to esoteric rules. Oh, please. Those "esoteric rules" have been in wide-spread use for decades. If you look at any manual which tries to explain the syntax of a programming language, markup language or something similar in a (semi-)formal way, it probably uses something very similar. (More formal texts often use BNF (or a variant), which are more expressive, but harder to read. Personally I like syntax diagrams (sometimes called railroad diagrams, but they seem to have mostly fallen out of fashion) And of course, like almost any manual, the PostgreSQL manual explains the notation in the preface: https://www.postgresql.org/docs/current/notation.html (Paper books are at an advantage here that readers usually actually start at the beginning) hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: REINDEX VERBOSE unknown option
On 2019-11-21 09:43:26 +, Geoff Winkless wrote: > On Wed, 20 Nov 2019 at 22:48, Peter J. Holzer wrote: > > > > On 2019-11-19 11:37:04 +, Geoff Winkless wrote: > > > Even if you do that you're still requiring the user to parse syntax > > > according to esoteric rules. > > > > Oh, please. Those "esoteric rules" have been in wide-spread use for > > decades. > > It wasn't meant to be insulting, I meant "esoteric" in the strict > sense: that you need to have specific knowledge to parse them. I didn't understand it as insulting (why would I?), but don't think this convention is "requiring ... knowledge that is restricted to a small group" (Merriam-Webster). This specific convention for conveying grammar rules is in my experience by far the most common (before BNF and diagrams). Anybody who has read any documentation about any formal language (e.g., a programming language, a query language, a markup or configuration language) has very likely encountered it before. Yes, you need specific knowledge to understand the PostgreSQL documentation. For starters, you need to know English (or one the handful languages in which it has been translated). You need to know what a relational database is and why and how you would use one. You need some generic knowledge about computing (what is a "client/server model"? How do I start a command line tool?), etc. The convention for describing the grammar is probably the least concern, and besides, it is explained in the manual (unlike some concepts which are assumed to be known). > My point was that modifying the rules (by making certain things bold > or italic) wouldn't really solve the problem - if you don't know what > the rules are, you're unlikely to be any better off if someone adds to > them. Man is a pattern-matching animal. Even without an explicit explanation, humans are quite good at deriving meaning from repeated patterns. So if the parts you have to type verbatim are always printed in bold and parts which have a meta-meaning are always printed in italic and optional parts are always enclosed in (italic) square brackets, people are very likely to understand that [ ( VERBOSE ) ] means that "( VERBOSE )" must be typed as is, but is optional. Even if they can't tell you the rules. Simply because they have seen it a few dozen times before. There is a reason why almost any technical documentation uses some typographical convention and why those conventions are almost always very similar. The PostgreSQL manual unfortunately uses the same typographic convention for meta-characters ([], {}, |, ...) and terminals, which isn't as clear as it could be. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: [SPAM] Remote Connection Help
On 2019-11-21 17:27:04 +0100, Moreno Andreo wrote: > Connection refused means somthing has blocked it. If it was all OK and simply > Postgres was not listening, you should've received a "connection timed out" > (10060) message. Almost exactly the other way around. If you try to connect to a port where no server is listening, you get a connection refused message. If something is blocking the connection you may get a timeout. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: REINDEX VERBOSE unknown option
On 2019-11-21 16:48:14 +, Geoff Winkless wrote: > On Thu, 21 Nov 2019 at 15:32, Peter J. Holzer wrote: > > On 2019-11-21 09:43:26 +, Geoff Winkless wrote: > > > It wasn't meant to be insulting, I meant "esoteric" in the strict > > > sense: that you need to have specific knowledge to parse them. > > > > I didn't understand it as insulting (why would I?), > > I've absolutely no idea, but I couldn't imagine why on Earth you would > apparently take such exception to it otherwise. Maybe writing > sarcastic bombast in response to something that hasn't annoyed you is I don't think we agree on the meaning of "sarcastic" and "bombast". > > but don't think this convention is "requiring ... knowledge that is > > restricted to a small group" (Merriam-Webster). > > That's entirely the case here. I'd say the number of people able to > understand something like BNF is vanishingly small in terms of the > 7bn(?) world-population. The number of people reading the PostgreSQL manual is also vanishingly small in terms of the world-population. These two aren't independent. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Isolation of multiple databse instances provided by a single postgres server
On 2019-11-21 08:18:21 -0600, Ron wrote: > It appears to me that, within this one Postgres "instance", there are 2 > levels of "isolation", which are database, and schemas. Is this correct? [...] > If so, how does this cores pond to physical on disk storage? > > It corresponds not at all. That's not quite true. Each database has a specific directory (per tablespace) to keep its files in. Schemas on the other hand do not correspond to anything on the filesystem. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
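This on-disk correspondence can be seen from SQL; a small sketch (the table name is made up, and the base/ path applies to the default tablespace):

    -- Each database has an OID, which is also its directory name under base/:
    SELECT oid, datname FROM pg_database;

    -- The file path of a table is relative to the data directory and starts
    -- with base/<database oid>/, regardless of the table's schema:
    SELECT pg_relation_filepath('public.some_table');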
Re: Isolation of multiple databse instances provided by a single postgres server
On 2019-11-21 07:21:10 -0800, Adrian Klaver wrote: > On 11/21/19 6:51 AM, Laurenz Albe wrote: > > - The different databases in a cluster are physically located in > >the same tablespace, but they are logically strictly separated. > >You cannot connect to one database and access another database from > > there. > > dblink(https://www.postgresql.org/docs/11/dblink.html) and > FDW(https://www.postgresql.org/docs/11/sql-createforeigntable.html) would > beg to differ. I was thinking of FDW. But I think Laurenz had the SQL syntax in mind which does include the database in the fully qualified form of a tablename. Something like db2=# select * from db2.public.t3; ╔╤═══╗ ║ id │ t ║ ╟┼───╢ ║ 1 │ this is db2.public.t3 ║ ╚╧═══╝ (1 row) works. But when you try to access a table in a different database you get an error: db2=# select * from db1.public.t1; ERROR: cross-database references are not implemented: "db1.public.t1" LINE 1: select * from db1.public.t1; hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Logging [RESOLVED]
On 2019-12-04 16:03:24 -0800, Rich Shepard wrote: > On Wed, 4 Dec 2019, Stephen Eilert wrote: > > Usually, this is done by logrotate or a similar mechanism in your system. > > You’ll likely find that other logs in your system follow a similar > > pattern, not just Postgresql. > > I just checked /etc/logrotate.d/postgres and it was set at daily with rotate > 7. I changed rotate to 4 but the dates are 3-4 days apart, not sequential. Your log files are extremely small. At only a few hundred bytes every 3 or 4 days it is very likely that nothing is logged on most days. If your log file is empty, logrotate won't rotate it if the option "notifempty" is set (which is probably the case). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play'
scores > 90. This is basically a full index scan since that index is ordered by length(letters). I'm surprised that PostgreSQL can even do that :-). This is a separate index scan than the one for length(letters) = 7, so separating the indexes should be no worse and probably a lot better. Create an index on score (possibly conditional on action = 'play'). 3. A lot of the conditions is fixed. So you might want to move them into the condition of a partial index: create index on words_moves(played) where action = 'play' and LENGTH(hand) = 7 and (LENGTH(letters) = 7 OR score > 90); Then the planner should recognize that it can use this index and the index contains only interesting moves - no logical operations needed at query time at all, only an index scan to find recent moves. Be warned though that creating such ultra-specific indexes comes at a cost: You may end up with a lot of them if you have many different queries and maintaining them may make inserts noticably slower. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: server will not start (Debian)
On 2019-12-10 12:03:30 -0500, stan wrote: > OK, here is the latets. If I do sudo systemctl start postgresql@12-main > > the instance starts and works fine. > > However, if I reboot the machine, the instance is not started. Check whether the service is enabled in systemd: # systemctl status postgresql@12-main.service ● postgresql@12-main.service - PostgreSQL Cluster 12-main Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled) ^^^ If it isn't, enable it: # systemctl enable postgresql@12-main.service hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Need support on tuning at the time of index creation
On 2020-01-27 11:10:36 +, Sandip Pradhan wrote: > One of our ericsson product used backend db as postgresql 9. We are facing > following performance issues where we need some support from your side. > We are having 10 tables and we are inserting around 150 million to 250 million > records on each of those tables. After that we need to create 29 indexes > includung primary key index and other types of indexes. Currently it is taking > almost 3.5 to 4 hrs. It is hard to give advice on what you could do differently if you don't tell us what you do. How are you inserting those rows? How are you creating the indexes? Especially: Are you doing things serially or in parallel? Also performance depends a lot on hardware, so faster CPUs (or more cores) and faster disks/ssds might help, too. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Performance Issue after upgrade from 9 to 11
On 2020-01-29 09:39:03 -0800, Perumal Raj wrote: > We have recently upgraded postgres from 9.2 to 11.6 and started seeing > performance issue immediately and able to fix the performance issue after > disabling parameter: enable_seqscan. How did you upgrade? If your upgrade involved a dump and restore, you should invoke ANALYZE for each database (I think autovacuum will analyze all tables eventually, but takes its time). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
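For example, run in each database (or, from the shell, something like vacuumdb --all --analyze-in-stages):

    -- Rebuild planner statistics after a dump/restore or pg_upgrade:
    ANALYZE VERBOSE;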
Re: Need support on tuning at the time of index creation
On 2020-01-31 10:13:20 +, Sandip Pradhan wrote: > Thanks for your time. > > Out Postgresql version: 9.5.9.14. > > We are using COPY command to insert rows into the tables. While running COPY > command, all indexes dropped. > So I take it that the copy is fast, but the building of indexes is slow (actually I'm not sure whether less than 50 seconds per index is slow for tables of this size, but it's too slow for your application) > After successfully inserting all the rows, we are trying to create all those > indexes. > > Example of index creation script: > CREATE INDEX cwt_sr_assocact_entityref > ON cwt_sr_associationact > USING btree > (entityapplicationcontext COLLATE pg_catalog."default", entitytype COLLATE > pg_catalog."default", entitydn COLLATE pg_catalog."default"); > > CREATE INDEX ix_sr_assoc_customerrelateddn > ON cwt_sr_associationact > USING btree > (customerrelateddn COLLATE pg_catalog."default"); > > Running indexes in parallel. > > Please find the hardware detail of the server: > OS: Red Hat Enterprise Linux 7.4 > RAM: 125 GB > CPU Core: 36 > > Set maintenance_work_mem to 25GB Ideally, you would want to use all the cores and each process should use as much RAM as possible. However, if each process uses 25 GB, you can only fit 4 of them (not 5 because other stuff needs memory, too), in RAM at the same time, leaving 32 of your 36 cores idle. OTOH, for 36 processes, each would have to be restricted to about 3 GB, which doesn't sound ideal, either. So there's probably a sweet spot somewhere in the middle. What I would do: To save time, I would test only with a single table (there are 29 indexes on each table and that's almost certainly more than the optimal degree of parallelism, so we won't get anything from processing several tables at once). To establish a baseline, I'd look for the optimal parallelism with your current parameters: Build the indexes with 1, 2, 4, 8, 16, 29 processes in parallel. Stop if it gets noticably slower, especially if it starts to swap heavily. Then bisect between the two best ones. Note the results. Then reduce maintenance_work_mem to 50 % and repeat the experiment. Does it get better? If it gets better (or at least not much worse), reduce maintenance_work_mem again and repeat. Eventually you should find the optimal combination. To reach your goal of 40 minutes for all 10 tables, your optimal run for a single table must be about 4 minutes. If you don't get close to that you will probably have to look for different solutions. As already mentioned, the speed of I/O makes a lot of difference. If you don't use SSDs yet, you should. If you do use SSDs, maybe you can get faster ones? You might also investigate putting pgsql_tmp on a RAM disk. You could also try changing synchronous_commit and/or fsync/full_page_writes (the latter only if you can tolerate losing your DB in a crash). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
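A sketch of what one of the parallel sessions might run during such an experiment; the memory value is a placeholder to tune, and each session would build a different subset of the 29 indexes (index definitions taken from the message above, minus the redundant default COLLATE clauses):

    -- Per-session memory budget for index builds; tune while testing:
    SET maintenance_work_mem = '6GB';

    CREATE INDEX cwt_sr_assocact_entityref
        ON cwt_sr_associationact
        USING btree (entityapplicationcontext, entitytype, entitydn);

    CREATE INDEX ix_sr_assoc_customerrelateddn
        ON cwt_sr_associationact
        USING btree (customerrelateddn);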
Re: Restore is failing
On 2020-02-05 13:54:56 +0530, Sonam Sharma wrote: > I am copying schema data to qa box from prod.its failing and giving below > error > : > > > WARNING: terminating connection because of crash of another server process > > DETAIL: The postmaster has commanded this server process to roll back the > current transaction and exit, because another server process exited abnormally > and possibly corrupted shared memory. As the message states, the problem is with another server process. You need to identify that process and why it crashed. Check the postgres server log. It should contain an error message. You might also want to check the syslog (on Linux or other Unixes) or equivalent for system errors (e.g. out of memory, disk errors, etc.) hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Cannot connect to postgresql-11 from another machine after boot
On 2020-02-13 21:03:48 -0800, Adrian Klaver wrote: > On 2/13/20 9:02 PM, Adrian Klaver wrote: > > On 2/13/20 7:54 PM, Jason Swails wrote: > > > The problem is that after my machine boots, I'm unable to connect to > > > the server from anywhere except localhost. Running a simple > > > "systemctl restart postgresql" fixes the problem and allows me to > > > connect from anywhere on my LAN. Here is an example of this > > > behavior: [...] > > > > > > So the first connection attempt fails. But when I restart the > > > service and try again (doing nothing else in between), the > > > connection attempt succeeds. My workaround has been to simply > > > restart the service every time my machine reboots, but I'd really > > > like to have a more reliable startup. > > > > > > Any ideas how to start hunting down the root cause? I think this > > > started happening after I moved the data directory to another drive. > > > > I would start by looking in the system log to see what it records when > > the service tries to start on reboot. > > Hit send to soon. At a guess the Postgres service is starting before the > drive is mounted. I don't think this has anything to do with the drive. If the drive wasn't mounted he couldn't connect from localhost either. What is probably happening is that postgresql is configured to listen on localhost and the IP address of the ethernet interface and is starting before the ethernet interface is ready. So it is listening only on localhost (there should be an error message regarding the other address in the log). When he restarts postgresql some time later, the interface is ready. It should be possible to solve this by adding the right dependencies to systemd. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Cannot connect to postgresql-11 from another machine after boot
On 2020-02-17 10:17:41 -0500, Jason Swails wrote: > I then added "network.target", "networking.service", and > "network-online.target" to the After line of the postgresql.service systemd > file, but it still didn't fix the problem. After=network-online.target should be correct. However, see https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ for an explanation why "the network is online" is not as simple as it looks and how to ensure that. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: DB running out of memory issues after upgrade
On 2020-02-18 18:10:08 +, Nagaraj Raj wrote: > Below are the same configurations ins .conf file before and after updagrade > > show max_connections; = 1743 [...] > show work_mem = "4MB" This is an interesting combination: So you expect a large number of connections but each one should use very little RAM? [...] > here is some sys logs, > > 2020-02-16 21:01:17 UTC [-]The database process was killed by the OS > due to excessive memory consumption. > 2020-02-16 13:41:16 UTC [-]The database process was killed by the OS > due to excessive memory consumption. The oom-killer produces a huge block of messages which you can find with dmesg or in your syslog. It looks something like this: Feb 19 19:06:53 akran kernel: [3026711.344817] platzangst invoked oom-killer: gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=0 Feb 19 19:06:53 akran kernel: [3026711.344819] platzangst cpuset=/ mems_allowed=0-1 Feb 19 19:06:53 akran kernel: [3026711.344825] CPU: 7 PID: 2012 Comm: platzangst Tainted: G OE4.15.0-74-generic #84-Ubuntu Feb 19 19:06:53 akran kernel: [3026711.344826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.7 06/16/2016 Feb 19 19:06:53 akran kernel: [3026711.344827] Call Trace: Feb 19 19:06:53 akran kernel: [3026711.344835] dump_stack+0x6d/0x8e Feb 19 19:06:53 akran kernel: [3026711.344839] dump_header+0x71/0x285 ... Feb 19 19:06:53 akran kernel: [3026711.344893] RIP: 0033:0x7f292d076b1c Feb 19 19:06:53 akran kernel: [3026711.344894] RSP: 002b:7fff187ef240 EFLAGS: 0246 ORIG_RAX: 0038 Feb 19 19:06:53 akran kernel: [3026711.344895] RAX: ffda RBX: 7fff187ef240 RCX: 7f292d076b1c Feb 19 19:06:53 akran kernel: [3026711.344896] RDX: RSI: RDI: 01200011 Feb 19 19:06:53 akran kernel: [3026711.344897] RBP: 7fff187ef2b0 R08: 7f292d596740 R09: 009d43a0 Feb 19 19:06:53 akran kernel: [3026711.344897] R10: 7f292d596a10 R11: 0246 R12: Feb 19 19:06:53 akran kernel: [3026711.344898] R13: 0020 R14: R15: Feb 19 19:06:53 akran kernel: [3026711.344899] Mem-Info: Feb 19 19:06:53 akran kernel: [3026711.344905] active_anon:14862589 inactive_anon:1133875 isolated_anon:0 Feb 19 19:06:53 akran kernel: [3026711.344905] active_file:467 inactive_file:371 isolated_file:0 Feb 19 19:06:53 akran kernel: [3026711.344905] unevictable:0 dirty:3 writeback:0 unstable:0 ... Feb 19 19:06:53 akran kernel: [3026711.344985] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name Feb 19 19:06:53 akran kernel: [3026711.344997] [ 823] 0 82344909 0 106496 121 0 lvmetad Feb 19 19:06:53 akran kernel: [3026711.344999] [ 1354] 0 135411901 3 135168 112 0 rpcbind Feb 19 19:06:53 akran kernel: [3026711.345000] [ 1485] 0 148569911 99 180224 159 0 accounts-daemon ... Feb 19 19:06:53 akran kernel: [3026711.345345] Out of memory: Kill process 25591 (postgres) score 697 or sacrifice child Feb 19 19:06:53 akran kernel: [3026711.346563] Killed process 25591 (postgres) total-vm:71116948kB, anon-rss:52727552kB, file-rss:0kB, shmem-rss:3023196kB The most interesting lines are usually the last two: In this case they tell us that the process killed was a postgres process and it occupied about 71 GB of virtual memory at that time. That was clearly the right choice since the machine has only 64 GB of RAM. Sometimes it is less clear and then you might want to scroll through the (usually long) list of processes to see if there are other processes which need suspicious amounts of RAM or maybe if there are just more of them than you would expect. 
> I identified one simple select which consuming more memory and here is the > query plan, > > > > "Result (cost=0.00..94891854.11 rows=3160784900 width=288)" > " -> Append (cost=0.00..47480080.61 rows=3160784900 width=288)" > "-> Seq Scan on msghist (cost=0.00..15682777.12 rows=312949 > width > =288)" > " Filter: (((data -> 'info'::text) ->> 'status'::text) = > 'CLOSE'::text)" > "-> Seq Scan on msghist msghist_1 (cost=0.00..189454.50 > rows=31294900 > width=288)" > " Filter: (((data -> 'info'::text) ->> 'status'::text) = > 'CLOSE'::text)" So: How much memory does that use? It produces a huge number of rows (more than 3 billion) but it doesn't do much with them, so I wouldn't expect the postgres process itself to use much memory. Are you sure its the postgres process and not the application which uses a lot of memory?
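As a rough sanity check, the worst-case sort memory implied by the two settings quoted above can be computed from within the database (pg_size_bytes() needs 9.6 or later; this ignores shared_buffers, per-connection overhead, and the fact that a single complex query can allocate several work_mem-sized buffers):

    SELECT current_setting('max_connections')::int
           * pg_size_bytes(current_setting('work_mem'))     AS worst_case_bytes,
           pg_size_pretty(current_setting('max_connections')::int
           * pg_size_bytes(current_setting('work_mem')))    AS worst_case;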
Re: Can I trigger an action from a coalesce ?
On 2020-02-22 16:02:06 -0500, stan wrote: > I have a case where if a value does not exist, I am going to use a default, > which is easy with coalesce. But I would like to warn the user that a > default has been supplied. The default value is reasonable, and could > actually come from the source table, so I can't just check the value. > I'd like to do a raise NOTICE, if the default portion of the coalesce fires. Might I suggest a different approach? Instead of raising a notice, add an additional column. Something like this: wds=> create table mytable (id serial, value int); CREATE TABLE Time: 127.124 ms wds=> insert into mytable (value) values (2), (23), (null), (42), (78); INSERT 0 5 Time: 48.223 ms wds=> select * from mytable; ╔╤═══╗ ║ id │ value ║ ╟┼───╢ ║ 1 │ 2 ║ ║ 2 │23 ║ ║ 3 │ (∅) ║ ║ 4 │42 ║ ║ 5 │78 ║ ╚╧═══╝ (5 rows) Time: 0.657 ms wds=> select id, coalesce(value, 42) as value, value is null as value_was_null wds-> from mytable; ╔╤═══╤╗ ║ id │ value │ value_was_null ║ ╟┼───┼╢ ║ 1 │ 2 │ f ║ ║ 2 │23 │ f ║ ║ 3 │42 │ t ║ ║ 4 │42 │ f ║ ║ 5 │78 │ f ║ ╚╧═══╧╝ (5 rows) Time: 0.247 ms This lets the user distinguish the real 42 with id 4 from the substituted 42 with id 3. I think this would be awkward with a notice. At most you could tell the user "some values were substituted", but not which ones (at least not if the query can return a large number of rows). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Is is safe to use SPI in multiple threads?
On 2020-02-18 19:07:44 +, Tom Mercha wrote: > On 23/12/2016 13:41, Peter J. Holzer wrote: > > On 2016-12-09 16:52:05 +0800, Qiu Xiafei wrote: > >> I'm new to PG and want to implement my domain-specific system based on PG. > >> I > >> wish to arrange my data as several tables in database and translate my DSL > >> into > >> SQL statements for query. Since one DSL statement may be mapped to several > >> SQL > >> statements, it's better to push the DSL server as close to the PG server as > >> possible. I found PG's backgroud worker meet my needs. I can setup a > >> background > >> worker bounded to PG server and listen to a port for network requests. > >> > >> But I encounter a problem that the Server Programing Interfaces are not > >> THREAD > >> SAFE. There are some global variables defined like: SPI_processed, > >> SPI_tuptable, etc. This limit to my DSL server to work in single thread > >> mode > >> which is quite inefficient. > > > > I had a similar requirement. I solved it by moving the application logic > > out of the stored procedures. All the stored procedure does is an RPC > > call (I use ØMQ for that) to a server process and send the result back > > to the client. The server process converts the request into multiple SQL > > queries which can be processed in parallel. > > > > The downside is of course that the communication overhead is much > > higher (A minimum of 4 network messages per request). That's not a > > problem in my case, but you mileage may vary. > > > > The advantages in my opinion are: > > > > * A standalone server process is easier to test and debug than a bunch > >of stored procedures. > > * I can easily scale out if necessary: Currently my database and server > >process run on the same machine, but I could distribute them over > >several machines with (almost) no change in logic. > > Sorry to revive such an old topic. I am facing a similar requirement > where I am running multiple queries concurrently. Like Qiu Xiafei, I am > looking at SPI, and dynamic background workers. In particular, I am > using SPI_execq(...) on each dynamic background workers I spawn. What I > am experiencing is that I am not seeing a speedup, and I am beginning to > wonder if I have done something wrong, if the overheads are too big, or > if there are some limitations I am not aware of. > > As I see that none of the comments here make much of a reference to > performance/speedup, would you be so kind as to tell me how satisfied > you were with performance? Any insights would be greatly appreciated. The speedup depends very much on how evenly you can divide up your workload between worker processes. Let's assume that we have a query which takes 10 seconds and 8 worker processes. If you can't split that query, there is no speedup. 1 worker will be busy for 10 seconds, the others will be idle. If you can split it into 20 queries which will run for 0.5 seconds each, we will be finished in 1.5 seconds (run 8 queries in the first 0.5 seconds, another 8 in the second, and then the last 4). If you can split the query, but the runtimes of the subqueries will be very different (e.g. they will run for 10, 5, 2, 1, 0.5 and 15 times 0.1 seconds), then the total run time will be close to the longest-running subquery - in this case 10 seconds. All the workers will start busy but at some point they run out of work and have to wait for that single slow subquery. So the speedup really depends a lot on how smartly you can divide up your workload. 
And this is unfortunately something which varies a lot: Firstly, our data is skewed. We have a lot more data about Austria than Afghanistan, so simply partitioning by country doesn't give us equal cost queries. Secondly, even with SSDs, access to permanent storage is still a lot slower than access to RAM, so your access times may change unpredictably depending on whether the data you access is hot or cold. Which brings me to another blessing and/or curse of this approach: Having a central server process outside of postgresql makes it easy to cache (partial) results. So very often we already have the data in our application cache and can generate a response in milliseconds. Which is good. However, when we don't have the data and have to ask the database, it is quite likely that it isn't in the cache of the database or even the OS either, so we have to hit the disks - which is slow. So in conclusion: We are seeing a substantial speedup. But it is definitely not linear in the number of worker processes, and the effect of caching is much greater than that of parallelization. And we pay for that by a large variation in query times (between milliseconds and tens of seconds). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
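To make the "divide up your workload" point concrete: instead of partitioning by a skewed attribute like country, the work can be split into roughly equal buckets by hashing the key. A minimal sketch, with invented table and column names, using PostgreSQL's built-in hashtext() function (your splitting criterion may of course look quite different):

-- worker n of 8 runs this with its own remainder n (0 .. 7):
SELECT place_id, sum(amount)
FROM measurements
WHERE (hashtext(place_id::text) & 7) = 0   -- this is worker 0
GROUP BY place_id;

Each bucket should then contain roughly the same number of rows, although hot versus cold data can still make the runtimes uneven.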
Re: Can I trigger an action from a coalesce ?
On 2020-02-24 05:20:49 +, sivapostg...@yahoo.com wrote: > It could also be done by putting those values in square bracket, if > substituted > with default values. > eg. [0] >4 >45 > [100] > Values within square brackets are default values. This would also work in many cases (especially if the values only have to be displayed and not processed further). In this case the OP wrote that "the default value is reasonable and could actually come from the source table". I assumed that he had a reason for this choice and wanted to preserve it. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Invoice increment
On 2020-02-26 09:38:57 +0100, Søren Frisk wrote: > I'm trying to find a way to increment an invoice number. And i think it would > be pretty straight forward just using a SERIAL int. Be careful: Sequences may have gaps (e.g. because of rolled back transactions, but maybe also for other reasons). In many jurisdictions, invoice numbers must not have gaps. You may still be able to use sequences, but you have to carefully consider when to increment them. > But as we're selling across > multiple countries, i need the invoice increment to be reset by a country_id. By "reset by a country_id" you mean that the invoice number has to increment independently for each country? So if you sell for example, to DK, DK, CN, DK, CN, US, DK, US, the invoice numbers should be DK-1 DK-2 CN-1 DK-3 CN-2 US-1 DK-4 US-2 ? You can do that by having one sequence per country. The parameter to nextval is just a string, so you can construct that on the fly: select nextval('invoice_seq_' || country_code); However, that means that you have to add a new sequence whenever you start business in a new country. Maybe not a big deal, but easy to forget since you don't do that every day. The other way is of course to have a table with all the current invoice numbers for each country. Basically sequences implemented in the application. This has a slightly different set of problems, so you have to be careful, too. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
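A minimal sketch of the "sequences implemented in the application" variant mentioned above (table and column names are invented). The INSERT ... ON CONFLICT also takes care of the first invoice for a new country; the row lock it takes serializes invoice creation per country until the transaction commits, and since the increment rolls back together with the invoice, no gaps can appear:

create table invoice_counters (
    country_code text primary key,
    last_number  bigint not null
);

-- inside the transaction which creates the invoice:
insert into invoice_counters (country_code, last_number)
values ('DK', 1)
on conflict (country_code)
do update set last_number = invoice_counters.last_number + 1
returning last_number;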
Re: How to install check_postgres on CentOS 8?
On 2020-02-26 15:11:38 +0100, rai...@ultra-secure.de wrote: > I'm trying to install the check_postgres RPM from the official > postgresql.org repository onto CentOS 8.1 > > It says: > > Error: > Problem: cannot install the best candidate for the job > - nothing provides perl-DateTime-Format-DateParse needed by > check_postgres-2.25.0-1.rhel8.noarch So check_postgres depends on a package which isn't in any of your yum repositories. It's been some time since I last used RHEL (RHEL 6 was the last I used), but there were some optional repositories (actually, I think they were called "channels") with "application specific" packages. Among them was a developer repo with lots of additional Perl modules (and other stuff, but the Perl modules are what is relevant here). If RHEL 8 (and by extension, CentOS 8) still has that structure, you may need to configure those repos. There is also EPEL ("Extra Packages for Enterprise Linux"), which contains packages from Fedora. Maybe perl-DateTime-Format-DateParse is in EPEL? hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Need to find the no. of connections for a database
On 2020-02-27 07:41:36 +, sivapostg...@yahoo.com wrote: > Before taking a few reports, we need to ensure that only one connection is > made > to the database and all other computers need to close the connection to that > database. This is to prevent any additional entry(ies) that could be made > during the course of the report taking. Do you have control over those reports or are they generated by a third-party tool? If the former, the best way is probably to just run them all in a single REPEATABLE READ transaction. Then they will all reflect the state of the database at the start of the transaction, regardless of what other clients are doing in the meantime. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
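A sketch of what that could look like from the reporting client (the report queries and table names are placeholders): all statements between BEGIN and COMMIT see the same snapshot, regardless of what other sessions write in the meantime:

begin isolation level repeatable read read only;
select count(*) from orders;            -- report 1
select sum(amount) from order_items;    -- report 2
commit;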
Re: trouble making PG use my Perl
On 2020-02-28 07:13:12 -0600, Steven Lembark wrote: > If you use the centos pre-compiled glob then you'll get their > pre-compiled paths to their pre-compiled Perl which, among > other things, is compiled with all optimization turned off, > with 5.00503 compatibility turned *on*, and a host of other > pure idiocies that make their perl unsuitable for human use. I don't have access to a current CentOS, but both claims are wrong for even the ancient RHEL 6 systems we still have around for some reason. I find it hard to believe that someone would find it necessary to turn on bincompat5005 in 2019 when they already considered that obsolete in 2010. Same for optimization. There are some optimizations that a general purpose binary distribution like RHEL can't do (like optimizing for a specific processor type), but frankly I doubt that perl's Configure turns them on. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: trouble making PG use my Perl
On 2020-02-28 07:17:27 -0600, Steven Lembark wrote: > RH and Debian distros are distriuted by heavy Python users who > go out of their way to hamstring Perl at every step. That's just complete bullshit, at least for Debian (I don't have current experience with Redhat). The maintainer system makes that very unlikely. > Unless things have changed massively, the Perl they distribute is not > only out of date it's nearly broken out of the box. I have been programming in Perl for 25 years, much of that time on Redhat and Debian systems. Perl was sometimes a bit out of date (usually because the OS was a bit out of date - that's the price you pay for stability), but never as much as on commercial Unixes like HP-UX or Solaris, and it was never broken. I have occasionally compiled perl myself, usually because I wanted to try out some new features, sometimes because I tried to make it faster. But for production use I've almost always used the system-supplied perl. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Restrict user to create only one db with a specific name
On 2020-03-06 16:39:14 -0700, David G. Johnston wrote: > On Fri, Mar 6, 2020 at 4:28 PM Tiffany Thang wrote: > Is there a way in PostgreSQL 12 to restrict user to creating a database > with a specific database name? [...] > Why does userA need create database privileges? Not speaking for the OP, but: Some test frameworks (e.g. the one included in Django) like to create their own test database to ensure that it only contains the test data. So if you are using one of these as intended, the user running the tests needs to be able to create and drop databases. If you are running tests from multiple projects against the same cluster, it might be a good idea to ensure that each job can only create (and drop) its own test database and not those of other jobs (or - worse - the production database). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Advice request : simultaneous function/data updates on many databases
On 2020-03-04 14:42:01 -0800, Guyren Howe wrote: > On Mar 4, 2020, at 14:33 , Rory Campbell-Lange > wrote: > Essentially we wish to reduce the window where the frontend and backend > aren't synchronised. > > If we have (for example) 200 databases which each take 2 seconds to > update, a client could be on the wrong frontend code for over 6 minutes. > > > Send each of the servers a PL/PGSQL method that executes all the things in a > transaction and then waits until the same clock time to commit. Last time I looked, some DDL commands (especially "drop table") took an exclusive lock on the affected table. So you may want to keep transactions which execute such commands very short to prevent them from blocking other transactions for a noticeable amount of time. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
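In addition to keeping such a transaction short, a lock_timeout can prevent the DDL from queueing behind a long-running transaction and blocking everything queued after it. A sketch (the timeout value and the table name are arbitrary): if the lock cannot be obtained in time, the statement fails quickly and can be retried instead of sitting in the lock queue:

begin;
set local lock_timeout = '2s';
drop table if exists old_stuff;
-- other schema changes ...
commit;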
Re: Determining the type of an obkect in plperl
On 2020-03-05 05:59:10 -0500, stan wrote: > On Wed, Mar 04, 2020 at 05:09:19PM -0700, David G. Johnston wrote: > > On Wed, Mar 4, 2020 at 4:21 PM stan wrote: > > > I am in the process of writing a plperl function. In this function > > > I need to compare the data in the NEW versus OLD structures. I am > > > writing this as a generic subroutine, so I am looping through and > > > comparing the 2 to see what is different. Problem is, that I need > > > to know whether to use a != or a ne comparison. > > > > > > how can I determine what the data type of the value element is? > > > > > > > Not up to speed on Perl but you basically want everything to be done using > > string equality - can't you just use "ne" everywhere and not worry about > > comparing numbers using string comparison logic? Might want to disabled > > warnings... [...] > > Since I am just looking for differences, this may work. It should work. In fact it is probably the most robust method. Perl scalars don't have a type like "int" or "string". They can be both (or neither) at the same time. Also, a numeric comparison may not be the right thing even for values which look like numbers: Consider:

#!/usr/bin/perl
use v5.12;
use warnings;
use Scalar::Util qw(looks_like_number);

my $x = "123456789012345678901";
my $y = "123456789012345678902";

say "eq(1): ", $x eq $y;
if (looks_like_number($x) && looks_like_number($y)) {
    say "==(2): ", $x == $y;
} else {
    say "eq(2): ", $x eq $y;
}
say "eq(3): ", $x eq $y;

This will print

eq(1): 
==(2): 1
eq(3): 

on my system: The string comparisons correctly determine $x and $y to be not equal, but the numeric comparison says they are equal (the two numbers are too large to fit into a 64 bit unsigned int, so perl will use a 64 bit float for the comparison which will cause both to be rounded to 123456789012345683968). (I included the second string comparison to check that the numeric comparison didn't affect the values, which I wasn't sure of.)

> Presently I am getting some warnings, so I think I need to deal with the > types. I already dealt with the columns that return NULL, these are > undefined in the Perl hash, so I have to test for their existence before > attempting the compare. What do you mean by "undefined in the hash"?

* The keys exist but the values are undef
* The keys don't exist

Those are not the same thing, and you use different functions to test for them (defined() and exists() respectively). (I think it should be the former, but I'm too lazy to test it myself.) hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Real application clustering in postgres.
On 2020-03-06 15:55:27 +0100, Laurenz Albe wrote: > On Fri, 2020-03-06 at 03:25 -0600, Ron wrote: > > > RAC is not really a high availability solution: because of the shared > > > storage, it has a sibgle point of failure. > > > > This is utter nonsense. Dual redundant storage controllers > > connected to disks in RAID-10 configurations have been around for at > > least 25 years. > > > > Oracle got it's clustering technology from DEC, and I know > > that works. Cluster members, storage controllers and disks have all > > gone down, while the database and application keep on humming along. > > I am not saying that it is buggy, it is limited by design. > > If you have mirrored disks, and you write junk (e.g, because of > a flaw in a fibre channel cable, something I have witnessed), > then you have two perfectly fine copies of the junk. I have certainly seen enterprise SAN boxes go down (or deliver corrupted data) because of controller or firmware problems or just because a second disk in a RAID-5 failed before the spare could be brought online. But to be fair, a master/slave setup a la patroni isn't immune to "writing junk" either: Not on the hardware level (either of the nodes may have faulty hardware, and you may not notice it until too late), and more importantly, not on the software level. An erroneous DML statement (because of a bug in the application, or because the user/admin made a mistake) will cause the same wrong data to be distributed to all nodes (of course this also applies to RAC). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Table with many NULLS for indexed column yields strange query plan
On 2020-03-05 18:08:53 -0700, greigwise wrote:

> I have a query like this:
>
> SELECT "table1".* FROM "table1"
> INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
> INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id"
> WHERE "table3"."number" = ''
> AND ("table2"."type") IN ('Standard') ;
>
> table2 has a large number of NULLS in the column table3_id. There is an
> index on this column. Here is the result of explain analyze:
>
> Merge Join (cost=1001.20..4076.67 rows=17278 width=167) (actual time=284.918..300.167 rows=2244 loops=1)
>   Merge Cond: (table2.table3_id = table3.id)
>   -> Gather Merge (cost=1000.93..787825.78 rows=621995 width=175) (actual time=5.786..283.269 rows=64397 loops=1)
>        Workers Planned: 4
>        Workers Launched: 4
>        -> Nested Loop (cost=0.87..712740.12 rows=155499 width=175) (actual time=0.091..102.708 rows=13107 loops=5)
>             -> Parallel Index Scan using index_table2_on_table3_id on table2 (cost=0.43..489653.08 rows=155499 width=16) (actual time=0.027..22.327 rows=13107 loops=5)
>                  Filter: ((type)::text = 'Standard'::text)

It looks like postgresql is scanning the index here to get the entries in the right order for the merge join. It's strange that it thinks this is a good strategy even though it has to visit every row in the table (no index cond). What is the selectivity of "type"? Would an index on that column help? hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
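If 'Standard' is reasonably selective, a partial index (or a composite index with type as the leading column) might give the planner a cheaper way in than scanning the whole index on table3_id. A sketch, using the column names from the plan above; whether the planner actually picks it depends on the data:

create index on table2 (table3_id) where type = 'Standard';
-- or, if several type values are queried:
create index on table2 (type, table3_id);
analyze table2;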
Re: Real application clustering in postgres.
On 2020-03-09 09:57:37 +0100, Laurenz Albe wrote: > On Sun, 2020-03-08 at 21:13 +0100, Peter J. Holzer wrote: > > But to be fair, a master/slave setup a la patroni isn't immune to > > "writing junk" either: Not on the hardware level (either of the nodes > > may have faulty hardware, and you may not notice it until too late), and > > more importantly, not on the software level. An erroneous DML statement > > (because of a bug in the application, or because the user/admin made a > > mistake) will cause the same wrong data to be distributed to all nodes > > (of course this also applies to RAC). > > Of course, nobody debates that. > > A high-availability solution only protects you from certain, well-defined > kinds of problems, usually related to hardware. Right. And enterprise class SAN storage does this: It protects you from failure of a single disk, a single cable, a single controller. Very often you can physically spread out the components so that loss of a whole rack (or server room) wouldn't affect availability. There are of course limits: When a message sent over a single cable is corrupted in a way that the checksum doesn't catch, corrupted data may be stored. But then, if data in RAM is corrupted in a way that ECC doesn't catch, the same will happen. A Patroni-based cluster isn't free of single points of failure either. So I don't buy the argument "X isn't a high availability solution because it uses shared storage". While I have seen expensive SAN boxes fail, I've also managed to wreck Patroni clusters. I'm not at all convinced that the availability of a Patroni cluster is higher than that of a failover cluster using shared storage. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: strange locks on PG 11 with Golang programs
On 2020-03-09 11:02:37 +0200, Achilleas Mantzios wrote: > Fully review your programs for connection / xaction leaks. Do you use a > connection pool? Go's sql package encourages the use of connection pools (type DB) over single connections (type Conn): | Prefer running queries from DB unless there is a specific need for a | continuous single database connection. All tutorials I've seen follow this recommendation, so a Go programmer might not even be aware that connections exist. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
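When hunting for the connection or transaction leaks mentioned above, it can help to look at what the pooled connections are doing on the server side. A sketch (column selection is just an example); sessions that sit "idle in transaction" for a long time are the ones holding locks:

select pid, usename, state, xact_start, left(query, 60) as last_query
from pg_stat_activity
where state = 'idle in transaction'
order by xact_start;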
Re: PostgreSQL10.x client Vs. PostgreSQL 11.x server
On 2020-03-13 08:47:59 +0000, Jimmy Angelakos wrote: > By using an older client, you will be missing out on the additional > features that Postgres 11 (and its corresponding client) supports. This is not correct. You can use all the additional features which only affect the server. None of the major enhancements mentioned at the start of the release notes (partitioning, parallelism, stored procedures, JIT, ...) should depend on the client. I am talking about low level client libraries here, like libpq or an ODBC, JDBC or DBD driver. An ORM or similar framework may well depend on server-side enhancements. Client programs (e.g. psql) may also have some enhancements. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Formatting output (was: Order by and timestamp)
On 2020-03-15 16:48:35 -0500, Steven Lembark wrote: > On Sun, 15 Mar 2020 22:33:35 +0100:wq > Björn Lundin wrote: > > > And to my surprise i get a result like this (note the order of > > column STARTTS) > > (1) Suggest using "pastebin.com" for this kind of data. It may not > look very pretty -- or readable at all -- on the viewer's end > depending on their settings (see below for example). [...] > > marketid |marketname| startts | > > eventid …. > > …. > > > > 1.127253880 | To Be Placed | 2016-09-29 16:10:00 | > > 27951325 | 1.127275624 | 1m4f Hcap| 2016-09-30 > > 16:20:00 | 27953169 | 1.127275625 | To Be Placed | This is weird. The output is correctly formatted in the text/plain part and the HTML part looks reasonable, too: Every line is in a div of its own, so it shouldn't be jumbled together like that (Out of curiosity: What mail program did you use to read this?). The only obvious problem I see is the use of the "Menlo" font. Non-Mac users won't have that and won't know that they should substitute a monospace font, so the alignment will be off. As far as pasted output on this list goes, Björn's message actually looked quite good. Some problems I see frequently and find annoying:

* Wide output (especially explain plans) with wrapped lines. I find these almost impossible to read, so I have to save the mail to a file and manually undo the line breaks to read it. I rarely bother to do that.
* ASCII graphics which only line up in a certain proportional font
* text/plain messages with very long lines which really should be paragraphs.

hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-19 16:48:19 -0700, David G. Johnston wrote: > First, it sounds like you care about there being no gaps in the records you end > up saving. If that is the case then sequences will not work for you. I think (but I would love to be proven wrong) that *nothing* will work reliably, if

1) you need gapless numbers which are strictly allocated in sequence
2) you have transactions
3) you don't want to block

Rationale: Regardless of how you get the next number, the following scenario is always possible:

Session1: get next number
Session2: get next number
Session1: rollback
Session2: commit

At this point you have a gap. If you can afford to block, I think a simple approach like

create table s(id int, counter int);
...
begin;
...
update s set counter = counter + 1 where id = $whatever returning counter;
-- use counter
commit;

should work. But that effectively serializes your transactions and may cause some to be aborted to prevent deadlocks. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-20 17:11:42 -0600, Rob Sargent wrote: > On Mar 20, 2020, at 4:59 PM, Peter J. Holzer wrote: > > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote: > >> First, it sounds like you care about there being no gaps in the records > >> you end > >> up saving. If that is the case then sequences will not work for you. > > > > I think (but I would love to be proven wrong), that *nothing* will work > > reliably, if > > > > 1) you need gapless numbers which are strictly allocated in sequence > > 2) you have transactions > > 3) you don't want to block > > > > Rationale: > > > > Regardless of how you get the next number, the following scenario is > > always possible: [...] > > At this point you have a gap. > > > > If you can afford to block, I think a simple approach like [...] > > should work. But that effectively serializes your transactions and may > > cause some to be aborted to prevent deadlocks. > > OP has said small gaps are ok. Yes. This wasn't a response to the OP's requirements, but to David's (rather knee-jerk, IMHO) "don't use sequences" response. Very often the requirements which would preclude sequences also preclude any other solution. (In the case of the OP's problem, I'd agree that sequences are probably a bad idea for the reasons he anticipates) > To me that says the requirement Which requirement? The OP's or the one I posed here? > is capricious but we haven’t heard the rationale for the requirement > yet (or I missed it) The OP gave a rationale: He has to fit the counter into an 8-digit field, and a global counter would overflow that. So he needs per-element counters. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-20 17:53:11 -0700, Adrian Klaver wrote: > On 3/20/20 4:29 PM, Peter J. Holzer wrote: > > On 2020-03-20 17:11:42 -0600, Rob Sargent wrote: > > > On Mar 20, 2020, at 4:59 PM, Peter J. Holzer wrote: > > > > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote: > > > > > First, it sounds like you care about there being no gaps in the > > > > > records you end > > > > > up saving. If that is the case then sequences will not work for you. > > > > > > > > I think (but I would love to be proven wrong), that *nothing* will work > > > > reliably, if > > > > > > > > 1) you need gapless numbers which are strictly allocated in sequence > > > > 2) you have transactions > > > > 3) you don't want to block [...] > > Yes. This wasn't a response to the OP's requirements, but to David's > > (rather knee-jerk, IMHO) "don't use sequences" response. Very often the > > requirements which would preclude sequences also preclude any other > > solution. > > I don't see a knee-jerk reaction in this: > > https://www.postgresql.org/message-id/CAKFQuwZ%3D%3Dri5_m2geFA-GPOdfnVggmJRu3zEi%2B1EwJdJA%3D9AeQ%40mail.gmail.com > > The response was if you cared about gaps(not something the OP had specified > at that point) then a sequence would not work. And I think that "care about gaps -> sequence doesn't work" is a knee-jerk reaction. It's similar to "can't parse HTML with regexps". True in the general case, and therefore people tend to blurt it out every time the topic comes up. But not necessarily true in specific cases. As I wrote above, there is no perfect solution - so you have to think about the actual requirements and the consequences of various solutions - and maybe using a sequence is the best (or least bad) solution. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-21 12:55:33 -0600, Rob Sargent wrote: > To me the description of the ID smacks of database-in-the-name folly. I > recognize that OP is likely unable to take another path. I’ll not push this > any > further. Identifiers often have internal structure. In Austria for example, the social security number contains the birth date. Invoice numbers, project numbers or court case numbers often contain the year. That's because they are used by *people*, and people like their identifiers to make some kind of sense. The computer doesn't care. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-21 14:51:35 -0600, Rob Sargent wrote: > > On Mar 21, 2020, at 1:13 PM, Peter J. Holzer wrote: > > > > On 2020-03-21 12:55:33 -0600, Rob Sargent wrote: > >> To me the description of the ID smacks of database-in-the-name folly. I > >> recognize that OP is likely unable to take another path. I’ll not push > >> this any > >> further. > > > > Identifiers often have internal structure. In Austria for example, the > > social security number contains the birth date. Invoice numbers, project > > numbers or court case numbers often contain the year. > > > > That's because they are used by *people*, and people like their > > identifiers to make some kind of sense. The computer doesn't care. > > Since OP said this was digital not paper, I see this as a presentation > problem bleeding into database design (assuming your response was an > invitation to push further). Well, that's the old natural primary key vs. surrogate primary key debate. (Trigger warning: You probably already heard that a gazillion times.) A primary key is an attribute (or a combination of attributes) which unambiguously identifies each entity. If the entity in question already has such an attribute (e.g. an SSN, invoice number, username, ...) and you use that as the primary key, it is called a natural primary key[1]. If you generate a new attribute not related to existing attributes, it is called a surrogate (primary) key. If the entity already has an attribute suitable as a primary key, why would you want to use another one?

* The key candidate may be unwieldy: It might be a combination of several attributes, it might be a very long string, etc.
* You may suspect that the key candidate is not in fact a key. There have been cases of duplicate SSNs in several countries, and of course not every person has an SSN.

Some people say you should always use surrogate keys, never natural keys. Mostly because of the second reason, I think: When you design a system you have limited knowledge, so you can never be absolutely sure that what appears to be a key candidate actually is one. OTOH, surrogate keys have downsides, too: The attributes which would have been a natural key are still there and must be stored, input, output, maybe generated, used in business rules, etc. So that's a (usually modest) increase in application complexity. More importantly, everywhere you would have had a (to a domain expert) meaningful key you now have a meaningless key - which makes it harder to understand the data and makes extra joins necessary. So to inspect the contents of a table, instead of a simple "select * from tablename" you need a join over three or four tables. To get back to the OP's problem: As I understand it he needs to generate that compound identifier anyway, because it is used outside of the application. He could use a surrogate key in addition to that, but I don't see any advantage to that. Most importantly, it doesn't solve the problem he wants to discuss in this thread[2]. hp [1] But note that these attributes are almost always already artificial: Humans aren't born with an SSN - that is assigned by an authority to uniquely identify their clients; courts have sat for centuries without the need to number their cases; etc. [2] I admit I haven't been very helpful in this regard, either, going off on tangents at every opportunity. -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
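To make the trade-off concrete, a small sketch with an invented schema: the same table once with the natural key and once with a surrogate key (two alternative definitions, not meant to coexist). Note that in the second variant the natural candidate is still there and still has to be unique; it just isn't used for references:

-- natural primary key:
create table invoice (
    invoice_number text primary key,      -- e.g. 'DK-2020-00042'
    customer_id    bigint not null,
    amount         numeric not null
);

-- the same table with a surrogate primary key instead:
create table invoice (
    id             bigint generated always as identity primary key,
    invoice_number text not null unique,
    customer_id    bigint not null,
    amount         numeric not null
);

With the surrogate key, a referencing table stores the meaningless id, so displaying the invoice number next to, say, a payment needs an extra join.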
Re: Could postgres12 support millions of sequences? (like 10 million)
On 2020-03-21 13:45:54 -0700, pabloa98 wrote: > On Sat, Mar 21, 2020 at 12:08 PM Peter J. Holzer wrote: > And I think that "care about gaps -> sequence doesn't work" is a > knee-jerk reaction. It's similar to "can't parse HTML with regexps". > True in the general case, and therefore people tend to blurt it out > every time the topic comes up. But not necessarily true in specific > cases. As I wrote above, there is no perfect solution - so you have to > think about the actual requirements and the consequences of various > solutions - and maybe using a sequence is the best (or least bad) > solution. > > OK. In that case, I will proceed with the millions of sequences. We will see. Nonononononononono! (channelling Jean-Claude Laprie[1]) I was trying to say that in each case one should think about the pros and cons of different approaches and be wary of dogmatic rules. I wasn't recommending the use of sequences in your case. I would actually be surprised if they work well for you. But having said that - by all means try it. Maybe it does work, and you'll give a talk about "The 50 million sequences database" at a conference next year :-). Or maybe it doesn't, and then you'll have learned something about where the actual limits are. hp [1] I don't expect you to know him :-) -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Loading 500m json files to database
On 2020-03-23 17:18:45 -0700, pinker wrote: > Christopher Browne-3 wrote > > Well, you're paying for a lot of overhead in that, as you're > > establishing a psql command, connecting to a database, spawning a > > backend process, starting a transactions, committing a transaction, > > closing the backend process, disconnecting from the database, and > > cleaning up after the launching of the psql command. And you're > > doing that 500 million times. > > > > The one thing I left off that was the loading of a single tuple into > > json_parts. [...] > > Reducing that overhead is the single most important thing you can do. > > Yes, I was thinking about that but no idea now how to do it right now. Do you know any programming language (Python, Perl, ...)? You'll probably get a huge gain from writing a script that just opens the connection once and then inserts each file. Copy usually is even faster by a fair amount, but since you have to read the data for each row from a different file (and - if I understood you correctly - a remote one at that), the additional speedup is probably not that great in this case. Splitting the work into batches and executing several batches in parallel probably helps. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: core. files inside base directory
On 2020-03-26 10:42:46 +0000, Daulat Ram wrote: > Hello, I have the core. Files inside the base directory of my postgres 10 setup > . Would you please let me know the importance of these files . When they > generate . Can we delete them ? Core files are created by the kernel when a process crashes. Unless you already know which program crashed and why, it is probably worth investigating. As a first step you could just use the file utility to find out which program crashed.

file core.*

should print something like

core.26166: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'sleep 120', real uid: ... execfn: '/bin/sleep' ...

for each file. (Of course the program won't be "sleep" in your case.) To analyze the coredumps further you would have to use a debugger (e.g. gdb). hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Idle sessions keep deleted files open
I noticed this with PostgreSQL 11.7 on Linux: I did a «vacuum full» on a table which had become quite bloated (size before the vacuum: 392 GB, after: 291 GB). I expected the disk usage to increase as the new table was written and then drop sharply after the vacuum was finished. However, that didn't happen. While usage increased by about 300 GB during the vacuum, it stayed the same afterwards (as reported by «df»). However, «du» reported about 400 GB less than «df». So apparently, files had been unlinked but were still in use by some process. «lsof» confirmed this: There were some idle sessions which were keeping the files open. As soon as I got those sessions to execute some request, they closed the files, causing the disk space to be freed. I think I noticed that before when dropping tables, but didn't draw the right conclusion at the time. So, if you are reorganizing large tables, keep an eye on idle sessions - they may keep deleted files around for quite some time. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
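A sketch of how to spot such sessions from within the database (the threshold is arbitrary). If it is acceptable, pg_terminate_backend(pid) gets rid of them; simply making them execute another query also releases the files, as described above:

select pid, usename, state, state_change,
       now() - state_change as idle_for,
       left(query, 60) as last_query
from pg_stat_activity
where state = 'idle'
  and state_change < now() - interval '1 hour'
order by state_change;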