Is pg_restore in 10.6 working?
I have some experience with different versions of Postgres, but I'm just getting around to using pg_restore, and it's not working for me at all. I can connect with psql, and pg_dump works, but pg_restore never does anything when I call it. It never even prompts for a password.

Here is my pg_hba.conf:

# "local" is for Unix domain socket connections only
local   all             all                             ident map=super
# IPv4 local connections:
host    all             all             0.0.0.0/0       md5
# IPv6 local connections:
host    all             all             ::1/128         ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                             peer
host    replication     all             127.0.0.1/32    ident
host    replication     all             ::1/128         ident

And my pg_ident.conf:

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
super           ec2-user                postgres

I can connect with psql either of these two ways:

psql -U postgres
or
psql -h ip-172-31-62-127.ec2.internal -p 5432 -U postgres -W postgres

(Yes, it's an AWS server)

This pg_dump command works:

pg_dump -U postgres -f predata.sql -F p -v -d prod_data

But a matching pg_restore command does nothing.

pg_restore -U postgres -f predata.sql -v

I'm running 10.6. Thank you.
Re: Is pg_restore in 10.6 working?
I'm not following your question. The pre-data and post-data sections each go to an individual file, but the data section goes to a directory. I can restore the files using psql, but it is the restore of the directory that is hanging.

On Mon, Nov 12, 2018 at 2:28 PM Rob Sargent wrote:
>
> On 11/12/18 11:44 AM, Tom Lane wrote:
> > David writes:
> >> I have some experience with different versions of Postgres, but I'm just
> >> getting around to using pg_restore, and it's not working for me at all.
> >> ...
> >> But a matching pg_restore command does nothing.
> >> pg_restore -U postgres -f predata.sql -v
> >
> > This command expects to read from stdin and write to predata.sql, so
> > it's not surprising that it's just sitting there.  What you want
> > is something along the lines of
> >
> > pg_restore -U postgres -d dbname -v
> >
> > regards, tom lane
>
> In this case, does the "General options" -f make sense? restoring to a
> file?
Re: Is pg_restore in 10.6 working?
Thanks, Adrian. It's running now.

On Mon, Nov 12, 2018 at 5:05 PM Adrian Klaver wrote:
> On 11/12/18 1:58 PM, David wrote:
>
> Please reply to list also.
> Ccing list
>
> > Yes, that's what I get for writing emails while I'm doing 5 other things
> > at the same time. So, let me try this again.
> > pg_dump is working when I use the following:
> > pg_dump -U postgres -F d -f /pgbackup/prod/data -v --section=data prod_data
> > To be clear, prod_data is the name of the database.
> > This works fine, I get /pgbackup/prod/data created and populated by
> > compressed files, as advertised.
> >
> > How do I restore this? Is there a specific combination of command line
> > options for this format?
> > I've tried -d prod_data and -f /pgbackup/prod/data -F d, but I get an error:
> > options -d and -f cannot be used together.
> >
> > So I take -d off the command line and I get
> > [directory archiver] no output directory specified.
> > and if I use this I get nothing at all
> > pg_restore -U postgres -f /pgbackup/prod/data -v
> >
> > So I'm confused.
>
> Enlightenment:
>
> https://www.postgresql.org/docs/10/app-pgrestore.html
>
> -f filename
> --file=filename
>
>     Specify output file for generated script, or for the listing when
>     used with -l. Default is the standard output.
>
> -d dbname
> --dbname=dbname
>
>     Connect to database dbname and restore directly into the database.
>
> filename
>
>     Specifies the location of the archive file (or directory, for a
>     directory-format archive) to be restored. If not specified, the
>     standard input is used.
>
> So something like:
>
> pg_restore -U postgres -v -d prod_data /pgbackup/prod/data
>
> > thanks again.
> >
> > On Mon, Nov 12, 2018 at 4:39 PM Adrian Klaver wrote:
> >
> >     On 11/12/18 11:39 AM, David wrote:
> >     > I'm not following your question. The pre-data and post-data sections
> >     > each go to an individual file, but the data section goes to a
> >     > directory. I can restore the files using psql, but it is the restore
> >     > of the directory that is hanging.
> >
> >     That is not what you showed in your OP:
> >
> >     This pg_dump command works:
> >     pg_dump -U postgres -f predata.sql -F p -v -d prod_data
> >
> >     But a matching pg_restore command does nothing.
> >     pg_restore -U postgres -f predata.sql -v
> >
> >     We would need to see the commands for data section to be able to
> >     comment further.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
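Putting the working combinations from this thread side by side (an untested sketch; the paths and database name are the ones used above):

# directory-format dump and restore: -f names the dump's output directory, the
# archive directory is a plain argument to pg_restore, and -d names the target DB
pg_dump    -U postgres -F d -f /pgbackup/prod/data -v --section=data prod_data
pg_restore -U postgres -v -d prod_data /pgbackup/prod/data

# a plain-format dump (-F p) is just a SQL script, so it is replayed with psql
# rather than pg_restore
pg_dump -U postgres -F p -f predata.sql -v -d prod_data
psql    -U postgres -d prod_data -f predata.sql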
Re: Why is tuple_percent so low?
On 27 February 2018 at 18:03, Sam Saffron wrote:
> 1. Where is all my missing space, is this in page alignment stuff and
> per-page overhead?

Yes, tuples are MAXALIGNed when copied onto the page. That appears to be 8 bytes on your system. You're just 2 bytes over the 8-byte boundary. You'll notice the table becoming more compact if you somehow could do away with the dsecs column.

> 2. Is there any other schemes I can look at for storing this data to
> have a more efficient yet easily queryable / updateable table.

Hard to say without learning about the use case. Some sort of de-normalisation may help to allow fewer tuples, e.g. storing one column in an array, but that may be more painful from a design point of view. It may also cause pain from a dead-tuple point of view if you have to UPDATE large tuples frequently.

> Keep in mind these tables get huge and in many of our cases will span
> 10-20GB just to store this information.

I guess it's all relative, but that does not seem that large. Is this causing some sort of performance problem you're trying to solve? Or is this completely down to disk space consumption? If it's down to performance then you might be able to solve that problem with an index.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
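If you want to see where the space is actually going, the pgstattuple contrib module gives a per-table breakdown (a sketch; the table name is a placeholder and the extension must be installed):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT tuple_count, tuple_percent, free_percent
FROM pgstattuple('my_table');

-- rough per-row size; the composite datum measured here is close to, but not
-- exactly, the on-disk tuple size before MAXALIGN padding
SELECT avg(pg_column_size(t.*)) FROM my_table t;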
Re: index-only-scan when there is an index on all columns
On 28 February 2018 at 11:11, Andres Freund wrote:
> I'm fairly certain that only vacuum and table rewrites like cluster sets
> all-visible,

I don't think the pages are set all visible again after a rewrite.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
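One way to check this for yourself is the pg_visibility contrib module, which exposes the visibility map page by page (a sketch; the extension must be installed and the relation name is a placeholder):

CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- count how many pages of the table are marked all-visible right now
SELECT count(*) FILTER (WHERE all_visible) AS all_visible_pages,
       count(*)                            AS total_pages
FROM pg_visibility('my_table');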
Re: Is there a continuous backup for pg ?
Hi Gary,

On 3/2/18 2:05 PM, Gary M wrote:
> Hi,
>
> I have an unusual requirement for schema based, live backup of a 24/7
> database processing 100K inserts/updates per hour. The data store is
> around 100TB.
>
> The requirement is supporting an incremental backup of 10 minute
> windows. Replication is not considered backup from malicious action.
>
> Are there any best practices or solutions that can meet these
> requirements?

pgBackRest is specifically designed to handle very large clusters and high WAL rates. Backup, restore, and archiving can be run in parallel to speed operations. The new version 2 has been optimized to make archive-push even faster than version 1 and we will be releasing an optimized archive-get soon.

You would be best off achieving your 10-minute windows with daily incremental backups and then recovery with PITR to the required time. PITR allows you to specify any time for recovery.

Regards,
--
-David
da...@pgmasters.net
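A rough illustration of that workflow with pgBackRest (a sketch only; the stanza name and timestamp are placeholders, and option names can vary between versions, so check the pgBackRest documentation):

# daily incremental backup
pgbackrest --stanza=main --type=incr backup

# point-in-time recovery to a specific timestamp
pgbackrest --stanza=main --type=time "--target=2018-03-02 14:10:00" restore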
Best options for new PG instance
Hi:

I'm going to be requesting a PG DB instance (v9.6.7) from an IT dept in a large corp setting. I was wondering if anyone could comment on the pros/cons of putting this on a virtual machine vs. bare metal? Locally mounted disk vs. NFS?

Thanks!
Can I grant permissions to specific set of uids (linux) ?
Hi:

I'd like to grant select, insert, update, delete on a table to a specific set of uids (linux). All others get select only. Can the DB authenticate the current linux user and grant access based on the fact that they are logged in (IOW, no passwords)? Is this possible? If so, how?

Thanks for any help
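One approach that might fit (a sketch, assuming local Unix-socket connections so peer authentication can identify the OS user; role and table names are placeholders):

-- pg_hba.conf: let local connections log in as the matching OS user
-- local   all   all   peer

-- group the privileged OS users under one role and grant accordingly
CREATE ROLE editors NOLOGIN;
CREATE ROLE alice LOGIN IN ROLE editors;   -- one login role per privileged linux user
CREATE ROLE bob   LOGIN;                   -- select-only user

GRANT SELECT ON mytable TO PUBLIC;
GRANT SELECT, INSERT, UPDATE, DELETE ON mytable TO editors;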
Re: Barman versus pgBackRest
Hi Thomas,

On 3/6/18 2:53 PM, Thomas Poty wrote:
> Hello Community,
>
> I hesitate to use barman or pgBackRest. I have found a lot of common
> points between them and a few differences:

To be clear, I'm the original author and primary maintainer of pgBackRest. I'll let the Barman folks speak to their strengths, but I'm happy to address your points below.

> About pgBarman, I like :
> - be able restore on a remote server from the backup server

This is a good feature, and one that has been requested for pgBackRest. You can do this fairly trivially with ssh, however, so it generally hasn't been a big deal for people. Is there a particular reason you need this feature?

> - use replication slots for backing up wal on the backup server.

Another good feature. We have not added it yet because pgBackRest was originally written for very high-volume clusters (100K+ WAL per day) and our parallel async feature answers that need much better. We recommend a replicated standby for more up-to-date data.

Even so, we are looking at adding support for replication slots to pgBackRest. We are considering a hybrid scheme that will use replication to keep the WAL archive as up to date as possible, while doing bulk transfer with archive_command.

> About pgBackRest, I like :
>
> - real differential backup.
> - lots of options
> - option for backing up if PostgreSQL is already in backup mode
>
> I would like to have :
> - advice or feedback about using pgBackrest or barman.
> - pros and cons of these solutions

I'll stick with some of the major pgBackRest pros:

- Parallel backup including compression and checksums
- Encryption
- S3 support
- Parallel archive
- Delta restore
- Page checksum validation
- Backup resume

More about features here: https://pgbackrest.org

> - differences that I would not have seen.

pgBackRest is used in some very demanding environments and we are constantly answering the needs of our users with features and performance improvements, e.g. the enormous improvements to archive-push speed in the 2.0 release.

I'd be happy to answer any specific questions you have about pgBackRest.

Regards,
--
-David
da...@pgmasters.net
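For reference, the parallel asynchronous archiving mentioned above is configured roughly like this (a sketch only; option and section names may differ between pgBackRest versions, so treat every line as an assumption to verify against the docs):

# postgresql.conf
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'

# pgbackrest.conf
[global]
process-max=4
archive-async=y
spool-path=/var/spool/pgbackrest

[main]
pg1-path=/var/lib/pgsql/10/data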
Re: Is there a way to create a functional index that tables tableoid column as an arg?
On 11 March 2018 at 07:53, Ryan Murphy wrote:
> I am using table inheritance and have e.g. the following tables:
>
> create table animal ( ... );
> create table dog ( ... ) inherits (animal);
> create table cat ( ... ) inherits (animal);
> create table person ( ... ) inherits (animal);
> create table musician ( ... ) inherits (person);
> create table politician ( ... ) inherits (person);
>
> Now I have a query that gets all the "animal"'s except for those that are
> "person"'s.
>
> select * from only animal
>
> won't cut it, because it leaves out the dogs and cats.
>
> select *, tableoid::regclass relname from animal
> where relname != 'person'::regclass
>
> also won't cut it because it leaves out the musicians and politicians.
>
> So I have created an immutable function is_a_kind_of(tbl regclass,
> parent_tbl regclass) that returns true iff tbl is identical with, or
> directly or indirectly inherits from, parent_tbl. For example:
>
> is_a_kind_of('person','person') => true
> is_a_kind_of('person','animal') => true
> is_a_kind_of('musician','person') => true
> is_a_kind_of('animal','person') => false
> is_a_kind_of('dog','person') => false
>
> No problems so far. Now my query works:
>
> select *,tableoid from "animal"
> where not is_a_kind_of(tableoid::regclass::text, 'person')
>
> This query is somewhat slow though - I'd like to index the is_a_kind_of()
> call. And Postgres supports functional indexes! So I try:
>
> create index animal_is_person on animal (
>     is_a_kind_of(tableoid::regclass, 'person') );
>
> ERROR: index creation on system columns is not supported
>
> I see that this is because "tableoid" is a system column. Does anyone know
> any workaround for this? So close yet so far away!

I don't think having an index on the tableoid would help you here anyway. Previous versions did allow indexes on system columns, but that was disallowed recently due to the fact that the values of most system columns are changed internally by postgres and the indexes wouldn't correctly be updated. tableoid might be an exception to this, but it does not really seem like a useful column to index, given it would be indexing the same value for each record in the table.

The reason it's slow is that PostgreSQL is executing the function once per input row. In this case, that must be done because you're passing in tableoid as a function parameter, and that could change from one row to the next (once the Append moves to the next subplan).

What you'd want to do instead is create a set-returning function that finds all the inherited children and returns them all. This will allow PostgreSQL to execute the function just once at the start of the query, then join the results to this function.

Something like:

create or replace function get_inherited_tables (prelid oid)
returns setof oid as
$$
with recursive c as (
    select inhrelid, inhparent from pg_inherits where inhparent = prelid
    union all
    select i.inhrelid, i.inhparent
    from pg_inherits i
    inner join c on i.inhparent = c.inhrelid
)
select prelid union all select inhrelid from c;
$$ language sql stable;

then have your queries do:

-- get all animals that are persons
select ... from animal
where tableoid in (select get_inherited_tables('person'::regclass));

-- get all animals that are not persons
select ... from animal
where tableoid not in (select get_inherited_tables('person'::regclass));

Just be careful around search_paths and your use of regclass.
In this case, if "animal" was not in the first schema in search_path, but someone created another table called "person" that was in the first schema listed in search_path, then the query would not do what you want. You might want to consider prefixing the input parameter into get_inherited_tables with the schema name too. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: primary key and unique index
On 23 March 2018 at 20:55, Thomas Poty wrote:
> In MySQL a "show create table" gives me :
> ...
> PRIMARY KEY (`ID`,`CountryCode`,`LanguageCode`),
> UNIQUE KEY `unique_my table_4` (`ID`,`CountryCode`,`LanguageCode`),
> ...
>
> So, In PostgreSQL, does it make sense to create a primary key AND a unique
> index based on the same columns?
> Is PostgreSQL smart enough to use the unique index created for the primary
> key?

Doing this makes no sense in PostgreSQL. I'm struggling to imagine why it would in MySQL.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_stat_statements: password in command is not obfuscated
On 24 March 2018 at 10:30, legrand legrand wrote:
> It seems that passwords used in commands are not removed when caught by
> pg_stat_statements
> (they are not "normalized" being utility statements)
>
> example:
>
> alter role tt with password '123';
>
> select query from public.pg_stat_statements
> where query like '%password%';
>
> query
> -----------------------------------
> alter role tt with password '123';
>
> Do you think it's a bug?

If it is, then it's not a bug in pg_stat_statements. log_statement = 'ddl' would have kept a record of the same thing.

Perhaps the best fix would be a documentation improvement to mention the fact and that it's best not to use plain text passwords in CREATE/ALTER ROLE. Passwords can be md5 encrypted.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
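For example, the statement can carry a pre-hashed password instead of the plain-text one (a sketch; the hash shown is a placeholder). The stored md5 format is 'md5' followed by the md5 of the password concatenated with the role name, which is also what psql's \password command computes for you:

-- hash computed outside the database, e.g.:  echo -n '123tt' | md5sum
alter role tt with password 'md5<hash-goes-here>';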
Re: Bad Query Plans on 10.3 vs 9.6
On 29 March 2018 at 18:26, Cory Tucker wrote:
> Hello all. I'm migrating a database from PG 9.6 to 10.3 and have noticed a
> particular query that is performing very badly compared to its 9.6
> counterpart.
>
> The plan on 9.6 v 10.3 are effectively identical except in 9.6 the planner
> decides to use an index only scan on the primary key and in 10.3 it does a
> sequential scan. The problem is the sequential scan is for a table of 75M
> rows and 25 columns so it's quite a lot of pages it has to traverse.

How certain are you that all the indexes match on each instance? It would be useful to see psql's \d output for each table in question.

Another option for you to consider would be to get rid of the OR clause in the query entirely and have two separate CTEs and INSERT INTO your orphaned_matches table having SELECTed from both CTEs with a UNION ALL. A record already deleted won't appear in the 2nd branch of the UNION ALL result.

However, that still might not fix your issue with the index not being used, but you may find the anti-joined version of the query is faster anyway.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: Bad Query Plans on 10.3 vs 9.6
On 30 March 2018 at 03:21, Cory Tucker wrote:
>> Another possibility is that 10.3 sees the index-only scan as too expensive
>> because it thinks most of the table isn't all-visible.  Comparing
>> pg_class.relallvisible values might be informative.
>
> I'm happy to try to dig into this one more, however, I'm not familiar with
> this value. What should I be looking for here?

Each table in your database has an entry in the pg_class table. Something like:

SELECT relallvisible FROM pg_class WHERE oid = 'build.household'::regclass;

would show you the value, however, I think a problem here is unlikely since that would just control the likelihood of an index-only-scan vs an index-scan. You're getting a Seq-scan, which I imagine is going to be quite a bit more expensive than even an index scan.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
hardcode password in connect string
Hi:

PG v9.5.2 on RHEL

I like to use an alias to connect to my favorite DBs but don't like to enter passwords. I used to just disable passwords (working behind a firewall), but this one is different. I see nothing in the interactive connect string where I can enter the password...

psql -h thehost -U memyselfi mydb

Missing something like "-p mypassword"

Is there a way I can stick the pw in the linux alias definition?

Just thinking something like this may be available since hardcoded passwords are supported in perl/dbi, pg driver.

Thanks!
Re: hardcode password in connect string
PGPASSWORD env var works fine. Thanks!

On Fri, Apr 13, 2018 at 2:46 PM, James Keener wrote:
> Is setting it as an environment variable an option?
> https://www.postgresql.org/docs/9.1/static/libpq-envars.html
>
> Alternatively, a service file?
> https://www.postgresql.org/docs/9.0/static/libpq-pgservice.html
>
> Jim
>
> On April 13, 2018 2:43:01 PM EDT, David Gauthier wrote:
>> Hi:
>>
>> PG v9.5.2 on RHEL
>>
>> I like to use an alias to connect to my favorite DBs but don't like to
>> enter passwords. I used to just disable passwords (working behind a
>> firewall), but this one is different. I see nothing in the interactive
>> connect string where I can enter the password...
>>
>> psql -h thehost -U memyselfi mydb
>>
>> Missing something like "-p mypassword"
>>
>> Is there a way I can stick the pw in the linux alias definition ?
>>
>> Just thinking something like this may be available since hardcoded
>> passwords are supported in perl/dbi, pg driver
>>
>> Thanks !
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
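The resulting alias looks roughly like this (values are placeholders; note that a ~/.pgpass file with 0600 permissions keeps the password out of the shell environment, which is usually preferable to an inline PGPASSWORD):

# e.g. in ~/.bashrc
alias mydb='PGPASSWORD=mypassword psql -h thehost -U memyselfi mydb'

# or, with a ~/.pgpass line "thehost:5432:mydb:memyselfi:mypassword":
alias mydb='psql -h thehost -U memyselfi mydb'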
Re: Barman versus pgBackRest
Hi Thomas,

On 4/11/18 3:14 AM, Thomas Poty wrote:
> Sorry for answering only now but I just saw your answer.
>
>> To be clear, I'm the original author and primary maintainer of pgBackRest.
>
> I am very happy to see guys like you take time to answer me. Thank you.

You are welcome. Users are the reason I work on this project.

>> This is a good feature, and one that has been requested for pgBackRest. You
>> can do this fairly trivially with ssh, however, so it generally hasn't
>> been a big deal for people. Is there a particular reason you need this
>> feature?
>
> The reason is probably psychological, but I like the idea of a unique
> connection point to restore DBs in different locations.
>
> I am very impatient to see "replication slot" support and the "remote
> restore" feature added.

Remote restore is likely to land first, though neither feature is currently at the top of the list. Unfortunately, we have limited resources and must prioritize.

Regards,
--
-David
da...@pgmasters.net
Can PostgreSQL create new WAL files instead of reusing old ones?
tl;dr: We've found that under many conditions, PostgreSQL's re-use of old WAL files appears to significantly degrade query latency on ZFS. The reason is complicated and I have details below. Has it been considered to make this behavior tunable, to cause PostgreSQL to always create new WAL files instead of re-using old ones?

Context: we're running a large fleet of PostgreSQL shards. Each shard consists of a primary, a synchronous standby, and an asynchronous standby using chained replication. For this problem, we can consider only the primary and synchronous standby.

PostgreSQL: 9.6.3
OS: illumos (SmartOS, mixed versions, but all from 2017 or later)
FS: ZFS over mirrored HDDs (not SSDs), using a record size of 8K to match PostgreSQL's record size. We have an SSD log device for completing synchronous writes very quickly.

WAL files are 16MB each, and we're keeping a very large number of segments. (There's likely a lot of improvement for WAL tuning here.)

Since we're using an 8K record size, when PostgreSQL writes small (or non-aligned) records to the WAL files, ZFS has to read the old contents in order to write the new 8K record (read-modify-write). If that's in cache, that's not a big deal. But when PostgreSQL decides to reuse an old WAL file whose contents have been evicted from the cache (because they haven't been used in hours), this turns what should be a workload bottlenecked by synchronous write performance (that's well-optimized with our SSD log device) into a random read workload (that's much more expensive for any system).

What's significantly worse is that we saw this on synchronous standbys. When that happened, the WAL receiver was blocked on a random read from disk, and since it's single-threaded, all write queries on the primary stop until the random read finishes. This is particularly bad for us when the sync is doing other I/O (e.g., for an autovacuum or a database backup) that causes disk reads to take hundreds of milliseconds.

Reusing old WAL files seems like an optimization intended for filesystems that allocate disk blocks up front. With copy-on-write, it doesn't seem to make much sense. If instead of using an old WAL file, PostgreSQL instead just created a new one, there would be no random reads required to complete these operations, and we believe we'd completely eliminate our latency outliers.

Thanks,
Dave
Re: Can PostgreSQL create new WAL files instead of reusing old ones?
On Tue, Apr 17, 2018 at 11:57 AM, Tom Lane wrote:
> Alvaro Herrera writes:
> > David Pacheco wrote:
> >> tl;dr: We've found that under many conditions, PostgreSQL's re-use of old
> >> WAL files appears to significantly degrade query latency on ZFS. The
> >> reason is complicated and I have details below. Has it been considered to
> >> make this behavior tunable, to cause PostgreSQL to always create new WAL
> >> files instead of re-using old ones?
>
> > I don't think this has ever been proposed, because there was no use case
> > for it. Maybe you want to work on a patch for it?
>
> I think possibly the OP doesn't understand why it's designed that way.
> The point is not really to "recycle old WAL files", it's to avoid having
> disk space allocation occur during the critical section where we must
> PANIC on failure. Now, of course, that doesn't really work if the
> filesystem is COW underneath, because it's allocating fresh disk space
> anyway even though semantically we're overwriting existing data.
> But what I'd like to see is a fix that deals with that somehow, rather
> than continue to accept the possibility of ENOSPC occurring inside WAL
> writes on these file systems. I have no idea what such a fix would
> look like :-(

I think I do understand, but as you've observed, recycling WAL files to avoid allocation relies on the implementation details of the filesystem -- details that I'd expect not to be true of any copy-on-write filesystem. On such systems, there may not be a way to avoid ENOSPC in special critical sections. (And that's not necessarily such a big deal -- to paraphrase a colleague, ensuring that the system doesn't run out of space does not seem like a particularly surprising or heavy burden for the operator. It's great that PostgreSQL can survive this event better on some systems, but the associated tradeoffs may not be worthwhile for everybody.)

And given that, it seems worthwhile to provide the operator an option where they take on the risk that the database might crash if it runs out of space (assuming the result isn't data corruption) in exchange for a potentially tremendous improvement in tail latency and overall throughput.

To quantify this: in a recent incident, transaction latency on the primary was degraded about 2-3x (from a p90 of about 45ms to upwards of 120ms, with outliers exceeding 1s). Over 95% of the outliers above 1s spent over 90% of their time blocked on synchronous replication (based on tracing with DTrace). On the synchronous standby, almost 10% of the WAL receiver's wall clock time was spent blocked on disk reads in this read-modify-write path. The rest of the time was essentially idle -- there was plenty of headroom in other dimensions (CPU, synchronous write performance).

Thanks,
Dave
Re: Inconsistent compilation error
On 19 April 2018 at 13:28, Melvin Davidson wrote:
> It is not fine. You have specified TWO percent signs (%) which requires TWO
> arguments, but you have only provided ONE -> r.

I'm confused about this statement. Did you perhaps overlook the fact that there are in fact two arguments? I'm unsure why you're considering 'blah_history.original_id' not to be an argument here.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Backup Strategy Advise
Hi:

I need some advice on how best to back up a PG DB. PG 9.5.2 on RHEL6.

The requirement is to be able to restore the DB after catastrophic failure and lose no more than the last 15 minutes worth of data. Also, we would like to be able to do the backups on-line (no down time). There is no need for PITR other than the point in time being the latest possible.

Typically, I would think doing a weekly full backup, daily incremental backups, and turning on journaling to capture what goes on since the last backup. When DB recovery is needed, restore up to the last daily, then replay the journal to restore up to the last time the journal was flushed to disk (checkpoint=15 minutes). I'm not quite sure if something like this is possible with PG.

I've read about the WAL file and wonder if it could be used together with the on-line logical backups (pg_dump) to achieve the 15 minute requirement without needing downtime for physical backups.

Any advice? Thanks in advance.
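For reference, the built-in way to meet this kind of requirement is continuous WAL archiving on top of a physical base backup rather than pg_dump. Very roughly (a sketch only, with placeholder paths; see the "Continuous Archiving and Point-in-Time Recovery" chapter of the docs for the full recipe):

# postgresql.conf
wal_level = hot_standby      # on 9.5; 'replica' on 9.6+
archive_mode = on
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
archive_timeout = 15min      # force a WAL segment switch at least every 15 minutes

# periodic online base backup, no downtime required
pg_basebackup -U postgres -D /backup/base/$(date +%F) -X fetch -P

# to recover: restore the latest base backup, add a recovery.conf containing
# restore_command = 'cp /backup/wal/%f %p', then start the server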
Run external command as part of an sql statement ?
Hi:

At the psql prompt, I can do something like...

"select \! id -nu"

...to get the uid of whoever's running psql. I want to be able to run a shell command like this from within a stored procedure. Is there a way to do this?

Thanks
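One common way to do this is an untrusted procedural language such as plperlu (a sketch; installing it requires superuser, and note that the command runs on the database server as the server's OS user, not as the psql client's user, so it will not return the uid of whoever is running psql):

CREATE EXTENSION plperlu;

CREATE OR REPLACE FUNCTION run_id_nu() RETURNS text AS $$
    my $out = `id -nu`;   # run the shell command and capture its output
    chomp $out;
    return $out;
$$ LANGUAGE plperlu;

-- usage:
SELECT run_id_nu();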
Re: What is the C function to modify attribute
On 17 May 2018 at 15:36, a <372660...@qq.com> wrote:
> I am writing a trigger that will record all adjustments of the row into the
> last column of the row. I have written an iterator to produce a text record,
> and the last step would be to add the text record into the last column
> (which is designed as an array of a user-defined type, with the output
> function displaying an abstraction of the update, and a user-defined
> function used to retrieve detailed update info).
>
> Now I would like to find a way that will allow me to update the last column
> in C efficiently.
>
> Can anyone help? Thank you so much!!

You might find https://www.postgresql.org/docs/10/static/triggers.html useful. There's a complete example of a trigger function written in C there.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: limit and query planner
On 6 June 2018 at 07:17, armand pirvu wrote:
> So since
>
> select count(*) from sp_i2birst_reg_staging_test;
>  count
> -------
>   6860
>
> and
>
> select count(*) from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=0 ;
>  count
> -------
>   4239
>
> That means to me I fetch almost the whole table and then I fall in the case
> you described: seqscan-and-sort wins over indexscan.

The planner simply assumes that 1 in (6860.0 / 4239.0) rows matches your WHERE clause. Since you want 10 rows from the LIMIT, it thinks it'll just need to read 17 rows from the heap to answer your query.

> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=1;
>                                QUERY PLAN
>  Index Scan using fooidx on sp_i2birst_reg_staging_test
>  (cost=0.28..202.91 rows=500 width=519) (actual time=0.097..0.527 rows=500 loops=1)
>    Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
>  Planning time: 1.024 ms
>  Execution time: 0.766 ms
>
> this gets 500 rows out of 6860
>
> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=1 limit 10;
>                                QUERY PLAN
>  Limit  (cost=0.28..4.33 rows=10 width=519) (actual time=0.073..0.105 rows=10 loops=1)
>    ->  Index Scan using fooidx on sp_i2birst_reg_staging_test
>        (cost=0.28..202.91 rows=500 width=519) (actual time=0.072..0.101 rows=10 loops=1)
>          Index Cond: (((evt_id)::text = 'ACSF17'::text) AND (status = 1))
>  Planning time: 0.280 ms
>  Execution time: 0.173 ms
>
> Back to the original
>
> explain analyze select * from sp_i2birst_reg_staging_test where evt_id = 'ACSF17'
> and status=0 limit 1 ;
>                                QUERY PLAN
>  Limit  (cost=0.00..0.13 rows=1 width=519) (actual time=0.021..0.021 rows=1 loops=1)
>    ->  Seq Scan on sp_i2birst_reg_staging_test
>        (cost=0.00..548.40 rows=4239 width=519) (actual time=0.019..0.019 rows=1 loops=1)
>          Filter: (((evt_id)::text = 'ACSF17'::text) AND (status = 0))
>          Rows Removed by Filter: 1
>  Planning time: 0.286 ms
>  Execution time: 0.110 ms

For the more restrictive status, the planner thinks more rows will need to be looked at, which increases the cost of the seqscan, and the planner favours the index scan.

You can see the planner estimates 500 rows will match the status=1 query. So it thinks ceil(1 * (6860 / 500.0) * 10) = 138 rows will need to be looked at in the seqscan plan. That's obviously more costly than 17 rows, so the index scan begins to look more favourable.

The planner always assumes the rows are evenly distributed, which is not always the case. If your ten rows were at the end of the heap, then the seqscan would be a bad plan. In this case "Rows Removed by Filter" would be high.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Doing a \set through perl DBI ?
Hi:

Is there a way to do the equivalent of a "\set foo 1" through perl dbi?

I tried...
$dbh->do("\\set foo 1");
and got a syntax error.

Of course, I'd also have to be able to access the value of foo once it's set. I'm guessing the usual way ??? (select :foo)

Thanks for any help!
Re: Doing a \set through perl DBI ?
I think I found my own answer. I wanted to use the current linux user's uid as part of a query (again, this is a perl/DBI script). I was thinking I might be able to set a variable into the DB session somehow using \set through DBI to accomplish this.

The solution that finally occurred to me was to create a temporary table and have the perl script shove the uid into a column in that table, then query against that. Sort of like using a temp table to store variables set from outside.

On Wed, Jun 6, 2018 at 11:46 AM, David G. Johnston wrote:
> On Wednesday, June 6, 2018, David Gauthier wrote:
>> Hi:
>>
>> Is there a way to do the equivalent of a "\set foo 1" through perl dbi ?
>> I tried...
>> $dbh->do("\\set foo 1");
>> and got a syntax error
>>
>> Of course, I'd also have to be able to access the value of foo once its
>> set. I'm guessing the usual way ??? (select :foo)
>
> No. Meta commands are psql client program only. You'd need to explain
> why you want this to get reasonable server-side suggestions.
>
> David J.
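In SQL terms, the temp-table approach described above amounts to something like the following (a sketch; table and column names are placeholders, and the uid value would be bound from Perl). An alternative with the same effect is a custom setting via set_config()/current_setting():

-- temp-table approach
CREATE TEMP TABLE session_vars (uid int);
INSERT INTO session_vars (uid) VALUES ($1);          -- bind the OS uid from Perl
SELECT * FROM mytable
WHERE owner_uid = (SELECT uid FROM session_vars);

-- custom-setting approach
SELECT set_config('myapp.uid', $1, false);           -- false = lasts for the session
SELECT * FROM mytable
WHERE owner_uid = current_setting('myapp.uid')::int;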
Re: Can PostgreSQL create new WAL files instead of reusing old ones?
On Wed, Jun 20, 2018 at 10:35 AM, Jerry Jelinek wrote:
> As Dave described in his original email on this topic, we'd like to avoid
> recycling WAL files since that can cause performance issues when we have a
> read-modify-write on a file that has dropped out of the cache.
>
> I have implemented a small change to allow WAL recycling to be disabled.
> It is visible at:
> https://cr.joyent.us/#/c/4263/
>
> I'd appreciate getting any feedback on this.
>
> Thanks,
> Jerry

For reference, there's more context in this thread from several months ago:
https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w%40mail.gmail.com#cacukrjo7djvub8e2aijoayj8bfkk3xxbtwu3kkaritr67m3...@mail.gmail.com

I'll repeat the relevant summary here:

> tl;dr: We've found that under many conditions, PostgreSQL's re-use of old
> WAL files appears to significantly degrade query latency on ZFS. The
> reason is complicated and I have details below. Has it been considered to
> make this behavior tunable, to cause PostgreSQL to always create new WAL
> files instead of re-using old ones?

Thanks,
Dave
Re: help understanding create statistic
On 28 June 2018 at 21:38, Luca Ferrari wrote:
> CREATE INDEX idx_year
> ON expenses ( EXTRACT( year FROM day ) );
>
> why is the planner not choosing to use such index on a 'year' raw query?
>
> EXPLAIN SELECT * FROM expenses
> WHERE year = 2016;

The expression in the where clause must match the indexed expression. You'd need to add an index on just (year) for that to work.

> Am I misunderstanding this functional dependency?

Yeah, the statistics are just there to drive the planner's costing. They won't serve as proof for anything else. All you've done by creating those stats is to allow better estimates for queries such as:

SELECT * FROM expenses WHERE day = '2018-06-28' and year = 2018;

> stxdependencies | {"3 => 5": 1.00}

It would appear that "3" is the attnum for day and "5" is year. All that tells the planner is that, of the records sampled during analyze, each "day" had almost exactly 1 year. There's nothing then to stop you going and adding a record with the day '2017-01-01' and the year 2018. The stats will remain the same until you analyze the table again.

If those stats didn't exist, the planner would have multiplied the selectivity estimates of each item in the WHERE clause individually. So if about 10% of records had year=2018, and 0.1% had '2018-06-28', then the selectivity would have been 0.1 * 0.001. With a functional dependency of 1, the selectivity just becomes 0.001.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
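To make the first point concrete (a sketch, assuming the expression index from the question above):

-- this form can use idx_year, because it matches the indexed expression:
EXPLAIN SELECT * FROM expenses WHERE EXTRACT(year FROM day) = 2016;

-- alternatively, index the raw column the original query filters on:
CREATE INDEX idx_year_col ON expenses (year);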
Re: Convert Existing Table to a Partition Table in PG10
On 1 July 2018 at 10:15, Clifford Snow wrote:
> I also learned that the range partition value I used on a timestamp needed
> to have fractional seconds. I used a range of 2017-01-01 00:00:00 to
> 2017-12-31 23:59:59, which failed when I attempted to add a record that had
> a timestamp of 2017-12-31 23:59:59. Adding a fractional second to the range
> solved the problem.

Please be aware that with RANGE partitions the upper bound is non-inclusive. The lower bound is inclusive.

If you want a 2017 partition, then FOR VALUES FROM ('2017-01-01') TO ('2018-01-01') will allow all 2017 timestamps and only 2017 timestamps. You've no need to consider the precision of the type and how many 9's you add to anything here.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: FK v.s unique indexes
On 3 July 2018 at 19:30, Rafal Pietrak wrote:
> tst=# create table test1(load bigint, a int, b int, c bool) partition by
> list (c);
> CREATE TABLE
> tst=# create table test1_true partition of test1 for values in (true);
> CREATE TABLE
> tst=# create table test1_false partition of test1 for values in (false);
> CREATE TABLE
> tst=# create unique index load ON test1 (load,a,b,c);
> CREATE INDEX
> tst=# create table info_text1 (load text, a int, b int, c bool, info
> text, foreign key (load,a,b,c) references test1(load,a,b,c)) ;
> ERROR:  cannot reference partitioned table "test1"
> --
>
> Why is this forbidden?

I don't think there were any actual roadblocks; it was more a case of not enough time in the cycle to make it work, due to a few technical details that required extra effort. Alvaro managed to simplify the problem and allow foreign keys to be defined on partitioned tables, and got that into PG11. So it was a case of 50% being better than 0%, which I very much agree with.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_dump out of memory
On 4 July 2018 at 14:43, Andy Colson wrote:
> I moved a physical box to a VM, and set its memory to 1Gig. Everything
> runs fine except one backup:
>
> /pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfire
>
> pg_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  out of memory
> DETAIL:  Failed on request of size 1073741823.
> pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO stdout;

There will be less memory pressure on the server if the pg_dump was performed from another host. When running pg_dump locally the 290MB bytea value will be allocated in both the backend process pg_dump is using and pg_dump itself. Running the backup remotely won't require the latter to be allocated on the server.

> I've been reducing my memory settings:
>
> maintenance_work_mem = 80MB
> work_mem = 5MB
> shared_buffers = 200MB

You may also get it to work by reducing shared_buffers further. work_mem won't have any effect, and neither will maintenance_work_mem.

Failing that, the suggestions of more RAM and/or swap look good.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
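i.e. run the dump from some other machine that can reach the database over the network, along the lines of (the host name is a placeholder):

pg_dump -h db-server -Fc -U postgres -f wildfire.backup wildfire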
As a table owner, can I grant "grant" ?
Postgres 9.5.2 on linux

Given that I am not superuser, but DO have createrole...

Can I grant some other role the ability to grant access to a table I created?

For example: I create a table called foo. As the creator/owner of this table, I seem to have the ability to grant select, insert, update, delete (etc...) to other roles. I would like to grant some other role the ability to grant access to my table.

Thanks in Advance!
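The standard mechanism for this is WITH GRANT OPTION, e.g. (a sketch; the role name is a placeholder):

GRANT SELECT, INSERT, UPDATE, DELETE ON foo TO other_role WITH GRANT OPTION;

-- other_role can now pass those same privileges on to further roles, e.g.
-- (run as other_role):  GRANT SELECT ON foo TO yet_another_role;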
Re: How to set array element to null value
On Mon, Jul 09, 2018 at 03:28:45PM +0530, Brahmam Eswar wrote:
> I'm trying to reset array element to null.

You can do this in SQL as follows:

SELECT ARRAY(
    SELECT CASE e WHEN 'ABC' THEN NULL ELSE e END
    FROM UNNEST(x) _(e)
)

This should really be going to pgsql-general because it is about how to use PostgreSQL, not how to change PostgreSQL.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
timestamp (military) at time zone without the suffix
Hi:

I would like to get the utc timestamp, 24-hr clock (military time), without the time zone suffix.

Below commands were run nearly at the same time...

sqfdev=> select now()::timestamp(0) ;
         now
---------------------
 2018-07-11 15:27:12
(1 row)

...then immediately...

sqfdev=> select now()::timestamp(0) at time zone 'utc' ;
        timezone
------------------------
 2018-07-11 11:27:12-04
(1 row)

15:27:12 makes sense (it's about 3:30 in the afternoon EST).
11:27:12 doesn't make sense. UTC is 5 hours ahead.

I would have expected either 20:27 (if it stuck to military time, which I want), or 08:27 (P.M., non-military time).

And I want to get rid of the -04 suffix.

Is there a way to do this?

Thanks!
Re: timestamp (military) at time zone without the suffix
OK, the "to_char" gets rid of the timezone extension. But the times still don't make sense. UTC should be 5 hours ahead, not behind. It should be EST plus 5 hours (or 4 for DST), not minus. That's why I said I expected 20:27 . When I go to store this in a DB, I want to store the UTC time. How d I do that ? insert into foo (dt) values (localtimestamp(0) at time zone 'utc') ??? On Wed, Jul 11, 2018 at 3:45 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > On Wednesday, July 11, 2018, David Gauthier > wrote: > >> Hi: >> >> I would like to get the utc timestamp, 24-hr clock (military time), >> without the time zone suffix. >> >> Below commands were run nearly at the same time... >> >> sqfdev=> select now()::timestamp(0) ; >> now >> - >> 2018-07-11 15:27:12 >> (1 row) >> >> ...then immediately... >> >> sqfdev=> select now()::timestamp(0) at time zone 'utc' ; >> timezone >> >> 2018-07-11 11:27:12-04 >> (1 row) >> >> >> 15:27:12 makes sense (it's a bout 3:30 in the afternoon EST). >> 11:27:12 doesn't make sense. UTC is 5 hours ahead. >> > > Apparently it's only four hours ahead of your server's time zone setting. > > >> > I would have expected either 20:27 (if it stuck to military time, which >> I want), or 08:27 (P.M., non-military time) >> >> And I want to get rid of the -04 suffix. >> >> Is there a way to do this ? >> > > Specify an appropriate format string with the to_char function. > > https://www.postgresql.org/docs/10/static/functions-formatting.html > > David J. > >
Re: timestamp (military) at time zone without the suffix
Thanks Everyone, they all work, but TL's seems to be the simplest...

select current_timestamp(0) at time zone 'utc'

I'm kinda stuck with the timestamp data type (vs timestamptz). Wondering if I can stick with that.

One last question... I want to store the current UTC date/time in the DB. Does PG unconditionally store something like UTC, then let the queries figure out how they want to look at it (with "at time zone" and "to_char()" etc...)? Or do I have to intentionally store the UTC value somehow? Right now the code is just inserting and updating records using "localtimestamp(0)".

On Wed, Jul 11, 2018 at 4:11 PM, David G. Johnston wrote:
> On Wednesday, July 11, 2018, David Gauthier wrote:
>> OK, the "to_char" gets rid of the timezone extension. But the times
>> still don't make sense.
>>
>> When I go to store this in a DB, I want to store the UTC time. How do I
>> do that?
>
> Use the data type that represents exactly that, timestamptz. Using the
> timestamp data type is generally not what you want even if you can get the
> manipulation logic figured out.
>
> David J.
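To sketch an answer to that last question: a plain timestamp column stores exactly the value you insert, so UTC has to be stored explicitly, while a timestamptz column stores an absolute point in time (internally normalized to UTC) and converts on display according to the TimeZone setting. Illustrative statements (table/column names from the thread):

-- with a plain timestamp column, store UTC explicitly:
insert into foo (dt) values (now() at time zone 'utc');

-- with a timestamptz column, just store now() and format on the way out:
select to_char(dt at time zone 'utc', 'YYYY-MM-DD HH24:MI:SS') from foo;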
sorting/comparing column values in non-alphanumeric sorting ways ?
Hi:

I have a table listing tools and tool versions for a number of different tool configurations. Something like this...

create table tv (tool text, tcfg1 text, tcfg2 text, tcfg3 text, highestver text);
insert into tv (tool,tcfg1,tcfg2,tcfg3) values ('tool_a','1.0.5b','1.0.10','1.0.9');

I want to load the 'highestver' column with the highest version of tcfg1-3.

This won't work...

update tv set highestver = greatest(tcfg1,tcfg2,tcfg3)

...because it thinks 1.0.9 is greater than 1.0.10.

Is there a way to get this to work right?

Thanks in Advance!
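One possible approach (a sketch, not from a reply in this thread): compare the versions as integer arrays so '1.0.10' sorts above '1.0.9'. Letter suffixes like the 'b' in '1.0.5b' are simply ignored here, and NULL configs are skipped:

create function version_key(v text) returns int[] as $$
    select string_to_array(regexp_replace(v, '[^0-9.]', '', 'g'), '.')::int[];
$$ language sql immutable;

update tv set highestver = (
    select x
    from unnest(array[tcfg1, tcfg2, tcfg3]) as u(x)
    where x is not null
    order by version_key(x) desc
    limit 1
);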
Re: Optimizing execution of expensive subqueries
On 12 July 2018 at 02:24, Mathieu Fenniak wrote:
> I'm currently looking at a query that is generally selecting a bunch of
> simple columns from a table, and also performing some subqueries to
> aggregate related data, and then sorting by one of the simple columns and
> paginating the result.
>
> eg.
>
> SELECT
>   tbl.field1, tbl.field2, tbl.field3, ...,
>   (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = tbl.UserId
>    AND anothertbl.ThingyId = 1) as Thingy1Sum,
>   ... repeat for multiple thingies ...
> FROM
>   tbl
> ORDER BY tbl.field1 LIMIT 20
>
> I'm finding that if "tbl" contains hundreds of thousands of rows, the
> subqueries are being executed hundreds of thousands of times. Because of
> the sorting and pagination, this appears to be unnecessary, and the
> result is slow performance. (PostgreSQL 9.5.9 server)

You've got two choices.

1) You can add a btree index on field1 so that the executor does not need to examine all records before taking the top-20, or;

2) move the subquery out of the target list and instead make it a LEFT JOIN, adding an appropriate GROUP BY clause.

#2 might not be a great option since it may require building groups that don't get used, but it would likely be the best option if you didn't have a LIMIT clause, or the LIMIT was a larger percentage of the total records.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
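Roughly what option #2 looks like, using the column names from the quoted query (a sketch):

SELECT tbl.field1, tbl.field2, tbl.field3, s.thingy1sum
FROM tbl
LEFT JOIN (SELECT userid, SUM(duration) AS thingy1sum
           FROM anothertbl
           WHERE thingyid = 1
           GROUP BY userid) s ON s.userid = tbl.userid
ORDER BY tbl.field1
LIMIT 20;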
Re: Finding out why parallel queries not avoided
On 21 July 2018 at 20:15, Didier Carlier wrote:
> explain select count(*) from calendar c1, calendar c2, measure m where
> c1.stddate='2015-01-01' and c2.stddate='2015-12-31' and m.fromdateid
> >=c1.calendarid and m.fromdateid < c2.calendarid;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Aggregate  (cost=5073362.73..5073362.74 rows=1 width=8)
>    ->  Nested Loop  (cost=8718.47..4988195.81 rows=34066770 width=0)
>          ->  Index Scan using calendar_stddate_unique on calendar c2
>              (cost=0.28..2.30 rows=1 width=4)
>                Index Cond: (stddate = '2015-12-31 00:00:00+01'::timestamp with time zone)
>          ->  Nested Loop  (cost=8718.19..4647525.81 rows=34066770 width=4)
>                ->  Index Scan using calendar_stddate_unique on calendar c1
>                    (cost=0.28..2.30 rows=1 width=4)
>                      Index Cond: (stddate = '2015-01-01 00:00:00+01'::timestamp with time zone)
>                ->  Bitmap Heap Scan on measure m  (cost=8717.91..4306855.81 rows=34066770 width=4)
>                      Recheck Cond: ((fromdateid >= c1.calendarid) AND (fromdateid < c2.calendarid))
>                      ->  Bitmap Index Scan on idx_measure_fromdate
>                          (cost=0.00..201.22 rows=34072527 width=0)
>                            Index Cond: ((fromdateid >= c1.calendarid) AND (fromdateid < c2.calendarid))
>
> Both queries return the same answers but I don't see why the second one
> doesn't use parallel query.

You'd likely be better off writing the query as:

select count(*) from measure
where fromdateid >= (select calendarid from calendar where stddate = '2015-01-01')
and fromdateid < (select calendarid from calendar where stddate = '2015-12-31');

The reason you get the poor nested loop plan is that nested loop is the only join method that supports non-equijoins.

Unsure why you didn't get a parallel plan. Parallel query in pg10 supports a few more plan shapes than 9.6 did. Unsure what version you're using.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Need text wrapping inside a column to avoid record wrapping (tabular output of 'select')
Hi:

psql (9.6.7, server 9.1.9) on RHEL6

In order to avoid record wrapping in the tabular output of a "select" statement, I need to limit the width of certain columns. For those columns, I would like to have text wrapping so as not to lose any information (IOW, I don't want to simply truncate and lose info).

Example:

 name         | age | life_story                                 | favorite_color
--------------+-----+--------------------------------------------+----------------
 madame marie | 123 | She was born a long time ago, blah, blah,  | yellow
              |     | blah, blah, blah, blah, blah, blah, blah,  |
              |     | blah, more-more-mmore-more-more,           |
              |     | andmore-andmore-andmore-andmore, blah,     |
              |     | blah, blah, blah, blah, blah, blah, blah   |
              |     | and now she's 123 years old                |
--------------+-----+--------------------------------------------+----------------

... that sort of thing. Also, line breaks in the "life_story" where they make sense, like on spaces, as you see in lines 3 & 4 of that column.

Is there an easy way to configure PG to do this?

Thanks in Advance!
Re: Need text wrapping inside a column to avoid record wrapping (tabular output of 'select')
That does it. Good enough, despite the non-white space wrapping thing. Thanks!

On Wed, Jul 25, 2018 at 12:53 PM, Adrian Klaver wrote:
> On 07/25/2018 09:40 AM, David Gauthier wrote:
>> Hi:
>>
>> psql (9.6.7, server 9.1.9) on RHEL6
>>
>> In order to avoid record wrapping in the tabular output of a "select"
>> statement, I need to limit the width of certain columns. For those
>> columns, I would like to have text wrapping so as not to lose any
>> information (IOW, I don't want to simply truncate and lose info).
>>
>> ... that sort of thing. Also, line breaks in the "life_story" where they
>> make sense, like on spaces, as you see in lines 3 & 4 of that column
>>
>> Is there an easy way to configure PG to do this?
>
> In psql:
>
> create table wrap_test(name varchar, age integer, life_story varchar,
> favorite_color varchar);
>
> insert into wrap_test values ('madame marie', 123, 'She was born a long
> time ago, blah, blah, yellow eblah, blah, blah, blah, blah, blah, blah,
> blah, more-more-mmore-more-more, andmore-andmore-andmore-andmore, blah,
> blah, blah, blah, blah, blah, blah, blah and now she''s 123 years old',
> 'yellow');
>
> \pset format wrapped
>
> select * from wrap_test ;
>
> I am not showing output as email client distorts it.
>
>> Thanks in Advance !
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
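A related psql setting that may help here: \pset columns sets the target output width used by the wrapped format, so the wrap point can be controlled explicitly. For example:

\pset format wrapped
\pset columns 100
select * from wrap_test;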
Re: Restore relhaspkey in PostgreSQL Version 11 Beta
On 31 July 2018 at 11:11, Melvin Davidson wrote:
>> If you want stability use information_schema which we'll try very hard to
>> not ever break.
>
> Of course. Would you be so kind as to point out where in the
> information_schema it indicates if a table has a primary key or not. Oh
> wait, now I remember...no place.

With all due respect Sir, you're making a fool of yourself here. I'd suggest that before you debate or argue with people that you ensure that you're correct. This can often be hard to do on the spot, but excuses dwindle a bit more when the communication is asynchronous via email. It's not that difficult to find information_schema.table_constraints and see that constraint_type has "PRIMARY KEY".

>> Keeping random atavistic things around, would slow us down, which will be a
>> price everybody is paying.
>
> Random atavistic things? I hardly think relhaspkey is random. It's been
> there since version 7.2. Exactly how does keeping it around slow you/us
> down?

Well, it's been known that some people misuse relhaspkey. For example, in [1], someone is recommending to use relhaspkey to check for tables which don't have a PRIMARY KEY constraint. This was the wrong advice as the flag could remain set after the primary key was dropped from the table and before any vacuum took place on the table. The correct advice should have been the same as what Tom mentioned above, by checking for a pg_index record for the table with indisprimary as true. Alternatively, another useful response would have been to check information_schema, which would have provided an SQL standard way to check.

Now, in regards to [1]. I rather hope that you can sympathize with the decision to remove the column a little, as the person who made the incorrect recommendation in [1] was none other than you yourself. So it seems that you've only assisted in contributing to the column's removal by not only misusing it yourself but also instructing others, publicly, to do the same.

Now, in regards to your general tone here. It appears you're under the assumption that the column was removed for some malicious reason in order to break people's scripts, but I can assure you, there was no malicious intent involved. However, the column *was* removed exactly in order to break queries. The reason being, the queries were most likely already broken, and we deemed the problem big enough to remove the misleading column in order to let people know their queries were broken.

Your argument to put the column back carries very little weight, as it appears your script is trying to determine which tables have no primary key incorrectly. So I'd recommend that, instead of expending some keystrokes in replying to this email, you instead spend them fixing your broken code. Tom has kindly given you a very good starting point too. Personally, if I had been using a query like yours, I'd be thanking Peter for highlighting it was broken for me.

If you'd like something else to read, please also look at [2]. I imagine this is the sort of thing that Andres is talking about.

[1] https://www.postgresql.org/message-id/CANu8FiyQsQg7bF3FPT+FU=kK=wjhfewpp+6qe9fxf6yxr+w...@mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/20140317185255.20724.49675%40wrigleys.postgresql.org

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
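For anyone looking for the non-deprecated ways to answer "which tables lack a primary key", two equivalent sketches (the pg_index form is the approach Tom suggested; in practice you would also filter by schema):

-- SQL-standard way, via information_schema:
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name   = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;

-- catalog way, via pg_index.indisprimary:
SELECT c.oid::regclass
FROM pg_class c
WHERE c.relkind = 'r'
  AND NOT EXISTS (SELECT 1 FROM pg_index i
                  WHERE i.indrelid = c.oid AND i.indisprimary);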
Re: Template0 datfrozenxid age is 160million and progressing
On 2 August 2018 at 02:31, Rijo Roy wrote:
> I agree.. But unfortunately it is the business call and we cannot alter
> it.. I am planning to convince them by keeping autovacuum_analyze_threshold
> to a high value so that auto analyse will not kick in very often leaving
> the autovacuum to do its job..
> Please advise.

If autovacuum is disabled, it'll still kick in for any anti-wraparound work that needs to be performed.

This is also mentioned in the docs:

"Note that even when this parameter is disabled, the system will launch autovacuum processes if necessary to prevent transaction ID wraparound. See Section 24.1.5 for more information."

https://www.postgresql.org/docs/10/static/runtime-config-autovacuum.html

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: Weird behaviour of the planner
On 2 August 2018 at 02:48, Guillaume Lelarge wrote:
> EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
>
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Unique  (cost=12005.97..12049.20 rows=1 width=1430)
>          (actual time=20055.294..20323.348 rows=1 loops=1)
>    ->  Sort  (cost=12005.97..12006.30 rows=132 width=1430)
>              (actual time=20055.290..20105.738 rows=6 loops=1)
>          Sort Key: (... 130 columns ...)
>          Sort Method: external sort  Disk: 84464kB
>          ->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430)
>                                 (actual time=0.109..114.142 rows=6 loops=1)
>  Planning time: 10.012 ms
>  Execution time: 20337.854 ms
> (7 rows)
>
> That looks quite good. The total cost is 12049, so I expect this plan to
> have the smaller cost as it's the chosen plan. Now, I'm disabling Sort, and
> here is what I get:
>
> SET enable_sort TO off;
> EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
>
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  HashAggregate  (cost=12044.22..12044.23 rows=1 width=1430)
>                 (actual time=508.342..508.343 rows=1 loops=1)
>    Group Key: (... 130 columns ...)
>    ->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430)
>                           (actual time=0.036..57.088 rows=6 loops=1)
>  Planning time: 1.335 ms
>  Execution time: 508.529 ms
> (5 rows)

When the planner adds a new path it compares the cost not exactly, but with a 'fuzz factor'. It's very likely that the hashagg path did not make it as it was not fuzzily any cheaper than the unique path. By default, this fuzz factor is 1%.

It seems in your case the costs don't quite match reality, which is quite likely due to the poor row estimates on "gleu2". Has that table been analyzed recently? Or is there some reason that auto-vacuum is not getting to it?

There's a bit more reading on what I'm talking about in
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/util/pathnode.c#L141

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: ProjectSet
On 2 August 2018 at 21:17, Oliver Kohll wrote: > Is there an explanation of ProjectSet anywhere? Plan node types and what they each do are not very well documented outside of the source code. ProjectSet appears when the SELECT or ORDER BY clause of the query contains set-returning functions. They basically just execute the set-returning function(s) for each tuple until none of the functions return any more records. Simple Example: EXPLAIN SELECT generate_series(1,2); QUERY PLAN - ProjectSet (cost=0.00..5.02 rows=1000 width=4) -> Result (cost=0.00..0.01 rows=1 width=0) (2 rows) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: ProjectSet
On 3 August 2018 at 01:16, Oliver Kohll wrote: > Ah thanks David, at least I know what it is now. I don't think I have any > set returning functions though will double check in case there's a joined > view that has one. Perhaps it could be the something to do with cross > product which similarly creates multiple rows on the right for each row on > the left side. postgres=# select proretset from pg_proc where proname = 'regexp_matches'; proretset --- t t (2 rows) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Instead trigger on a view to update base tables ?
In a view of three joined tables, I install an INSTEAD OF trigger fx on the view. The fx contains a list of fields/columns variable associated with each base table. When an update operation occurs, I am successfully generating the target list of columns altered on each base table (comparing OLD vs. NEW) and attempting some dynamic sql generation in my trigger fx. I am taking the list of modified fields on the view, and attempting an update on the appropriate base tables. In this sample case "language_preference" was modified on the view and should update the admin.user base table EXECUTE format(' UPDATE admin.user SET (%I) = ( SELECT %I FROM $1 ) WHERE id = $2)', USER_SETTING, USER_SETTING ) USING NEW, NEW.id; When this executes, my exception handler reports "syntax error at or near \"$1\"". The formatted statement against my base table (admin.user) that throws this would be: UPDATE admin.user SET (language_preference) = ( SELECT language_preference FROM $1 ) WHERE id = $2)" I feel like I'm close but missing something fundamental. I also found an UPDATE variant UPDATE [ ONLY ] table_name [ * ] [ [ AS ] alias ] SET ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) which I thought might be applicable, but I'm still googling for a sample implementation. Thanks for any guidance in this method or better methods to update the base tables. Regards Dave Day
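One approach that is sometimes used for this (a sketch only, reusing the column-name variable from the post above; not a drop-in fix for the whole trigger): a record passed via USING cannot appear in a FROM clause, but its fields can be referenced directly as ($1).column, so the dynamic statement can be built as

  EXECUTE format(
      'UPDATE admin.user SET %I = ($1).%I WHERE id = $2',
      USER_SETTING, USER_SETTING)
  USING NEW, NEW.id;

Here format() splices in the identifier safely with %I, while the values travel as parameters, which avoids the "FROM $1" construction that caused the syntax error.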
'Identifier' columns
A couple of questions about auto-assigned identifier columns, forgive my ignorance, I'm used to other methods to create IDs... 1. If creating a new application [i.e. no "legacy" reasons to do anything] using PostgreSQL 10, when creating an "auto-assigned integer ID" column, what are the advantages/disadvantages of using the 'SERIAL' datatype [or equivalent explicitly created SEQUENCE w/ nextval() used as default for column] versus the SQL-standard 'integer GENERATED AS IDENTITY'? All other things being equal, it would seem a no-brainer to follow the standard. 2. When using the SQL-standard 'integer GENERATED AS IDENTITY' column, after inserting a row, what is the recommended method to find the ID of the just-inserted row? Is there no SQL-standard way? The docs seem to imply (without explicitly stating) that a SEQUENCE is used behind the scenes hence 'currval()' could be used, but I didn't see in the docs any mention of what the underlying sequence's name is, or how to specify a name. Perhaps 'lastval()' would work, but not in all cases and in any event it has a sloppy feel to me. Thank you in advance for any advice that can be offered. -- David
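For question 2, the idiom most often used in PostgreSQL is RETURNING (a PostgreSQL extension rather than SQL standard), which avoids naming the sequence at all; a small sketch with made-up table and column names:

  CREATE TABLE widget (
      id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      name text
  );

  INSERT INTO widget (name) VALUES ('foo') RETURNING id;

If currval() is preferred, the underlying sequence of an identity column can be looked up with pg_get_serial_sequence():

  SELECT currval(pg_get_serial_sequence('widget', 'id'));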
Re: During promotion, new master tries to archive same segment twice
Hi Phil, On 8/15/18 4:25 PM, Phil Endecott wrote: > > Questions: > > - Can all of this be blamed on the failure of the first attempt > to promote due to the wrong permissions on recovery.conf? Yes, it looks that way. > - Should my archive_command detect the case where it is asked to > write the same file again with the same contents, and report > success in that case? Yes. pgBackRest has done this for years and it saves a *lot* of headaches. > - Is this a bug? I don't think so. There are a number of cases where the same WAL segment can be pushed more than once, especially after failures where Postgres is not sure that the command completed. The archive command should handle this gracefully. Regards, -- -David da...@pgmasters.net
Re: During promotion, new master tries to archive same segment twice
On 8/16/18 4:37 AM, Phil Endecott wrote: > David Steele wrote: >> On 8/15/18 4:25 PM, Phil Endecott wrote: >>> - Should my archive_command detect the case where it is asked to >>> write the same file again with the same contents, and report success >>> in that case? >> >> Yes. > >> There are a number of cases where the same WAL >> segment can be pushed more than once, especially after failures where >> Postgres is not sure that the command completed. The archive command >> should handle this gracefully. > > Hmm, OK. Here's what the current docs say: > > Section 25.3.1: > > "The archive command should generally be designed to refuse to > overwrite any pre-existing archive file. This is an important > safety feature to preserve the integrity of your archive in case > of administrator error (such as sending the output of two > different servers to the same archive directory). > > It is advisable to test your proposed archive command to ensure > that it indeed does not overwrite an existing file, and that it > returns nonzero status in this case." > > And section 26.2.9: > > "When continuous WAL archiving is used in a standby, there > are two different scenarios: the WAL archive can be shared > between the primary and the standby, or the standby can > have its own WAL archive. When the standby has its own WAL > archive, set archive_mode to always, and the standby will call > the archive command for every WAL segment it receives, whether > it's by restoring from the archive or by streaming replication. > The shared archive can be handled similarly, but the > archive_command must test if the file being archived exists > already, and if the existing file has identical contents. > This requires more care in the archive_command, as it must be > careful to not overwrite an existing file with different contents, > but return success if the exactly same file is archived twice. > And all that must be done free of race conditions, if two > servers attempt to archive the same file at the same time." > > So you're saying that that's wrong, and that I must always > handle the case when the same WAL segment is written twice. Seems like an omission in section 25.3.1 rather than a problem in 26.2.9. Duplicate WAL is possible in *all* cases. A trivial example is that Postgres calls archive_command and it succeeds but an error happens (e.g. network) right before Postgres is notified. It will wait a bit and try the same WAL segment again. > I'll file a bug against the documentation. OK. >> pgBackRest has done this for years and it saves a *lot* of headaches. > > The system to which I am sending the WAL files is a rsync.net > account. I use it because of its reliability, but methods for > transferring files are limited largely to things like scp and > rsync. Rsync and scp are not good tools to use for backup because there is no guarantee of durability, i.e. the file is not synced to disk before success is returned. rsync.net may have durability guarantees but you should verify that with them. Even so, crafting a safe archive_command using these tools is going to be very tricky. Regards, -- -David da...@pgmasters.net
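To make the idea concrete, an archive_command wrapper in this spirit might look roughly like the following (paths and the script name are illustrative, and this sketch glosses over the race-condition and durability caveats quoted from the docs above):

  # postgresql.conf
  archive_command = '/usr/local/bin/archive-wal.sh %p %f'

  #!/bin/bash
  # /usr/local/bin/archive-wal.sh
  src="$1"                    # %p: segment path, relative to the data directory
  dst="/archive/wal/$2"       # %f: segment file name

  if [ -e "$dst" ]; then
      # Same name already archived: succeed only if the contents are identical
      if cmp -s "$src" "$dst"; then
          exit 0
      fi
      echo "WAL $2 already archived with different contents" >&2
      exit 1
  fi

  cp "$src" "$dst.tmp" && sync "$dst.tmp" && mv "$dst.tmp" "$dst"

(sync with a file argument needs a reasonably recent coreutils; substitute whatever fsync mechanism your platform provides.)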
Can Pg somehow recognize/honor linux groups to control user access ?
Hi: The title says it all. I need to be able to control who can gain access to a DB based on a linux user group. I can set up a generic role and password, but I also want to prevent users who are not in a specific linux group from accessing the DB. For code that works with the DB, this is easy (just chmod the group on the code file(s)). But is there a way to add an additional gauntlet that checks membership in the linux group if, for example, they were trying to get in using psql at the linux prompt? There are a couple hundred users in the linux group and the list of names changes constantly. I suppose creating a DB role per user in the linux group may be possible if something like a cron job was maintaining this (creating/dropping uid-based roles as the group membership changes), then giving everyone the same password. But does that prevent someone outside the linux group from just logging in with someone else's uid and the generic password? I'm hoping that this is a common need and that someone has a good solution. Thanks in Advance for any help!
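One approach sometimes used for this is PAM authentication, assuming the server was built with PAM support; the service name and group below are placeholders, each connecting user still needs a matching database role, and pam_unix typically needs the server to be able to read the shadow file, so treat this strictly as a sketch:

  # pg_hba.conf
  host  all  all  127.0.0.1/32  pam pamservice=postgresql

  # /etc/pam.d/postgresql
  auth     required  pam_unix.so
  account  required  pam_succeed_if.so user ingroup mydbgroup
  account  required  pam_unix.so

With something like that in place, a login attempt by an OS user outside mydbgroup is rejected at authentication time even if they know the account's password.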
unorthodox use of PG for a customer
Hi Everyone: I'm going to throw this internal customer request out for ideas, even though I think it's a bit crazy. I'm on the brink of telling him it's impractical and/or inadvisable. But maybe someone has a solution. He's writing a script/program that runs on a workstation and needs to write data to a DB. This process also sends work to a batch system on a server farm external to the workstation that will create multiple, parallel jobs/processes that also have to write to the DB. The workstation may have many of these jobs running at the same time. And there are 58 workstations, which all have/use locally mounted disks for this work. At first blush, this is easy. Just create a DB on a server and have all those clients work with it. But he's also adamant about having the DB on the same server(s) that ran the script AND on the locally mounted disk. He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc. He also wants this to be fast. My first thought was SQLite. Apparently, they now have some sort of multiple, concurrent write ability. But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation. So I dismissed that idea. Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation. That could work. But 58 DB instances? If he didn't like the idea of one DBA, 58 can't be good. Still, the DB would be on the workstation, which seems to be what he wants. I can't think of anything better. Does anyone have any ideas? Thanks in Advance !
Re: unorthodox use of PG for a customer
I tried to convince him of the wisdom of one central DB. I'll try again. >>So are the 58 database(stores) on the workstation going to be working with data independent to each or is the data shared/synced between instances? No, 58 workstations, each with its own DB. There's a concept of a "workarea" (really a dir with a lot of stuff in it) where the script runs. He wants to tie all the runs for any one workarea together and is stuck on the idea that there should be a separate DB per workarea. I told him you could just stick all the data in the same table just with a "workarea" column to distinguish between the workareas. He likes the idea of a separate DB per workarea. He just doesn't gt it. >>I'm no expert, but I've dozens of PostgreSQL databases running mostly without manual maintenance for years. Ya, I've sort of had the same experience with PG DBs. Like the everready bunny, they just keep on running. But these workstations are pretty volatile as they keep overloading them and crash them. Of course any DB running would die too and have to be restarted/recovered. So the place for the DB is really elsewhere, on an external server that wouldn't be subject to this volatility and crashing. I told him about transactions and how you could prevent partial writing of data sets. So far, I'm not hearing of anything that looks like a solution given the constraints he's put on this. Don't get me wrong, he's a very smart and sharp software engineer. Very smart. But for some reason, he doesn't like the client/server DB model which would work so nicely here. I'm just trying to make sure I didn't miss some sort of solution, PG or not, that would work here. Thanks for your interest and input everyone ! On Fri, Aug 24, 2018 at 2:39 PM Edson Carlos Ericksson Richter < rich...@simkorp.com.br> wrote: > Em 24/08/2018 15:18, David Gauthier escreveu: > > Hi Everyone: > > > > I'm going to throw this internal customer request out for ideas, even > > though I think it's a bit crazy. I'm on the brink of telling him it's > > impractical and/or inadvisable. But maybe someone has a solution. > > > > He's writing a script/program that runs on a workstation and needs to > > write data to a DB. This process also sends work to a batch system on > > a server farm external to the workstation that will create multiple, > > parallel jobs/processes that also have to write to the DB as well. The > > workstation may have many of these jobs running at the same time. And > > there are 58 workstation which all have/use locally mounted disks for > > this work. > > > > At first blush, this is easy. Just create a DB on a server and have > > all those clients work with it. But he's also adamant about having > > the DB on the same server(s) that ran the script AND on the locally > > mounted disk. He said he doesn't want the overhead, dependencies and > > worries of anything like an external DB with a DBA, etc... . He also > > wants this to be fast. > > My first thought was SQLite. Apparently, they now have some sort of > > multiple, concurrent write ability. But there's no way those batch > > jobs on remote machines are going to be able to get at the locally > > mounted disk on the workstation. So I dismissed that idea. Then I > > thought about having 58 PG installs, one per workstation, each serving > > all the jobs pertaining to that workstation. That could work. But 58 > > DB instances ? If he didn't like the ideal of one DBA, 58 can't be > > good. Still, the DB would be on the workstation which seems to be > > what he wants. 
> > I can't think of anything better. Does anyone have any ideas? > > > > Thanks in Advance ! > > > > I'm no expert, but I've dozens of PostgreSQL databases running mostly > without manual maintenance for years, just do the backups, and you are > fine. > In any way, if you need any kind of maintenance, you can program it in > your app (even backup, restore and vacuum) - it is easy to throw > administrative commands thru the available interfaces. > And if the database get out of access, no matter if it is centralized or > remote: you will need someone phisically there to fix it. > AFAIK, you don't even PostgreSQL installer - you can run it embed if you > wish. > > Just my2c, > > Edson > > >
Re: Will there ever be support for Row Level Security on Materialized Views?
On 28 August 2018 at 09:58, Ken Tanzer wrote: > On Tue, Aug 14, 2018 at 5:24 PM Ken Tanzer wrote: >> Basically, I'm wondering whether materialized views are likely to ever >> support row-level security. > > Hi. Just wanted to follow up on my own email and see if anyone could answer > this. You might think there's some master project-wide list of things that are to be implemented that we all work from, but there's not. There is a todo list [1] that might serve as some sort of guide for a new person who wishes to contribute something and is looking for inspiration as to what to work on, but I don't think that list ever goes much beyond that. New features normally appear for one of two reasons: 1) A developer gets inspired to make something happen; or 2) Someone pays a developer to make something happen. So I guess in this case either 1 or 2 has not happened, or they have, but the right people have either not seen your email or simply don't want to answer. If you're desperate for the feature, I'd recommend looking into the possibility of going with 1 or 2. [1] https://wiki.postgresql.org/wiki/Todo
Re: Erroneous behavior of primary key
On 28 August 2018 at 07:50, Daniel J Peacock wrote: > I've got an odd situation with a table that has a varchar(255) as the > primary key that is getting key values from an Elasticsearch engine. What > I'm finding is that even though there is a primary key on the table, I am > getting duplicated keys in the data and the constraint is not blocking > these. How do you know they're duplicated? > When I do a "select <key_column>, count(*) from <the_table> group by <key_column> > having count(*) > 1" I get no results. Yet, when I search the > table for a value that is like a key I know to be duplicated, I get multiple > results. When I select from the table where the field is equal to the > duplicated field I get one result. I verified that they are distinct rows > with ctid. I also created a clone of the table with CTAS and then tried to > create a unique index on the id varchar field but that failed with > "duplicate keys found". I'm stumped as to what could be the problem. > The only thing that I can think of is that the primary key is somehow > corrupt. I've noticed this behavior on other tables on this database. > What could be causing this sort of problem? If the index is corrupt then you might find that: set enable_indexscan = 0; set enable_indexonlyscan = 0; select <key_column>, count(*) from <the_table> group by <key_column> having count(*) > 1; would return some rows. You should also verify the above query does use a Seq Scan by performing an EXPLAIN on the query. There are issues that have been fixed in previous releases which could have caused an index to get corrupted in this way, so it's quite useful to know which version of PostgreSQL you're running here and if you've paid attention to the release notes when you've previously upgraded. For example, see [1]. [1] https://www.postgresql.org/docs/9.6/static/release-9-6-2.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Will there ever be support for Row Level Security on Materialized Views?
On 28 August 2018 at 11:06, Ken Tanzer wrote: > Or perhaps they "simply don't want to answer." Well I wondered about that > too, since the last person who asked about this didn't get a response either. > Is it a particularly stupid question? Is there some history that makes it a > touchy subject? Or is it just the Bermuda Triangle of this mailing list? :) I can't imagine why it would be a touchy subject. Generally, if you're working on a feature it's best not to keep it a secret, since if someone else does the same you end up with redundant work being done. If I had to guess what's going on here then I'd say that nobody has been sufficiently motivated to work on this yet. If that's the case, nobody who reads your email is working on this feature, so nobody can answer your question. I just answered to suggest the reasons why you might not be getting an answer. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Autovacuum degrades all other operations by keeping all buffers dirty?
Hello, We've been struggling with some major performance issues related to autovacuum. (I know this is a common problem.) For a while, we believed this was primarily related to I/O contention, but recent observations from our monitoring make me wonder if there's a deeper issue here, so I'm looking for some better understanding. >From reading the 9.6.3 source, it looks like the autovacuum process itself is single-threaded, and it reads pages essentially linearly from the relation (possibly skipping some). When the autovacuum process needs to modify a page, it doesn't write it directly, but rather marks the buffer dirty. The page will be written later, either by the checkpointer (according to its configuration, in terms of time and WAL), the bgwriter (according to its configuration, in terms of write count and sleep time), or else some other backend process that requires a free buffer (if the count of non-dirty buffers reaches zero). Is this accurate? In our case, we found that when autovacuum runs, the number of dirty buffers written by regular backends shoots from a handful to as much as 300 buffers per second. (We have 200 backends on most databases.) More specifically: prior to recent autovacuums starting, databases did under 1000 buffer allocations per second. Most of those were done by the checkpointer. Individual backends did just a handful. After autovacuum started, buffer allocations rose to between 2,000 and 8,000 per second. The checkpointer handled many of these, but so did individual backends (see above). The bgwriter peaked around 25 buffer writes per second. So it seems like the spike in buffers written by normal backends could explain the significant degradation that we see in average latency and overall throughput (which can be as much as 50%). It looks to me like the autovacuum process is effectively generating work (in the form of async writes) that's being distributed implicitly to the various backend processes, creating latency for any other query that happens to require a buffer (including read-only queries). Maybe we can improve this by tuning the bgwriter. But if that's single-threaded, presumably there's an upper bound to how many buffer writes it can handle? Is there anything to guarantee that the bgwriter will keep up with the work that's being generated? That is, for some workloads, wouldn't it be possible that autovacuum could always generate work faster than the bgwriter can do it, and you'd always have some pretty major degradation to all other queries? I've drawn a lot of inferences here, and I'm sure there are some mistakes. I'd appreciate any clarification, correction, or confirmation! Thanks in advance, Dave
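The buffer-write attribution described above can be watched directly in pg_stat_bgwriter (a standard view; sample it twice and diff the counters to get per-second rates):

  SELECT checkpoints_timed, checkpoints_req,
         buffers_checkpoint,   -- written by the checkpointer
         buffers_clean,        -- written by the bgwriter
         buffers_backend,      -- written by ordinary backends
         maxwritten_clean,     -- bgwriter stopped a scan at bgwriter_lru_maxpages
         buffers_alloc
  FROM pg_stat_bgwriter;

A steadily climbing buffers_backend relative to buffers_clean is the signature of backends doing the writes themselves, which is what the latency observations above suggest.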
Re: Barman versus pgBackRest
On 9/4/18 11:24 AM, Joshua D. Drake wrote: > On 09/04/2018 07:52 AM, Ron wrote: >> On 09/04/2018 09:24 AM, Joshua D. Drake wrote: >>> On 09/04/2018 07:14 AM, Ron wrote: >>>> >>>> That was about barman, in the barman group. This is asking about >>>> pgbackrest... :) >>>> >>>> So: does pgbackrest have this ability which barman does not have? >>>> The "--db-include" option seems to indicate that you can restore a >>>> single db, but does indicate whether or not you can rename it. >>> >>> https://pgbackrest.org/configuration.html#section-restore/option-db-include >>> >>> >> >> Which implies that you can't do it? > > You can restore a single database and then issue a simple ALTER DATABASE > command to change the DB name. This will work, but I don't think it's what Ron is getting at. To be clear, it is not possible to restore a database into an *existing* cluster using pgBackRest selective restore. This is a limitation of PostgreSQL file-level backups. To do what Ron wants you would need to restore it to a new cluster, then use pg_dump to logically dump and restore it to whatever cluster you want it in. This still saves time since there is less to restore but is obviously not ideal. Regards, -- -David da...@pgmasters.net
Re: Barman versus pgBackRest
On 9/4/18 11:55 AM, Ron wrote: > On 09/04/2018 10:51 AM, David Steele wrote: > [snip] >> This will work, but I don't think it's what Ron is getting at. >> >> To be clear, it is not possible to restore a database into an *existing* >> cluster using pgBackRest selective restore. This is a limitation of >> PostgreSQL file-level backups. >> >> To do what Ron wants you would need to restore it to a new cluster, then >> use pg_dump to logically dump and restore it to whatever cluster you >> want it in. This still saves time since there is less to restore but is >> obviously not ideal. > > That's exactly what I'm referring to. > > Presumably I could restore it to a new cluster on the same VM via initdb > on a different port and PGDATA directory? Definitely. No need to initdb since all the required files will be restored by pgBackRest. You'll just need to create an empty directory to restore into. Regards, -- -David da...@pgmasters.net
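Roughly, the restore-to-a-scratch-cluster approach looks like this (stanza name, paths, ports, and database names are placeholders, and option spellings vary a little between pgBackRest versions):

  mkdir -p /var/lib/pgsql/restore
  pgbackrest --stanza=main --pg1-path=/var/lib/pgsql/restore \
             --db-include=mydb restore

  # start the scratch cluster on a spare port and let it recover to consistency
  pg_ctl -D /var/lib/pgsql/restore -o "-p 5433" start

  # then move the database logically into the target cluster
  createdb -p 5432 mydb_copy
  pg_dump -p 5433 mydb | psql -p 5432 -d mydb_copy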
Re: Barman versus pgBackRest
Hi Ron, On 9/4/18 7:41 AM, Ron wrote: > On 03/09/2018 08:56 AM, David Steele wrote: > [snip] >>> About pgBarman, I like : >>> - be able restore on a remote server from the backup server >> This a good feature, and one that has been requested for pgBackRest. You >> can do this fairly trivially with ssh, however, so it generally hasn't >> been a big deal for people. Is there a particular reason you need this >> feature? > > (Sorry to dredge up this old thread.) > > Do you just change the IP address of the "restore target"? [I'll assume you wanted to hear about pgBackRest here since we discussed it down thread.] Generally restores are done from the database server, but if you want to run a restore from the backup server you can run it via ssh: ssh user@pg-server pgbackrest [...] > >>> - use replication slots for backingup wal on the backup server. >> Another good feature. We have not added it yet because pgBackRest was >> originally written for very high-volume clusters (100K+ WAL per day) and >> our parallel async feature answers that need much better. We recommend >> a replicated standby for more update-to-date data. > > Every N minutes you copy the WAL files to the backup server? > > -- -David da...@pgmasters.net
Re: Barman versus pgBackRest
On 9/4/18 7:41 AM, Ron wrote: > On 03/09/2018 08:56 AM, David Steele wrote: > >>> - use replication slots for backingup wal on the backup server. >> Another good feature. We have not added it yet because pgBackRest was >> originally written for very high-volume clusters (100K+ WAL per day) and >> our parallel async feature answers that need much better. We recommend >> a replicated standby for more update-to-date data. > > Every N minutes you copy the WAL files to the backup server? [Accidentally hit send on the previous post, here's the rest...] WAL segments are transferred whenever Postgres indicates that a segment is finished via the archive_command. Async archiving "looks ahead" to find WAL segments that are ready to archive. You can use archive_timeout to force Postgres to push a WAL segment every N seconds for clusters that have idle time. Regards, -- -David da...@pgmasters.net
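For context, the pieces being described fit together roughly like this (stanza name and paths are illustrative):

  # postgresql.conf
  archive_mode = on
  archive_command = 'pgbackrest --stanza=main archive-push %p'
  archive_timeout = 60        # switch to a new WAL segment after 60s if there was activity

  # pgbackrest.conf -- enable parallel asynchronous archiving
  [global]
  archive-async=y
  spool-path=/var/spool/pgbackrest
  process-max=4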
Re: Autovacuum degrades all other operations by keeping all buffers dirty?
On Fri, Aug 31, 2018 at 3:50 PM, Andres Freund wrote: > On 2018-08-31 19:31:47 -0300, Alvaro Herrera wrote: > > On 2018-Aug-31, David Pacheco wrote: > > > > > From reading the 9.6.3 source, it looks like the autovacuum process > > > itself is single-threaded, and it reads pages essentially linearly > > > from the relation (possibly skipping some). When the autovacuum > > > process needs to modify a page, it doesn't write it directly, but > > > rather marks the buffer dirty. The page will be written later, > > > > Unless there's some bug, there is a BufferAccessStrategy that only lets > > a few dozen buffers go unwritten before the autovac worker process > > itself is forced to write some. > > I've not re-checked, but I'm not sure that's true if the buffer is > already in s_b, which it'll be for many workloads. > Does that mean this analysis from above is accurate? It looks to me like the autovacuum process is effectively generating work > (in > the form of async writes) that's being distributed implicitly to the > various > backend processes, creating latency for any other query that happens to > require > a buffer (including read-only queries). Thanks, Dave
nested query problem
Hi: I'm having trouble with this query... select sr.sqf_id, sr.status, fse.perl_sub_name, fse.end_datetime from sqf_runs sr, (select perl_sub_name, end_datetime from flow_step_events_view where sqf_id = sr.sqf_id order by 2 limit 1) fse where sr.userid='foo'; ERROR: invalid reference to FROM-clause entry for table "sr" LINE 4: ...atetime from flow_step_events_view where sqf_id = sr.sqf_id ... ^ HINT: There is an entry for table "sr", but it cannot be referenced from this part of the query. If this is indeed a foul, how can I accomplish the same thing ? Thanks in Advance !
Re: nested query problem
Not quite. This returns one value. In the actual "sqf_runs" table, there are many records with user_id = 'foo'. I want one line for each where the fse.p-erl_sub_name and fse.end_datetime values are the latest values found in the flow_step_events_view view where the sqf_ids match. On Thu, Sep 6, 2018 at 5:10 PM Paul Jungwirth wrote: > On 09/06/2018 01:59 PM, David Gauthier wrote: > > I'm having trouble with this query... > > > > select sr.sqf_id, sr.status, fse.perl_sub_name, fse.end_datetime > >from > > sqf_runs sr, > > (select perl_sub_name, end_datetime from flow_step_events_view > > where sqf_id = sr.sqf_id order by 2 limit 1) fse > >where sr.userid='foo'; > > > > ERROR: invalid reference to FROM-clause entry for table "sr" > > LINE 4: ...atetime from flow_step_events_view where sqf_id = sr.sqf_id > ... > > ^ > > HINT: There is an entry for table "sr", but it cannot be referenced > > from this part of the query. > > This calls for a lateral join: > > SELECT sr.sqf_id, sr.status, fse.perl_sub_name, fse.end_datetime > FROMsqf_runs sr > LEFT OUTER JOIN LATERAL ( >SELECT perl_sub_name, end_datetime >FROMflow_step_events_view fsev >WHERE fsev.sqf_id = sr.sqf_id >ORDER BY 2 >LIMIT 1 > ) fse > ON true > WHERE sr.userid = 'foo' > ; > > It's nearly what you had already, but `LATERAL` lets the subquery > reference columns in the other tables. > > A lateral join is conceptually a lot like running your subquery in for > loop, looping over all the rows produced by the rest of the query. It > doesn't have to produce 1 row for each iteration, but saying `LIMIT 1` > ensures that here. > > The `ON true` is just pro forma because you can't have a join without an > `ON` clause. > > You might prefer an INNER JOIN LATERAL, depending on your needs. > > -- > Paul ~{:-) > p...@illuminatedcomputing.com > >
Re: nested query problem
Wow, I take that back. I thought there were many recs with "foo" but there wa sonly one. When I ran this against a value that actually had multiple records, it ran fine. Sorry for that. And Thanks for this query ! On Thu, Sep 6, 2018 at 5:15 PM David Gauthier wrote: > Not quite. This returns one value. In the actual "sqf_runs" table, there > are many records with user_id = 'foo'. I want one line for each where the > fse.p-erl_sub_name and fse.end_datetime values are the latest values found > in the flow_step_events_view view where the sqf_ids match. > > On Thu, Sep 6, 2018 at 5:10 PM Paul Jungwirth > wrote: > >> On 09/06/2018 01:59 PM, David Gauthier wrote: >> > I'm having trouble with this query... >> > >> > select sr.sqf_id, sr.status, fse.perl_sub_name, fse.end_datetime >> >from >> > sqf_runs sr, >> > (select perl_sub_name, end_datetime from flow_step_events_view >> > where sqf_id = sr.sqf_id order by 2 limit 1) fse >> >where sr.userid='foo'; >> > >> > ERROR: invalid reference to FROM-clause entry for table "sr" >> > LINE 4: ...atetime from flow_step_events_view where sqf_id = sr.sqf_id >> ... >> > ^ >> > HINT: There is an entry for table "sr", but it cannot be referenced >> > from this part of the query. >> >> This calls for a lateral join: >> >> SELECT sr.sqf_id, sr.status, fse.perl_sub_name, fse.end_datetime >> FROMsqf_runs sr >> LEFT OUTER JOIN LATERAL ( >>SELECT perl_sub_name, end_datetime >>FROMflow_step_events_view fsev >>WHERE fsev.sqf_id = sr.sqf_id >>ORDER BY 2 >>LIMIT 1 >> ) fse >> ON true >> WHERE sr.userid = 'foo' >> ; >> >> It's nearly what you had already, but `LATERAL` lets the subquery >> reference columns in the other tables. >> >> A lateral join is conceptually a lot like running your subquery in for >> loop, looping over all the rows produced by the rest of the query. It >> doesn't have to produce 1 row for each iteration, but saying `LIMIT 1` >> ensures that here. >> >> The `ON true` is just pro forma because you can't have a join without an >> `ON` clause. >> >> You might prefer an INNER JOIN LATERAL, depending on your needs. >> >> -- >> Paul ~{:-) >> p...@illuminatedcomputing.com >> >>
Re: pgbackrest when data/base is symlinked to another volume
Hi Ron, On 9/6/18 11:21 PM, Ron wrote: Will pgbackrest properly backup and restore the cluster if data/base, data/pg_xlog and data/pg_log are symlinks? PGDATA=/var/lib/pgsql/9.6/data $PGDATA/base -> /Database/9.6/base $PGDATA/pg_log -> /Database/9.6/pg_log $PGDATA/pg_xlog -> /Database/9.6/pg_xlog Yes, this will work. Note that restore does not recreate symlinks by default so you'll need to specify --link-all to enable symlink creation. See https://pgbackrest.org/configuration.html#section-restore/option-link-all for details. Using symlinks in this way will make management of your clusters more difficult, mostly because systems need more provisioning before restores can be performed. In general I'd recommend against it unless there are performance considerations. Also, you might consider using log_directory to relocate log files rather than a symlink. This will exclude log files from your backup which is usually preferable -- primary logs restored to a standby are out of context and can cause confusion. Regards, -- -David da...@pgmasters.net
Re: pgbackrest when data/base is symlinked to another volume
On 9/7/18 8:47 PM, Ron wrote: On 09/07/2018 05:22 PM, David Steele wrote: On 9/6/18 11:21 PM, Ron wrote: Will pgbackrest properly backup and restore the cluster if data/base, data/pg_xlog and data/pg_log are symlinks? PGDATA=/var/lib/pgsql/9.6/data $PGDATA/base -> /Database/9.6/base $PGDATA/pg_log -> /Database/9.6/pg_log $PGDATA/pg_xlog -> /Database/9.6/pg_xlog Yes, this will work. Note that restore does not recreate symlinks by default so you'll need to specify --link-all to enable symlink creation. See https://pgbackrest.org/configuration.html#section-restore/option-link-all for details. Using symlinks in this way will make management of your clusters more difficult, mostly because systems need more provisioning before restores can be performed. In general I'd recommend against it unless there are performance considerations. Now that I'm thinking more about what you wrote... "data" isn't on it's own partition. data/*base* has it's own partition. What's the recommended method for putting *base**/* on a partition different from data/? Or is that not recommended? All the user data goes in base so there's really no need to separate it out of data. Typically pg_wal and tablespaces are relocated onto different devices for performance (or to get more space). If the partitions are on the same device then there's no performance benefit, just admin hassle. -- -David da...@pgmasters.net
Re: Volume partitioning (was Re: pgbackrest when data/base is symlinked to another volume)
Hi Ron On 9/8/18 4:26 PM, Ron wrote: Googled "postgresql disk partitioning" and "postgresql volume partitioning" without much success. Is the best practice volume partitioning: /Database/9.6/data /Database/9.6/data/pg_log /Database/9.6/data/pg_xlog where /var/lib/pgsql/9.6 (on RHEL6) is a symlink to /Database/9.6/data and PGDATA=/Database/9.6/data *or * /Database/9.6/data/base /Database/9.6/data/pg_log /Database/9.6/data/pg_xlog where PGDATA=/var/lib/pgsql/9.6/data and base, pg_log and px_xlog are symlinks to the partitions? This is really a matter of preference, but simpler is better, so the first one looks better to me. In general I prefer to keep everything in /var/lib/pgsql/9.6: /var/lib/pgsql/9.6/data /var/lib/pgsql/9.6/data/pg_xlog -> /var/lib/pgsql/9.6/wal Then use the log_directory setting to put logs in: /var/log/pgsql or /var/lib/pgsql/9.6/log pgBackRest will expect to be pointed at a real directory (pg-path) and expect the data_directory in PostgreSQL to match. Regards, -- -David da...@pgmasters.net
Re: Code of Conduct plan
On 9/14/18 7:52 AM, James Keener wrote: I fail to see how that makes everyone here part of a community anymore than I'm part of the "community" of regulars at a bar I walk into for the first time. As I said, the rules can and should apply within the list, but applying them outside the list is odd and wreaks of authoritarianism. Jim In the 20 years I've been using PG, I've not noted any bizarre "list speech" except this discussion that suggests others should monitor people's behavior wherever they are, and report any "infraction" to PG, so PG can boot them. I'm with those who think that idea is diametrically opposed to open source's freedom. What next, monitor what apps people are using their DB for and decide if the "community" approves of its character or not? David
Convert interval to hours
Hi: In perl/DBI, I have code that's getting me an "age" which returns something like... "-17 days -08:29:35". How can I convert that to a number of hours (as a float I would presume) ? Thanks
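The usual trick is to go through EXTRACT(EPOCH FROM ...), which turns an interval into seconds (column names below are made up):

  SELECT EXTRACT(EPOCH FROM age(end_ts, start_ts)) / 3600.0 AS hours FROM my_table;

  -- or for a literal interval like the one shown:
  SELECT EXTRACT(EPOCH FROM interval '-17 days -08:29:35') / 3600.0;
  -- returns approximately -416.49

Doing the division in the query means the DBI fetch already returns a plain float.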
Re: Setting up continuous archiving
On 9/26/18 8:20 AM, Yuri Kanivetsky wrote: I'm trying to compile a basic set of instruction needed to set up continuous archiving and to recover from a backup. I'm running PostgreSQL 9.3 on Debian Stretch system. This is an incredibly complex topic and it is very difficult to implement correctly. My advice is that you use a mature backup solution like pgBackRest or barman rather than rolling your own. -- -David da...@pgmasters.net
How can I get and handle the status of sql statements that run in plpgsql ?
Hi: psql (9.6.7, server 9.5.2) on linux How does one get the status of an sql statement executed in plpgsql? If that status is cryptic, how can that be translated to something which someone could understand? Finally, how can I effectively do a start transaction and either rollback or commit based on the results of the sql statements run? create or replace function xfer_savings_to_checking(acct_no text, howmuch float) returns text as $$ declare savings_balance float; checking_balance float; begin select balance into savings_balance from savings_acct_info where acct = acct_no; sql_status = <get the sql status> if(something_went_wrong) then raise exception 'select savings statement was bad "%"',sql_status; return 'error'; end if; select balance into checking_balance from checking_acct_info where acct = acct_no; sql_status = <get the sql status> if(something_went_wrong) then raise exception 'select checking statement was bad "%"',sql_status; return 'error'; end if; if(howmuch > saving_balance) then raise notice 'Hey, you dont have that much to xfer ! You only have %',savings_balance; return 'error'; end if; start transaction; update savings_acct_info set balance = balance - howmuch where acct = acct_no; sql_status = <get the sql status> if(something_went_wrong) then raise exception 'updating savings acct "%"',sql_status; rollback; return 'error'; end if; update checking_acct_info set balance = balance + howmuch where acct = acct_no; sql_status = <get the sql status> if(something_went_wrong) then raise exception 'updating checking acct "%"',sql_status; rollback; return 'error'; end if; commit; end; $$ language plpgsql; Of course I don't know what the <get the sql status> and "something_went_wrong" pieces look like, or if they even make sense with how this sort of thing should be properly handled in plpgsql. Also, in my trials, it appears that plpgsql doesn't like "start transaction". So how is that piece done ? Thanks in Advance for any help !
Re: How can I get and handle the status of sql statements that run in plpgsql ?
Thanks Adrian and Christopher ! So the transaction will be rolled back automatically if there's a problem. Got it ! Question: How do I detect when it is appropriate to raise notice so as to be able to see the error message? Or will that automatically be sent to stdout if an error occurs? Question: Also, how can I detect how many records were operated on by, say, an update statement? In the example I gave, I would want to make sure that the transfer amount was deducted from the savings AND that the amount was added to the checking. Both need to be 1, else I want to... 1) send a message (raise notice) 2) rollback somehow. On Tue, Oct 2, 2018 at 5:27 PM Adrian Klaver wrote: > On 10/2/18 1:47 PM, David Gauthier wrote: > > Hi: > > psql (9.6.7, server 9.5.2) on linux > > > > How does one get the status of an sql statement executed in plpgsql? If > > that status is cryptic, how can that be translated to something which > > someone could understand? Finally, how can I effectively do a start > > transaction and either rollback or commit based on the results of the > > sql statements run? > > > > > > > Of course I don't know what the and > > "something_went_wrong" pieces look like, or they even make sense with > > how this sort of thing shold be properly handled in plpgsql. Also, in > > The below(read to bottom of the page) might help: > > > https://www.postgresql.org/docs/10/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING > > > > my trials, it appears that plpgsql doesn't like "start transaction". So > > how is that piece done ? > > > > Thanks in Advance for any help ! > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: How can I get and handle the status of sql statements that run in plpgsql ?
I found "no_data" here... https://www.postgresql.org/docs/10/static/errcodes-appendix.html update blah, blah... if(no_data) then raise exception "update failed to update anything"; end if Is that how you detect if nothing was updated and how to make sure the thing returns and rolls back ? On Wed, Oct 3, 2018 at 11:46 AM David Gauthier wrote: > Thanks Adrian and Christopher ! > > So the transaction will be rolled back automatically if there's a > problem. Got it ! > > Question: How do I detect when it is appropriate to raise notice so as to > be able to see the error message? Or will that automatically be sent to > stdout if an error occurs? > > Question: Also, how can I detect how many records were operated on by, > say, an update statement? In the example I gave, I would want to make sure > that the transfer amount was deducted from the savings AND that the amount > was added to the checking. Both need to be 1, else I want to... > 1) send a message (raise notice) > 2) rollback somehow. > > > > > On Tue, Oct 2, 2018 at 5:27 PM Adrian Klaver > wrote: > >> On 10/2/18 1:47 PM, David Gauthier wrote: >> > Hi: >> > psql (9.6.7, server 9.5.2) on linux >> > >> > How does one get the status of an sql statement executed in plpgsql? >> If >> > that status is cryptic, how can that be translated to something which >> > someone could understand? Finally, how can I effectively do a start >> > transaction and either rollback or commit based on the results of the >> > sql statements run? >> > >> >> > >> > Of course I don't know what the and >> > "something_went_wrong" pieces look like, or they even make sense with >> > how this sort of thing shold be properly handled in plpgsql. Also, in >> >> The below(read to bottom of the page) might help: >> >> >> https://www.postgresql.org/docs/10/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING >> >> >> > my trials, it appears that plpgsql doesn't like "start transaction". >> So >> > how is that piece done ? >> > >> > Thanks in Advance for any help ! >> > >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com >> >
Re: Postgres 11, partitioning with a custom hash function
On 4 October 2018 at 16:22, Harry B wrote: > I am still having trouble reconciling what happens under the HASH > partitioning!. If I have text column forming the basis of PARTITIONED BY > HASH, the HASH value used in the partitioning setup does not seem to match > to `hashtext()` of that value It won't match. The hash partition hash is seeded with a special const (HASH_PARTITION_SEED) see [1]. You could likely roll your own hash ops. See [2] for an example. This can then be used to create a hash partitioned table like [3]. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/partitioning/partbounds.c#l2056 [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/regress/sql/insert.sql#l241 [3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/regress/sql/hash_part.sql#l10 -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
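For anyone wanting to try this, a minimal sketch modeled on the regression test in [2] (names are made up; substitute your own 64-bit hash logic for hashtextextended if the point is a custom distribution):

  -- 64-bit hash support function: takes the value and a seed
  CREATE FUNCTION my_text_hash(value text, seed bigint) RETURNS bigint
  AS $$ SELECT hashtextextended(value, seed) $$
  LANGUAGE sql IMMUTABLE STRICT;

  CREATE OPERATOR CLASS my_text_hash_ops FOR TYPE text USING hash AS
      OPERATOR 1 =,
      FUNCTION 2 my_text_hash(text, bigint);

  CREATE TABLE t (k text) PARTITION BY HASH (k my_text_hash_ops);
  CREATE TABLE t_0 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 0);

Support function 2 of a hash operator class is the extended (seeded) variant, which is what hash partitioning calls with HASH_PARTITION_SEED as described in [1].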
Re: Postgres 11, partitioning with a custom hash function
On 5 October 2018 at 06:18, Harry B wrote: > > Thank you David! These helped me create an operator class. > However, there still seems to be a 'off-by-a-fixed-N' difference between the > hash value returned and how PG selects the partition. Hmm, actually, this is probably due to the hash_combine64() call in compute_partition_hash_value(). This seems to combine the hash value with 0 regardless of whether there's another partition key column to hash. If you could somehow do the reverse of what hash_combine64() will do to your hash before returning it to the function then you might get somewhere, but that does not look possible since it appears to be throwing away some bits. It may not have been a great choice to decide to have compute_partition_hash_value() do this unnecessary combine, but it's likely a few months too late to change that now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Postgres 11, partitioning with a custom hash function
On 5 October 2018 at 09:43, Harry B wrote: > Now the big question: How scared should I be relying on this? I don't mind > it breaking on major version upgrades (which would mean I need to dump & > restore my entire set), but how likely is it to change unannounced in a > minor/security release? Unless of course, you break it in a way that makes > custom-hash function impossible. I don't see how we could possibly change it once v11 is out the door. Such a change would break pg_upgrade and I imagine we want that to work for a long time to come yet, at least until there is some other reason that is worthy of breaking it. The bar is likely set pretty high for that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Advice on logging strategy
On 10/11/18 11:26 AM, Mike Martin wrote: This results in enormous log files which take ages to import using copy becuase each execute statement is logged with the parameters chosen Is there any way around this? I cant find any way to filter dml statements pgAudit (https://github.com/pgaudit/pgaudit) gives you fine-grain control over what is logged by command type, table, or user as well as a lot more detail. -- -David da...@pgmasters.net
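For reference, a minimal pgAudit setup looks roughly like this (the class list is illustrative; parameters can be left out of the log, which addresses the log-size complaint above):

  # postgresql.conf
  shared_preload_libraries = 'pgaudit'
  pgaudit.log = 'write, ddl'          # audit INSERT/UPDATE/DELETE/TRUNCATE and DDL
  pgaudit.log_parameter = off

  -- then, in the database:
  CREATE EXTENSION pgaudit;

Session auditing can also be narrowed to particular roles or objects with pgaudit.role and object-level grants; see the project README for details.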
Re: Setting up continuous archiving
On 10/15/18 5:09 PM, Benoit Lobréau wrote: By the way, do/can they both use streaming to receive WAL records? Or streaming is only for standby servers. For backups you have only file-based log shipping? barman supports streaming but it's not as magical as one might think. See pgbarman's documentation for how to manager .partial files. pgbackrest archives only wal files when postgres uses the archive_command. You also have the option to do paralllel async wal push/get. It can be useful if you write wals quicker than you can archive them or if you want to restore more quickly. We have focused on archive_command because the performance is much better because it can be parallelized. Then, I suppose they both don't support partial PITR (http://docs.pgbarman.org/release/2.4/#scope), where there are standalone backups that extends to points in time for which there are no WAL files. I'm not sure if this matters, but I assume that it might be effective in terms of disk space. It's a hot backup so you have to have wals files so that your backup is consistent at the end of the backup. You can build something like what you describe with pgbackrest, archive-copy and a copy of the backup directory to another place. The --repo1-retention-archive-type and --repo1-retention-archive options allow you do keep WAL for a smaller number of backups in order to save space. https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive So, for example: [global] repo1-retention-full=4 # The following option is the default but included for clarity repo1-retention-archive-type=full repo1-retention-archive=2 This configuration will retain 4 full backups but only keep PITR WAL for 2 of them, i.e. the WAL generated between backups. The WAL required to make a backup consistent is always retained so the 2 older backups can be played to consistency but no further. I'd like to be able to handle db failure as fast as possible. Ideally, automatically. Which probably means either repmgr, or corosync + pacemaker + PAF. Is that what you mean by HA here? yes. You can also look into patrony for an alternative. It's a matter of preference and requirements. pgBackRest certainly works with Pacemaker/Corosync and Patroni. If your standby is lagging too far behind (and you dont use slots) you can use pgbackrest's archive to fill the gap in wal files and catchup with the master. Recovering WAL out of the archive is safer than using slots. Since the WAL is transferred over the network compressed it can also save a lot of bandwidth. Regards, -- -David da...@pgmasters.net
Re: Help with list partitioning on expression
On 19 October 2018 at 02:49, Dinko Papak wrote: > I have created table partitioned by list on expression using timestamp > column, so for each timestamp value function returns int and each partition > table is for single value (range of timestamp for which function result is > the same). This helps me to simplify querying as I do not need to always > write date ranges. Querying (partition pruning) works nice, however when I > attach new partition it seems to always scan whole table, although I do have > necessary check constraint on partitioned table. I have tried to make > timestamp column both null and not null. Also, it takes longer to attach > partition then to add constraint itself although per my understanding those > 2 operations should do the same scan. It's not all that obvious, but if you have PARTITION BY LIST (extract(minute FROM ts)) and try to attach a partition like: CREATE TABLE mypartition PARTITION OF parted FOR VALUES IN (1); then the partition constraint is actually (extract(minute FROM ts) IS NOT NULL AND extract(minute FROM ts) = 1). If your CHECK constraint is just checking `extract(minute FROM ts) = 1` then the current code in PartConstraintImpliedByRelConstraint() is not smart enough to know that `extract(minute FROM ts) = 1` is strict and cannot match nulls. Perhaps that could be improved, but that's how it is today. Likely you'll have better luck with a check constraint that explicitly checks the function IS NOT NULL. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
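To make that concrete for the minute-of-timestamp example above (table and constraint names are illustrative):

  -- match the implied partition constraint exactly, including the IS NOT NULL
  ALTER TABLE mypartition
    ADD CONSTRAINT mypartition_minute_check
    CHECK (extract(minute FROM ts) IS NOT NULL AND extract(minute FROM ts) = 1);

  ALTER TABLE parted ATTACH PARTITION mypartition FOR VALUES IN (1);

With the IS NOT NULL spelled out, PartConstraintImpliedByRelConstraint() can prove the partition constraint from the CHECK constraint and the attach should skip the full-table validation scan.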
Re: Aggregated join vs. aggregate in column?
On Fri, 12 Jul 2019 at 19:32, Durumdara wrote: > 2.) I may relocate this section as join... > > select > request.*, s.max_s_date > from request > left join > ( > select schedule.product_id, max(s_date) as max_s_date from schedule > where schedule.ok = True > group by schedule.product_id > ) s on (s.product_id = request.product_id) > ... > > But I really don't know what the hell will happen in query optimizer with > this method. > > a.) > Optimizer is clever, and it calculates the aggregates only in the needed rows. > So it find the request.product_id-s, and execute the "s" query only in these > rows. > > b.) > Or it isn't enough wise, it executes the "s" subquery on whole schedule, and > later joins to main table. > The schedule table is big in customers' database, so this is worst case. :-( I'm afraid for the particular query above, the answer is closer to b) However, that's only going to be a problem if there are many more distinct product_id records in "schedule". If you were to add a WHERE clause to the outer query that did WHERE request.product_id = X, then that qual would be pushed down into the subquery. This qual pushing only works for equality. So if you changed out WHERE request.product_id = X to WHERE request.product_id IN(X,Y); then that wouldn't push the qual to the subquery. > I asked this because sometimes I need to get more result columns in the > select, but I couldn't retreive more... > > ( > select max(s_date) as s_date from schedule > where schedule.product_id = request.product_id and schedule.ok = True > ) as max_s_date, <=== only one column > > So sometimes the join is better. > > But if the optimizer isn't enough wise, I can get these values only "WITH" > queries (select the main rows to temp, run subselects with only these > records, return the mix of main and subselects in one query). If you don't want to repeat the same subquery in the SELECT list then you could perform a CROSS JOIN LATERAL. For example: select request.*, s.max_s_date, s.max_s1_date from request cross join lateral ( select max(s_date) as max_s_date, max(s1_date) as max_s1_date from schedule where schedule.ok = True and s.product_id = request.product_id) s; In this case, the subquery will be executed once per output row, so if you have some restrictive WHERE clause on the outer query then the subquery will be executed fewer times. With a bit of training, you should be able to see what the query planner has done for yourself by using the EXPLAIN command: https://www.postgresql.org/docs/current/sql-explain.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: PGbackrest capacity related question
On 7/22/19 5:08 PM, chiru r wrote: Hi All, I am using pgbackrest on my production workloads and it is working as expected. However our DB is growing more and more . We have currently two disks and each one is 4 TB size. However 3.8TB size data got loaded into DB currently and our /pgback mount point size is 4TB. So if the DB size grows more then 4TB, how to add another file system to store backup under "repo-path"?. pgBackRest does not have multi-volume support because there are a number of ways to accomplish this using ZFS, Linux volumes, etc. You'll need the expand the storage for the repo using one of these methods or reduce your retention settings. Regards, -- -David da...@pgmasters.net
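As one illustration of the volume-manager route, growing an LVM-backed repo path might look like this (device, volume group, and logical volume names are placeholders):

  pvcreate /dev/sdc
  vgextend backup_vg /dev/sdc
  lvextend -r -L +4T /dev/backup_vg/pgbackrest_lv   # -r also grows the filesystem

pgBackRest itself then simply sees the single, now-larger repo-path.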
Re: lead() with arrays - strange behaviour
On Thu, 8 Aug 2019 at 21:06, Thomas Kellerer wrote: > The following statement tries to find the overlapping values in id_list > between the current row and the next row: > > select id, >id_list, >lead(id_list) over (order by id) as next_list, >array(select unnest(id_list) intersect select unnest(lead(id_list) > over (order by id))) as common_ids > from sample_data; > > The above returns: > > id | id_list | next_list | common_ids > ---+-+---+--- > 1 | {1,2,3} | {2,3,4} | {} > 2 | {2,3,4} | {4,5,6} | {} > 3 | {4,5,6} | | {} > > The empty array for "common_ids" is obviously incorrect. I think you're confused with what the SELECT with the empty FROM clause does here. In your subquery "id_list" is just a parameter from the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to return anything since those are both just effectively scalar values, to which there is no "next" value. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
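A version that behaves as intended computes the window function in a derived table first, so the intersection sees the real next row's array:

  SELECT id, id_list, next_list,
         ARRAY(SELECT unnest(id_list)
               INTERSECT
               SELECT unnest(next_list)) AS common_ids
  FROM (SELECT id, id_list,
               lead(id_list) OVER (ORDER BY id) AS next_list
        FROM sample_data) AS s;

For the last row next_list is NULL, so unnest() produces no rows and common_ids comes out as an empty array.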
Re: FW: Undelivered Mail Returned to Sender
On Sun, 11 Aug 2019 at 06:53, stan wrote: > > I apologize for asking, what I suspect will turn out to be a newbie > question, but I have managed to get myself quite confused on this. > > I am defining a view as follows > > > CREATE OR REPLACE view purchase_view as > select > project.proj_no , > qty , > mfg_part.mfg_part_no , > mfg.name as m_name , > mfg_part.descrip as description , > ( > SELECT > name > FROM > vendor > WHERE > bom_item.vendor_key = > ( > SELECT > vendor_key > FROM > mfg_vendor_relationship > WHERE > bom_item.mfg_key = mfg_key > AND > prefered = TRUE > AND > bom_item.project_key = project_key > > ) > ) > as v_name , > ERROR: more than one row returned by a subquery used as an expression > > Can someone please enlighten me as to the error of my ways? Looks to me like your WHERE clause is wrong in the subquery. "WHERE bom_item.vendor_key =" surely that should be just "WHERE vendor_key = " (assuming that's the primary key column of the vendor table). Also, you've mentioned you've only a single record in the mfg_vendor_relationship, so the error can't be due to multiple records matching in the mfg_vendor_relationship table. However, given the unique constraint on that table includes 3 columns and you're just filtering on 2 of them, then it would only take some rows in there with the same mfg_key and project_key values but a different vendor_key to get the same error from that part of the query. If that shouldn't happen, then perhaps your UNIQUE constraint should not contain the vendor_key column. You'd have to explain what you need in more detail for someone to be able to help you fix that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Transaction state on connection Idle/Open/Failed
In JDBC (latest version), I'm using the org.postgresql.jdbc.PgConnection.getTransactionState() that returns an enumeration of IDLE, OPEN or FAILED. I am familiar with IDLE, meaning the connection has a new transaction started, but isn't doing anything. We think of this as the "normal" state for our connections. How does OPEN compare to IDLE? If we stop using a connection that's in the OPEN state, is something wrong? That is, does it have pending updates that require a commit/rollback? Same for FAILED. If we no longer need a connection in the FAILED state, is something wrong? Does it have pending updates that require a commit/rollback (assuming in this state rollback makes more sense). I'm really trying to find out what it means for the connection to be in OPEN or FAILED states compared to IDLE. Thanks, David
Rename a column if not already renamed.?
I have a function used by my patch files which attempts to determine if the old column name exists; if so, it proceeds with the rename. CREATE OR REPLACE FUNCTION sys.rename_column(schema_table_ regclass, old_name_ text, new_name text) RETURNS boolean LANGUAGE plpgsql AS $function$ BEGIN IF EXISTS(SELECT 1 FROM pg_attribute WHERE attrelid = schema_table_ AND attname = old_name_ AND NOT attisdropped) THEN EXECUTE format ('ALTER TABLE %s RENAME %s TO %s', schema_table_, old_name_, new_name); RETURN TRUE; ELSE RETURN FALSE; END IF; END; $function$; This seems to function correctly, except: if I am upgrading a site from Postgres 9.6 to version 11.3 and the column has already been renamed in the 9.6 branch, the above existence test fails and an exception is generated. It all seems to work correctly if I repeat this same patch in the 11.3 branch. The function definition is the same for both branches. I suspect I am overlooking some fundamental issue here. Anyone with a thought? Thanks Dave Day
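For reference, a call from a patch file would look something like this (the table and column names are just placeholders): SELECT sys.rename_column('myschema.mytable'::regclass, 'old_col', 'new_col');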
RE: Rename a column if not already renamed.?
Thanks for the feedback. The error is something like "column already exists". A quoting issue is unlikely, and as I mentioned, it works as expected if I'm repeating the patch in 11.3, and I assume it would if I were in 9.6. I suspect it has something to do with the content, or lack thereof, in the pg_attribute table following an upgrade. -Original Message- From: Luca Ferrari [mailto:fluca1...@gmail.com] Sent: Tuesday, August 20, 2019 3:41 PM To: Day, David Cc: pgsql-gene...@postgresql.org Subject: Re: Rename a column if not already renamed.? On Tue, Aug 20, 2019 at 9:07 PM Day, David wrote: > EXECUTE format ('ALTER TABLE %s RENAME %s TO %s', schema_table_, > old_name_, new_name); Hard to say without the error, but any chance there is a quoting problem? EXECUTE format ('ALTER TABLE %s RENAME %I TO %I', schema_table_, old_name_, new_name); Luca
RE: Rename a column if not already renamed.?
I agree the function could be improved to deal with both old and new name existing simultaneously. That is almost certainly the root cause, and one that I would confirm if the tester and site were currently available to me. Our workflow for this scenario is something like: 1. 9.6 pg_dump takes a snapshot of our 9.6 database. 2. Postgres is upgraded/freshly installed to 11.3. 3. The 9.6 database is restored using the version 11 pg_restore tool. 4. Once our application process starts up, it sees there is a patch available in its old branch that is one greater than its restored 9.6 content. That happens to be a merge patch which resets the expectations. It attempts to apply all patches in the new branch since the point of divergence and runs into my current issue. It occurs to me I could simply put an exception handler in the rename column function and I would likely proceed merrily along. But curiosity is killing me and the cat. What is causing the old name to persist in the pg_attribute table after the rename? Would a stale function referencing the old column name be a contributor? Regards Dave Day -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Tuesday, August 20, 2019 4:57 PM To: Day, David Cc: Luca Ferrari ; pgsql-gene...@postgresql.org Subject: Re: Rename a column if not already renamed.? "Day, David" writes: > The error is something like column already exists and Not sure about the workflow this function is used within, but maybe you need to consider what to do when both the old and new column names exist. Because that sure sounds like what is happening. regards, tom lane
RE: Rename a column if not already renamed.?
Restoring the 9.6 dump into the 11.3 instance? -> Yes. For the upgrade scenario, I confirmed that both the old column name and the new name are in the pg_attribute table at the time the patch attempts the rename. Why both is a big question. However, it is easy enough to re-write the column rename function to deal with the simultaneous possibility. I will include the redefined function in the merge patch and see how it goes. I'll update the thread after some further exploration. Thanks all for your assistance. -Original Message- From: Adrian Klaver [mailto:adrian.kla...@aklaver.com] Sent: Wednesday, August 21, 2019 11:47 AM To: Day, David ; Tom Lane Cc: Luca Ferrari ; pgsql-gene...@postgresql.org Subject: Re: Rename a column if not already renamed.? On 8/21/19 7:52 AM, Day, David wrote: > I agree the function could be improved to deal with both old and new name > existing simultaneously. > That is almost certainly the root cause, and one that I would confirm if the > tester and site were currently available to me. > > Our work flow for this scenario is something like: > > 1. 9.6 pg_dump takes a snapshot of our 9.6 database. > 2. Postgres is upgraded/freshly installed to 11.3.. > 3. The 9.6 database is restored using the version 11 pg_restore tool. In 3) you are restoring to the new 11.3 instance, correct? > > 4. Once our application process starts up, it sees there is a patch available > in it's old branch that is one greater then it's restored 9.6 content. > That happens to be a merge patch which resets the expectations. > It attempts to apply all patches in the new branch since the point of > divergence and runs into my current issue. > > It occurs to me I could simply put an exception handler in the rename column > function and I would likely proceed merrily along. > But curiosity is killing me and the cat. What is causing the old name to > persist in the pg_attribute table after the rename. ? If you are indeed working on the new instance pg_attribute would have no 'memory' of the dropped column. It would seem to me to come down to what is passed into sys.rename_column() as old_name_, new_name. > > Would a stale function referencing the old column name be a contributor? > > > Regards > > > Dave Day > > > > > -Original Message- > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Sent: Tuesday, August 20, 2019 4:57 PM > To: Day, David > Cc: Luca Ferrari ; pgsql-gene...@postgresql.org > Subject: Re: Rename a column if not already renamed.? > > "Day, David" writes: >> The error is something like column already exists and > > Not sure about the workflow this function is used within, but maybe you need > to consider what to do when both the old and new column names exist. > Because that sure sounds like what is happening. > > regards, tom lane > > > > > -- Adrian Klaver adrian.kla...@aklaver.com
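For what it's worth, a redefinition that tolerates the already-renamed case could look roughly like this (a sketch only, not necessarily the version that went into the merge patch; it also adopts the %I quoting suggested earlier in the thread):

CREATE OR REPLACE FUNCTION sys.rename_column(schema_table_ regclass, old_name_ text, new_name text)
RETURNS boolean LANGUAGE plpgsql AS $function$
BEGIN
    IF EXISTS (SELECT 1 FROM pg_attribute
               WHERE attrelid = schema_table_
                 AND attname = new_name
                 AND NOT attisdropped) THEN
        -- New name already present (already renamed, or re-added some other way): nothing to do.
        RETURN FALSE;
    ELSIF EXISTS (SELECT 1 FROM pg_attribute
                  WHERE attrelid = schema_table_
                    AND attname = old_name_
                    AND NOT attisdropped) THEN
        EXECUTE format('ALTER TABLE %s RENAME %I TO %I', schema_table_, old_name_, new_name);
        RETURN TRUE;
    ELSE
        -- Neither name exists: nothing to rename.
        RETURN FALSE;
    END IF;
END;
$function$;

When both names are present it deliberately does nothing, leaving the leftover old column for a separate patch to deal with.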
For SELECT statement (just a reading one, no 'FOR UPDATE'), is COMMIT or ROLLBACK preferred?
Using the latest PostgreSQL, does it matter if my code does a ROLLBACK or a COMMIT on a non-modifying SELECT statement? My impression is they'd be the same, as nothing is changed and therefore there's nothing to commit or roll back, but I wondered if there was any difference in how they are processed by Postgres? Thanks, David
Re: For SELECT statement (just a reading one, no 'FOR UPDATE'), is COMMIT or ROLLBACK preferred?
On 8/25/19 12:40 PM, Rob Sargent wrote: On Aug 25, 2019, at 1:09 PM, David Wall wrote: Using the latest PostgreSQL, does it matter if my code does a ROLLBACK or a COMMIT on a non-modifying SELECT statement? My impression is they'd be the same, as nothing is changed and therefore there's nothing to commit or roll back, but I wondered if there was any difference in how they are processed by Postgres? Thanks, David In interactive psql, both issue a warning that there is no current transaction. What is your auto-commit setting and how is your code sent to the server? We are accessing it via JDBC, and so we issue SQL via PreparedStatements against a Connection, and the connection is not auto-commit. By default, the connection has a BEGIN TRANSACTION in place, so after all requests we do, we need to commit/rollback. The main issue is that if we do a SELECT and get a ResultSet that has no rows, and we then do a commit or a rollback, it seems reasonable that these are identical, as no changes were made. My inclination is to do a Connection.commit() on the connection because it wasn't in error or anything even if no rows were found, but I wondered if a Connection.rollback() has any difference (positive/negative) in such a scenario. We have SELECT sql statements that sometimes do a rollback after such queries because, even though finding no rows is fine for SQL, it may be an issue in the application that expects there to be at least one row. So we're trying to determine if there's actually any difference between commit/rollback after SELECT statements (with rows returned or not), a bit like asking whether there's any difference for an UPDATE statement that reports zero rows updated.
RE: Rename a column if not already renamed.?
Hi, Finally resolved this. Bottom line: some bad analysis on my part. The scenario was: changes were ported from trunk back to a branch, and then that branch was rolled back into trunk. Altering the rename_column function to check that the old and new names did not both exist was necessary for the merge process to complete. I ended up with an additional patch in trunk, relevant only to an upgraded system, to DROP IF EXISTS the old column name that was re-added by a trunk patch when the branch rolled forward. Obviously nothing to do with the 9.6 -> 11.3 Postgres upgrade. Again, thanks to all for the assistance. Dave -Original Message- From: Day, David Sent: Wednesday, August 21, 2019 2:58 PM To: 'Adrian Klaver' ; Tom Lane Cc: Luca Ferrari ; pgsql-gene...@postgresql.org Subject: RE: Rename a column if not already renamed.? Restoring the 9.6 dump into the 11.3 instance? -> Yes. For the upgrade scenario, I confirmed that both the old column name and the new name are in the pg_attribute table at the time the patch attempts the rename. Why both is a big question. However, it is easy enough to re-write the column rename function to deal with the simultaneous possibility. I will include the redefined function in the merge patch and see how it goes. I'll update the thread after some further exploration. Thanks all for your assistance. -Original Message- From: Adrian Klaver [mailto:adrian.kla...@aklaver.com] Sent: Wednesday, August 21, 2019 11:47 AM To: Day, David ; Tom Lane Cc: Luca Ferrari ; pgsql-gene...@postgresql.org Subject: Re: Rename a column if not already renamed.? On 8/21/19 7:52 AM, Day, David wrote: > I agree the function could be improved to deal with both old and new name > existing simultaneously. > That is almost certainly the root cause, and one that I would confirm if the > tester and site were currently available to me. > > Our work flow for this scenario is something like: > > 1. 9.6 pg_dump takes a snapshot of our 9.6 database. > 2. Postgres is upgraded/freshly installed to 11.3.. > 3. The 9.6 database is restored using the version 11 pg_restore tool. In 3) you are restoring to the new 11.3 instance, correct? > > 4. Once our application process starts up, it sees there is a patch available > in it's old branch that is one greater then it's restored 9.6 content. > That happens to be a merge patch which resets the expectations. > It attempts to apply all patches in the new branch since the point of > divergence and runs into my current issue. > > It occurs to me I could simply put an exception handler in the rename column > function and I would likely proceed merrily along. > But curiosity is killing me and the cat. What is causing the old name to > persist in the pg_attribute table after the rename. ? If you are indeed working on the new instance pg_attribute would have no 'memory' of the dropped column. It would seem to me to come down to what is passed into sys.rename_column() as old_name_, new_name. > > Would a stale function referencing the old column name be a contributor? > > > Regards > > > Dave Day > > > > > -Original Message- > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Sent: Tuesday, August 20, 2019 4:57 PM > To: Day, David > Cc: Luca Ferrari ; pgsql-gene...@postgresql.org > Subject: Re: Rename a column if not already renamed.? > > "Day, David" writes: >> The error is something like column already exists and > > Not sure about the workflow this function is used within, but maybe you need > to consider what to do when both the old and new column names exist.
> Because that sure sounds like what is happening. > > regards, tom lane > > > > > -- Adrian Klaver adrian.kla...@aklaver.com
How to reformat output of "age()" function
How can I change the default output of the "age" function to be, for example, in minutes? E.g. dvdb=> select age('2019-09-11 09:00:00','2019-09-09 11:00:00'); age 1 day 22:00:00 (1 row) I want the equivalent of that time delta in minutes. Thanks in Advance
Re: How to reformat output of "age()" function
Thanks a lot! On Wed, Sep 11, 2019 at 12:34 PM Francisco Olarte wrote: > David: > > On Wed, Sep 11, 2019 at 5:39 PM David Gauthier > wrote: > > How can I change the default output of the "age" function to be, for > example, in minutes? > > E.g. > > dvdb=> select age('2019-09-11 09:00:00','2019-09-09 11:00:00'); > > age > > > > 1 day 22:00:00 > > (1 row) > > I want the equivalent of that time delta in minutes. > > Some answers have already been posted, but also consider what you are > doing. Intervals have three components (months, days, seconds) for a > reason: ignoring leap seconds for now, not all days have 24h (daylight > saving time changes) and months have anywhere from 28 to 31 days. IIRC > interval normalization for epoch assumes all months have 30 days and all > days have 24 hours. > > If you want to know the elapsed minutes between two timestamps, it > might be better to do it directly: extract the epoch from both (seconds), > subtract, and divide by 60, truncating/rounding if you need to. > > This is what happens in one case in my timezone (Europe/Madrid): > > > test=# select '2019.11.20 20:00:00'::timestamptz as end, '2019.06.10 > 10:00:00'::timestamptz as start; > end | start > + > 2019-11-20 20:00:00+01 | 2019-06-10 10:00:00+02 > (1 row) > > test=# select age('2019.11.20 20:00:00'::timestamptz, '2019.06.10 > 10:00:00'::timestamptz) as interval; > interval > - > 5 mons 10 days 10:00:00 > (1 row) > > test=# select extract(epoch from age('2019.11.20 > 20:00:00'::timestamptz, '2019.06.10 10:00:00'::timestamptz)) as > interval_seconds; > interval_seconds > -- > 13860000 > (1 row) > > test=# select extract(epoch from '2019.11.20 20:00:00'::timestamptz) - > extract(epoch from '2019.06.10 10:00:00'::timestamptz) as > elapsed_seconds; > elapsed_seconds > - > 14122800 > (1 row) > > Francisco Olarte. >
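Applied to the timestamps from the original question, the same subtraction approach gives the delta in minutes directly:

select extract(epoch from (timestamp '2019-09-11 09:00:00'
                         - timestamp '2019-09-09 11:00:00')) / 60 as minutes;

which comes out to 2760 (1 day 22 hours = 46 hours = 2760 minutes). Subtracting two timestamps yields an interval of days and seconds only, with no months component, so the 30-day-month normalization that age() can introduce never comes into play.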
Web GUI for PG table ?
Hi: We're considering replacing a Windows AccessDB-based system with PG. Access was chosen because of its GUI to its tables (it looks and behaves like a spreadsheet). But performance can be volatile given the fact that the AccessDB front-ends and back-end are at different sites 1000+ miles apart. The belief is that a web-based approach where the DB server and web server are at the same site would be better. The question is, would such a "spreadsheet-like" GUI have to be built from scratch, or is there something that can be leveraged? Things to consider are record locking, constraint pull-downs, triggers updating values, etc. We already have a nice PG server doing other work for us and would like to use that if possible.
Re: Web GUI for PG table ?
Many good visualization options but I need one that runs on the web AND allows insert/update/delete records. On Thu, Sep 12, 2019 at 10:42 AM Adrian Klaver wrote: > On 9/12/19 7:08 AM, David Gauthier wrote: > > Hi: > > > > We're considering replacing a windows AccessDB based system with PG. > > Access was chosen because of it's GUI to its tables (looks and behaves > > like a SS). But performance can be volatile given the fact that the > > AccessDB front-ends and back-end are at different sites 1000+ miles > > apart. The belief is that a web-based approach where the DB server and > > web server are at the same site would be better. > > Question is, would such a "spreadsheet-like" GUI have to be built from > > scratch or is there something that can be leveraged? Things to consider > > are record locking, constraint pull-downs, triggers updating values, > > etc... . We already have a nice PG server doing other work for us and > > would like to use that if possible. > > I've been using Tabulator(http://tabulator.info/) a JavaScript library > for displaying data in tabular form. It does not care about the backend > just that it gets data. > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: pgbackrest restore to new location?
On 9/17/19 7:23 AM, Luca Ferrari wrote: > On Tue, Sep 17, 2019 at 12:00 PM Ron wrote: >> The real problem is that after doing that, "pg_ctl start -D >> /path/to/new/data" fails with "PANIC: could not locate a valid checkpoint >> record". Sounds like backup_label is missing or has been deleted. The easiest way to restore to a new location is just to copy pgbackrest.conf from the primary (or create a new one) with the same stanza and then alter pg1-path in pgbackrest.conf or at the command line. Regards, -- -David da...@pgmasters.net
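Concretely, pg1-path can be supplied on the restore command line. Assuming a stanza named main (a placeholder) and the new data directory mentioned above, something like:

pgbackrest --stanza=main --pg1-path=/path/to/new/data restore

should restore the backup, including backup_label, into /path/to/new/data, after which pg_ctl start -D /path/to/new/data can find a valid checkpoint record.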
Re: pgbackrest restore to new location?
On 9/17/19 10:03 PM, Stephen Frost wrote: > > That said- it brings up a pretty serious issue that should be discussed, > and that's nuking this: > > HINT: If you are not restoring from a backup, try removing the file > ".../backup_label". > > That hint is absolutely wrong these days when many tools have been > updated to use the non-exclusive backup method and it just ends up > getting people into trouble and, worse, can result in them having > corrupted clusters. > > I'll get a patch into the next commitfest to remove it. The exclusive > method has been deprecated for quite a few releases and we should stop > giving bad advice on the assumption that people are using it. We updated the error message and hints in c900c152, but it was not back-patched. -- -David da...@pgmasters.net
Re: pgbackrest - question about restoring cluster to a new cluster on same server
On 9/18/19 6:59 PM, Ron wrote: > > Scenario: there's data corruption on the production server, so we need to do > a PITR restore from "a few days ago" of the cluster holding the prod > databases to a second cluster on that same VM in order to try and find > the missing data and load it back into the prod cluster. > > Other than putting a high I/O load on the LUN where repo-path is located > (from both writing WALs to it and reading the backed up files), will > there be any problems when "pg_ctl start" processes recovery.conf and > applies the WAL files to the new cluster while the prod cluster is > writing new WAL files? > > Does my question make sense? It does, but the answer lies outside of pgBackRest. "Can the repo storage handle the load of archive-push and archive-get at the same time" is really a question of storage and network throughput. pgBackRest compresses everything by default, which goes a long way towards increasing throughput, but ultimately we don't control the bandwidth. Having said that, if the storage and network throughput are sufficient, restoring and recovering a standby using pgBackRest will not impact the primary the way a direct pg_basebackup would. Regards, -- -David da...@pgmasters.net
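For the restore itself, a point-in-time restore into a second data directory on the same VM would look roughly like this (the stanza name, path, and target time are placeholders):

pgbackrest --stanza=main --pg1-path=/path/to/second/cluster \
    --type=time "--target=2019-09-15 12:00:00" restore

pgBackRest writes the recovery settings, including a restore_command that fetches WAL with archive-get, into the new data directory, so the second cluster replays from the repo while the production cluster continues archiving with archive-push.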
Re: pgbackrest - question about restoring cluster to a new cluster on same server
On 9/18/19 9:40 PM, Ron wrote: > > I'm concerned with one pgbackrest process stepping over another one and > the restore (or the "pg_ctl start" recovery phase) accidentally > corrupting the production database by writing WAL files to the original > cluster. This is not an issue unless you seriously game the system. When a cluster is promoted, it selects a new timeline and all WAL will be archived to the repo on that new timeline. It's possible to promote a cluster without a timeline switch by tricking it, but this is obviously a bad idea. So, if you promote the new cluster and forget to disable archive_command, there will be no conflict because the clusters will be generating WAL on separate timelines. In the case of a future failover, a higher timeline will be selected so there still won't be a conflict. Unfortunately, that dead WAL from the rogue cluster will persist in the repo until a PostgreSQL upgrade because expire doesn't know when it can be removed, since it has no context. We're not quite sure how to handle this, but it seems a relatively minor issue, at least as far as consistency is concerned. If you do have a split-brain situation where two primaries are archiving on the same timeline, then first-in wins. WAL from the losing primary will be rejected. Regards, -- -David da...@pgmasters.net