Re: [GENERAL] mild modification to pg_dump

2017-11-17 Thread Andres Freund
On 2017-11-17 18:56:45 -0300, marcelo wrote: > Truly, I'm catched in a very big app, so I have no time to read all > the docs. People on this list also have jobs. - Andres -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.pos

Re: [GENERAL] postmaster deadlock while logging after syslogger exited

2017-11-16 Thread Andres Freund
On November 16, 2017 7:06:23 PM PST, Tom Lane wrote: >Andres Freund writes: >> On 2017-11-16 21:39:49 -0500, Tom Lane wrote: >>> What might be worth thinking about is allowing the syslogger process >to >>> inherit the postmaster's OOM-kill-proofness settin

Re: [GENERAL] postmaster deadlock while logging after syslogger exited

2017-11-16 Thread Andres Freund
On 2017-11-16 21:39:49 -0500, Tom Lane wrote: > > We could work around a situation like that if we made postmaster use a > > *different* pipe as stderr than the one we're handing to normal > > backends. If postmaster created a new pipe and closed the read end > > whenever forking a syslogger, we sh

Re: [GENERAL] postmaster deadlock while logging after syslogger exited

2017-11-16 Thread Andres Freund
On 2017-11-17 11:09:56 +0900, Michael Paquier wrote: > On Fri, Nov 17, 2017 at 10:50 AM, Andres Freund wrote: > > On 2017-11-06 15:35:03 -0500, Tom Lane wrote: > >> David Pacheco writes: > >> > I ran into what appears to be a deadlock in the logging subsystem. It &

Re: [GENERAL] postmaster deadlock while logging after syslogger exited

2017-11-16 Thread Andres Freund
ion like that if we made postmaster use a *different* pipe as stderr than the one we're handing to normal backends. If postmaster created a new pipe and closed the read end whenever forking a syslogger, we should get EPIPEs when writing after syslogger died and could fall back to proper stderr or

Re: [GENERAL] Postgres 10.1 fails to start: server did not start in time

2017-11-12 Thread Andres Freund
fiddle with this a bunch in the regression tests, to get things to work properly on slow animals? If we had to do that, other people had to do so as well. Not the friendliest experience... Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] xmin increasing within a transaction block?

2017-11-06 Thread Andres Freund
--- > 2465 > (1 row) > > > Why is xmin greater than the current transaction id (and most notably > not "fixed")? > What am I missing here? > I'm running 9.6.5. That doesn't look like plain postgres behaviour to me. Any chance you're using a

Re: [GENERAL] checkpoint and recovering process use too much memory

2017-11-02 Thread Andres Freund
that both checkpointer and startup process touch most shared buffers and thus show up as having touched all that memory. Regards, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] pgcon2015, what happened to SMR disk technolgy ?

2017-10-17 Thread Andres Freund
nformation to share about SMR technology at the moment. > I guess i saw it coming ^^ What I heard as rumours, not super trustworthy ones but not entirely uninformed, is that SMR drives are currently pretty much entirely sold to companies doing online data storage and such. Greetings, Andres

Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common

2017-10-11 Thread Andres Freund
On 2017-10-12 10:25:43 +0800, Craig Ringer wrote: > On 4 October 2017 at 00:21, milist ujang wrote: > > On Tue, Oct 3, 2017 at 8:49 PM, Craig Ringer wrote: > >> > >> > >> Can you get stacks please? > >> > >> Use -g > > > > > > # Events: 2K cpu-clock > > # > > # Overhead Command Shared Obje

Re: [GENERAL] core system is getting unresponsive because over 300 cpu load

2017-10-10 Thread Andres Freund
ckground_bytes = 0 > * vm.dirty_background_ratio = 2 > * vm.dirty_bytes = 0 > * vm.dirty_expire_centisecs = 3000 > * vm.dirty_ratio = 20 > * vm.dirty_writeback_centisecs = 500 I'd suggest monitoring /proc/meminfo for the amount of Dirty and Writeback memory, and see whether rapid changes there

Re: [GENERAL] Strange checkpoint behavior - checkpoints take a longtime

2017-10-05 Thread Andres Freund
Hi, On 2017-10-06 05:53:39 +0300, Vladimir Nicolici wrote: > Hello, it’s postgres 9.6. Consider setting checkpoint_flush_after to 16MB or something large like that. > I will probably try the compression on Monday or Tuesday, I can only > experiment with a single set of changes in a day, and I p

Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

2017-10-05 Thread Andres Freund
Hi, On 2017-10-05 22:58:31 +0300, Vladimir Nicolici wrote: > I changed some configuration parameters during the night to the values I was > considering yesterday: > > - shared_buffers = 144GB #previously 96 GB > - bgwriter_lru_maxpages = 100 #previously 400 > - checkpoint_timeout = 30min #prev

Re: [GENERAL] Logical Replication - test_decoding - unchanged-toast-datum

2017-09-27 Thread Andres Freund
discernible from actual datums, so ... I agree that test_decoding isn't a great base of a replication tool, but I don't think it's completely unsuitable, and I also think that ship has sailed. Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgr

Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-20 Thread Andres Freund
On 2017-09-20 13:00:34 -0500, Jerry Sievers wrote: > >> Pg 9.3 on monster 2T/192 CPU Xenial thrashing > > > > Not sure what the word "thrashing" in that sentence means. > > Cases of dozens or hundreds of sessions running typical statements for > this system but running 100% on their CPUs. Seems t

Re: [GENERAL] Up to date conventional wisdom re max shared_buffer size?

2017-09-19 Thread Andres Freund
ly high but does take big dips > that are sometimes sustained for seconds or even minutes. "shared_buffer coherency"? Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] [HACKERS] pgjdbc logical replication client throwing exception

2017-09-15 Thread Andres Freund
On 2017-09-15 20:00:34 +, Vladimir Sitnikov wrote: > ++pgjdbc dev list. > > >I am facing unusual connection breakdown problem. Here is the simple code > that I am using to read WAL file: > > Does it always fails? > Can you create a test case? For instance, if you file a pull request with > th

Re: [GENERAL] Postgresql init and cleanup module functions

2017-09-05 Thread Andres Freund
On 2017-09-05 20:15:57 +0300, Yan Pas wrote: > I'm writing C postgresql module with some psql-functions and global state. > Do module API provide any init and cleanup functions? If yes then it would > be fine to see them in "35.9. C-Language Functions" help page." The relevant doc page is at https

Re: [GENERAL] Unlogged Crash Detection

2017-08-29 Thread Andres Freund
On 2017-08-29 20:19:52 +0900, Michael Paquier wrote: > On Tue, Aug 29, 2017 at 6:06 PM, Gersner wrote: > > I see, interesting. > > Please do not top-post. This is not the recommended way of dealing > with threads on this mailing list. > > > We have lots of unlogged tables, upon a crash we want t

Re: [GENERAL] streaming replication - crash on standby

2017-08-09 Thread Andres Freund
Hi, Please quote properly on postgres mailing lists. On 2017-08-09 22:31:23 +, Seong Son (US) wrote: > I see. Thank you. > > But the Postgresql process had crashed at that time so the streaming > replication was no longer working. Why would it crash and is that normal? You've given us

Re: [GENERAL] streaming replication - crash on standby

2017-08-09 Thread Andres Freund
Hi, On 2017-08-09 22:03:43 +, Seong Son (US) wrote: > The last line from pg_xlogdump of the last WAL file on the crashed standby > server shows the following. > > pg_xlogdump: FATAL: error in WAL record at DF/4CB95FD0: unexpected pageaddr > DB/62B96000 in log segment 00DF00

Re: [GENERAL] Interesting streaming replication issue

2017-08-09 Thread Andres Freund
nless you use archive_timeout (or as you do manually switch segments). Streaming replication doesn't guarantee that WAL is retained unless you use replication slots - which you don't appear to be. You can make SR retain more with approximate methods like wal_keep_segments too, but that's not a guara

Re: [GENERAL] Multixact members limit exceeded

2017-08-09 Thread Andres Freund
> 4000 multixacts). Maybe your multixact freeze min age is too high? > Getting rid of 15 GB of bloat is a good side effect, though, I'm sure. I wonder if there's longrunning transactions preventing cleanup. I suggest checking pg_stat_activity, pg_prepared_xacts, pg_replication_s

Re: [GENERAL] Multixact members limit exceeded

2017-08-09 Thread Andres Freund
p duties *less* aggressive, but trying to keep the rate of cleanup high. E.g. by increasing vacuum_freeze_min_age and vacuum_multixact_freeze_min_age, so only urgently old stuff gets cleaned up. Do you know how you come to have a lot of large multixacts? That's often indicative of

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

2017-06-22 Thread Andres Freund
invent an API that can represent creation, deletion, and writes to arbitrary offsets, for output plugins. > > I wish PG in some future version will address these quirks so one can > > operate on LOs more smoothly. You're welcome to help... Greetings, Andres Freund -- Sent v

Re: [GENERAL] pglogical vs. built-in logical replication in pg-10

2017-06-22 Thread Andres Freund
uld actually be easy, it's a bit more difficult to decide how to represent it to output plugins... Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Andres Freund
On 2017-06-19 15:21:20 -0700, Peter Geoghegan wrote: > On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes wrote: > > Unfortunately, it is only implemented in very narrow circumstances. You > > have to be doing bitmap index scans of many widely scattered rows to make it > > useful. I don't think that th

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-10 Thread Andres Freund
scacheCallback in our code, and while I think not all of > them can be reached in a single process, we demonstrably get as high as 21 > registered callbacks in some regression test runs. That's not leaving a > lot of daylight for add-on modules. The second patch attached includes >

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Andres Freund
On 2017-05-05 21:32:27 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-05-05 14:24:07 -0600, Mathieu Fenniak wrote: > >> It appears that most of the time is spent in the > >> RelfilenodeMapInvalidateCallback and CatalogCacheIdInvalidate cache > >> i

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Andres Freund
ems to apply cleanly to 9.5) I've seen this be a significant fraction of CPU time completely independent of logical decoding, so I'd guess this is worthwhile independently. Not sure what a good benchmark for this would be. Greetings, Andres Freund -- Sent via pgsql-general mailing l

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Andres Freund
x27;t mind putting in the work if someone > could describe what is happening here, and have a discussion with me about > what kind of changes might be necessary to improve the performance. If you could provide an easily runnable sql script that reproduces the issue, I'll have a look. I thi

Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-27 Thread Andres Freund
On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote: > I can confirm this observation. I bought the Intel 750 NVMe SSD last year, > the device has 1GB DDR3 cache on it (power-loss protected), can do ~1GB/s of > sustained O_DIRECT sequential writes. But when running pgbench, I can't push > more than ~3

Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-27 Thread Andres Freund
On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote: > On 04/27/2017 09:34 AM, Andres Freund wrote: > > On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote: > > > On 04/27/2017 08:59 AM, Andres Freund wrote: > > > > > > > I would agree it isn't yet a

Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-27 Thread Andres Freund
On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote: > On 04/27/2017 08:59 AM, Andres Freund wrote: > > > > > Ok, based on the, few, answers I've got so far, my experience is indeed > > skewed. A number of the PG users I interacted with over the last couple &

Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-27 Thread Andres Freund
Hi, On 2017-04-24 21:17:43 -0700, Andres Freund wrote: > I've lately seen more and more installations where the generation of > write-ahead-log (WAL) is one of the primary bottlenecks. I'm curious > whether that's primarily a "sampling error" of mine, or w

[GENERAL] Questionaire: Common WAL write rates on busy servers.

2017-04-24 Thread Andres Freund
s for wal_compression, max_wal_size (9.5+) / checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers? - Could you quickly describe your workload? Feel free to add any information you think is pertinent ;) Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-gene

Re: [GENERAL] Protocol 2 and query parameters support

2017-04-24 Thread Andres Freund
On 2017-04-24 16:07:01 -0400, Rader, David wrote: > As Tom mentioned, it sounds like the issue is that Presto expects to only > use simple query, not extended query (no server-side prepared statements). > The JDBC driver supports setting the prepare threshold to 0 to disable > using server-side pre

Re: [GENERAL] Protocol 2 and query parameters support

2017-04-23 Thread Andres Freund
port protocol 2. Could you provide a reference about presto using v2 protocol? A quick search didn't turn anything up. Presto seems a bit too new to rely on v2, given how long ago v3 has been introduced. Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@pos

Re: [GENERAL] options for logical replication plugins?

2017-03-15 Thread Andres Freund
Hi, On 2017-03-15 18:29:06 +, Chris Withers wrote: > Shame the decoding has to be done on the server-side rather than the client > side. Why? You can't filter on the client side. You don't have any catalog information available, so you'd have to transport a lot of metadata and / or decode t

Re: [GENERAL] Potential Bug: Frequent Unnecessary Degeneration

2017-02-15 Thread Andres Freund
Hi, On 2017-02-15 20:00:11 -0330, David O'Mahony wrote: > We're running two nodes using with replication enabled. > > pgpool routinely (every day) performs a failover with the following > statements appearing the in log: This list is about bugs in postgresql.org maintained projects, pgpool isn't

Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-03 Thread Andres Freund
On 2017-02-03 22:17:55 +0300, Nikolai Zhubr wrote: > 03.02.2017 20:29, Andres Freund: > [...] > > > > Could you use process monitor or such to see what the process is doing > > > > while using a lot of CPU? > > > > > > I'm not sure how t

Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-03 Thread Andres Freund
On 2017-02-01 01:02:11 +0300, Nikolai Zhubr wrote: > 31.01.2017 19:51, Andres Freund: > [...] > > > Exactly same trouble with 9.4 (Specifically EDB 9.4.10-1 win32) > > > > That's good to know, less because of 519b0757, more because of the latch > >

Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-01-31 Thread Andres Freund
On 2017-01-31 11:45:33 +0300, Nikolai Zhubr wrote: > 31.01.2017 10:37, I wrote: > > [...] > > > > 1. "select localtimestamp" 40 times (As separate requests, one by > > > > one, but > > > > no delay inserted in between) > > > > 2. wait 1/2 second. > > > > 3. goto 1 > > > > > > Craig, could this be

Re: [GENERAL] logical decoding output plugin

2016-12-10 Thread Andres Freund
ack (no catalog access there), but you can do it the first time through the change callback. For lookups the most complete lookup is to use RangeVarGetRelid() to get the oid. There's other variants, but that's the easiest approach. Do you have to care about the table being renamed? G

Re: [GENERAL] Is is safe to use SPI in multiple threads?

2016-12-09 Thread Andres Freund
On 2016-12-09 16:52:05 +0800, Qiu Xiafei wrote: > 1. Is there a way to use SPI in multi-thread style? No. > 2. Another option is to use libpq, like normal clients do. Is libpq as > efficient as SPI? No. - Andres -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make ch

Re: [GENERAL] hot_standby_feedback

2016-11-28 Thread Andres Freund
y, the only option is to cancel it. > > Is that explanation correct? No. That just means that we don't update the value more frequently. The value reported is a "horizon" meaning that nothing older than the reported value can be accessed. Greetings, Andres Freund --

Re: [GENERAL] out-of-order XID insertion in KnownAssignedXids

2016-10-20 Thread Andres Freund
to be ok as long as there's no checkpoints while taking the base backups (or when the control file was copied early enough). But as soon as a second checkpoint happens before the control file is copied... Fredrik, how did you end up removing the label? Greetings, Andres Freund -- Sent via p

Re: [GENERAL] out-of-order XID insertion in KnownAssignedXids

2016-10-18 Thread Andres Freund
produce any kind of log mesages > / details that would be helpful. Could you use pg_xlogdump to dump the WAL file on which replay failed? And then attach the output in a compressed manner? Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To m

Re: [GENERAL] LOG: munmap(0x7fff80000000) failed: Invalid argument

2016-10-10 Thread Andres Freund
On 2016-10-10 18:21:48 -0400, Tom Lane wrote: > Chris Richards writes: > > Setting up postgresql-9.5 (9.5.4-1.pgdg14.04+2) ... > > Creating new cluster 9.5/main ... > > config /etc/postgresql/9.5/main > > data /var/lib/postgresql/9.5/main > > locale en_US.UTF-8 > > LOG: munmap(0x7fff8

Re: [GENERAL] Lock contention in TransactionIdIsInProgress()

2016-10-08 Thread Andres Freund
On 2016-10-07 08:36:12 -0500, Merlin Moncure wrote: > Won't happen. Only bugs get ported back Hopefully we don't port bugs back all that often. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-ge

Re: [GENERAL] Nonblocking libpq + openssl = ?

2016-09-16 Thread Andres Freund
On 2016-09-17 03:12:53 +0300, Nikolai Zhubr wrote: > 17.09.2016 2:05, Andres Freund: > [...] > > Well, it's not pretty. I quite dislike this bit, and I've complained > > about it before. But it is noteworthy that it's nearly impossible to > > hit these

Re: [GENERAL] Nonblocking libpq + openssl = ?

2016-09-16 Thread Andres Freund
e dislike this bit, and I've complained about it before. But it is noteworthy that it's nearly impossible to hit these days, due to ssl-renegotiation support having been ripped out. That's what could trigger openssl to require writes upon reads. Greetings, Andres Freund -- Sent v

Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-02 Thread Andres Freund
er outage and not lose all your data. > > Stonebraker's new stuff is cool, but it is NOT designed to survive > total power failure. > > Two totally different design concepts. It's apples and oranges to compare > them. I don't think they're that fundame

Re: [GENERAL] what change in postgres 9.5 improvements for multi-CPU machines

2016-08-16 Thread Andres Freund
On 2016-03-25 18:11:21 +0800, 657985...@qq.com wrote: > hello everyone: > I was bothered by the postgres spinlock for a long time . How to > understand this sentence "postgres 9.5 performance improvements for multi-CPU > machines" > at present my database is 9.3.5 。 Can it solve the spin

Re: [GENERAL] Confusing with commit time usage in logical decoding

2016-03-02 Thread Andres Freund
Hi, On 2016-02-29 11:12:14 +0100, Weiping Qu wrote: > If you received this message twice, sorry for annoying since I did not > subscribe successfully previously due to conflicting email domain. > > Dear postgresql general mailing list, > > I am currently using the logical decoding feature (versi

Re: [GENERAL] 2x Time difference between first and subsequent run of the same query on fresh established connection (on linux, with perf data included, all query data in the shared buffers) on postgre

2016-02-25 Thread Andres Freund
Hi, On 2016-02-25 13:50:11 +1100, Maxim Boguk wrote: > The first run (something fishy with kernel calls): > 19,60% postgres [kernel.kallsyms] [k] filemap_map_pages > 15,86% postgres postgres [.] hash_search_with_hash_value >8,20% postgres postgres [.] heap_hot_se

Re: [GENERAL] A motion

2016-01-23 Thread Andres Freund
On 2016-01-23 15:31:02 -0800, Joshua D. Drake wrote: > With respect Adrian, that is a motion that never stands a chance. If you > don't want to read it, set up a filter that sends it right to the round > file. It'd help if there weren't six, but one thread... -- Sent via pgsql-general mailing l

Re: [GENERAL] Shared system resources

2016-01-04 Thread Andres Freund
On 2015-12-23 11:55:36 -0500, George Neuner wrote: > With sufficient privileges, a debugger-like process can attach and > examine the memory of a running - or just terminated - process, but it > won't have access to discarded (unmapped) memory. You just have to load a kernel module to do so - it's

Re: [GENERAL] Old source code needed

2015-11-27 Thread Andres Freund
On 2015-11-27 10:31:03 +0100, NTPT wrote: > Thanx for help,  I grab the source code  that match  old cluster fs backup. > > However: Should it run fine compiled  with recent gcc 4.9.3 ? > > while compiled with this gcc , I got a lot of strange errors like > > ERROR:  could not identify an ord

Re: [GENERAL] full_page_writes on SSD?

2015-11-24 Thread Andres Freund
On 2015-11-24 13:09:58 -0600, Kevin Grittner wrote: > On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk wrote: > > > if SSDs have 4kB/8kB sectors, and we'd make the Postgres page > > size equal to the SSD page size, do we still need full_page_writes? > > If an OS write of the PostgreSQL page size ha

Re: [GENERAL] ERROR: tablespace "archive2" is not empty

2015-10-19 Thread Andres Freund
On 2015-10-19 11:14:33 +0200, Josip Rodin wrote: > On Mon, Oct 19, 2015 at 11:06:59AM +0200, Andres Freund wrote: > > Hi, > > > > On 2015-10-19 10:49:11 +0200, Josip Rodin wrote: > > > % sudo -H -u postgres psql mydb -c "SELECT oid, relname, relkind FROM >

Re: [GENERAL] postgresql 9.4 streaming replication

2015-10-19 Thread Andres Freund
> /var/log/standby.log' > primary_slot_name='standby1' pg_standby is for a "warm standby" - instead of signalling an error if an archive file does not exist it'll sleep. Thus this node will never enter streaming replication. Use cp or something instead. Gre

Re: [GENERAL] ERROR: tablespace "archive2" is not empty

2015-10-19 Thread Andres Freund
g query. The files on disk are relefilenodes not oids. Try WHERE pg_relation_filenode(oid) IN ... Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] BDR workers exiting?

2015-10-12 Thread Andres Freund
On 2015-10-12 14:37:07 +, Steve Pribyl wrote: > I am loading up a 60G database into BDR database and these "ERRORS" are in my > logs. Is not normal behavior or is something going bad. > > 2015-10-12 09:28:59.389 CDT,,,30371,,561bc17d.76a3,1,,2015-10-12 09:19:41 > CDT,5/0,0,ERROR,XX000,"data

Re: [GENERAL] Drop or alter column under load give ERROR #42804 structure of query does not match function result type:

2015-10-12 Thread Andres Freund
;s[1] example, the above sequence of events does not fail. It > fails in Victors's case when the server is under load, so it seems there is > another factor in play. The above sequence is only problematic if 2) happens exactly between 1) and 3), which is not particularly likely given that 1)

Re: [GENERAL] Drop or alter column under load give ERROR #42804 structure of query does not match function result type:

2015-10-12 Thread Andres Freund
of the "users" type when the table is used, but there's no locking preventing the columns to be dropped while the function is used. So what happens is that 1) the function is parsed & planned 2) DROP COLUMN is executed 3) the contained statement is executed 4) a mismatch between

Re: [GENERAL] Drop or alter column under load give ERROR #42804 structure of query does not match function result type:

2015-10-12 Thread Andres Freund
n undesirable way. (In the case where our own xact * modifies the rel, the relcache update happens via * CommandCounterIncrement, not here.) */ if (res != LOCKACQUIRE_ALREADY_HELD) AcceptInvalidationMessages(); } I've not investigated what

Re: [GENERAL] in defensive of zone_reclaim_mode on linux

2015-09-06 Thread Andres Freund
the negative impact of transparent hugepages being mitigated to some degree by zone reclaim mode (which'll avoid some cross-node transfers). Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] FDW and BDR

2015-09-02 Thread Andres Freund
ikely, nobody removed the error check. Either way it should be simple to implement. Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] bdr download

2015-08-31 Thread Andres Freund
On 2015-08-31 18:41:19 +0200, Andres Freund wrote: > http://archives.postgresql.org/message-id/%2053A2AA64.9040709%402ndquadrant.com http://archives.postgresql.org/message-id/53A2AA64.9040709%402ndquadrant.com There was a space too much in the link... Greetings, Andres Freund -- Sent

Re: [GENERAL] bdr download

2015-08-31 Thread Andres Freund
this list is what to use for BDR reports > > and discussions, so this report is in the right place. > > Huh, why did we decide that when the community doesn't control any of > it? That doesn't make any sense. http://archives.postgresql.org/message-id/%2053A2AA64.9040709%4

Re: [GENERAL] Grouping sets, cube and rollup

2015-08-26 Thread Andres Freund
On 2015-08-26 17:09:26 -0500, Merlin Moncure wrote: > On Tue, Aug 25, 2015 at 7:04 PM, Edson Richter wrote: > > Any chance to get those amazing wonderful features backported to 9.4? > > you might have some luck merging in the feature yourself if you're so > inclined. It's imo too large a featur

Re: [GENERAL] Logical decoding off of a replica?

2015-07-30 Thread Andres Freund
On 2015-07-28 17:54:57 +, Curt Micol wrote: > Hello, > > I've been working to get a replica setup to perform logical decoding > and haven't been able to get the right configuration. I've tried > everything I can think of. While researching I found this post on > Stack Overflow: > http://stacko

Re: [GENERAL] Delete rule does not prevent truncate

2015-07-24 Thread Andres Freund
On 2015-07-24 10:29:21 +0100, Tim Smith wrote: > That's not the point. Backups are important, but so is the concept of > various layers of anti-fat-finger protection. Restoring off backups > should be last resort, not first. Oh, comeon. Install a TRUNCATE trigger and let this thread die. Andre

Re: [GENERAL] Delete rule does not prevent truncate

2015-07-23 Thread Andres Freund
On 2015-07-23 12:57:20 +0100, Tim Smith wrote: > Thus, I should not have to use a trigger for TRUNCATE because the "each > row" concept does not apply. Plus it makes perfect sense to want to > transform the truncate command and transform into ignore That'd entirely defeat the point of TRUNCATE

Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-20 Thread Andres Freund
On 2015-07-20 17:00:52 +0300, Spiros Ioannou wrote: > FYI we have an 9.3.5 with commit_delay = 4000 and commit_siblings = 5 with > a 8TB dataset which seems fine. (Runs on different - faster hardware > though). 9.4 has a different xlog insertion algorithm (scaling much better), so that unfortunate

Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-20 Thread Andres Freund
Hi, On 2015-07-20 15:58:33 +0300, Spiros Ioannou wrote: > Happened again, another backtrace from a COMMIT process. I changed the > commit_delay to 0 (it was 4000 to help with our storage) and will report > back. What hardware & OS is this happening on? Regards, Andres -- Sent via pgsql-gener

Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-20 Thread Andres Freund
Heikki, On 2015-07-20 13:27:12 +0200, Andres Freund wrote: > On 2015-07-20 13:22:42 +0200, Andres Freund wrote: > > Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in > > XLogFlush(). > > These are the relevant stack traces: > db9lock/

Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-20 Thread Andres Freund
On 2015-07-20 13:22:42 +0200, Andres Freund wrote: > Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in > XLogFlush(). These are the relevant stack traces: db9lock/debuglog-commit.txt #2 0x7f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, valptr=0x7f70f2ab66a0,

Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-20 Thread Andres Freund
On 2015-07-20 13:06:51 +0200, Alvaro Herrera wrote: > Spiros Ioannou wrote: > > Hi Tom, > > thank you for your input. The DB was stuck again, I attach all logs and > > stack traces. > > > > A stack trace from a COMMIT, an INSERT, an UPDATE, the wal writer, the > > writer, and a sequence. > > > >

Re: [GENERAL] cascading replication and replication slots.

2015-06-23 Thread Andres Freund
On 2015-06-22 13:32:23 +, Leif Gunnar Erlandsen wrote: > Is it possible to use a replication_slot for a downstream-server when setting > up cascading replication on 9.4 yes. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://w

Re: [GENERAL] CLOG read problem after pg_basebackup

2015-06-21 Thread Andres Freund
70" at offset 237568: Success. > > (the clog file differed in each case of course..) How exactly are you starting the the standby after the basebackups? Any chance you removed backup.label? This sounds like the typical symptoms of doing that. Greetings, Andres Freund -- Sent via pgsq

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On 2015-06-08 14:23:32 -0300, Alvaro Herrera wrote: > Sure. I just concern that we might be putting excessive trust on > emergency workers being launched at a high pace. I'm not sure what to do about that. I mean, it'd not be hard to simply ignore naptime upon wraparound, but I'm not sure that'd

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On June 8, 2015 7:06:31 PM GMT+02:00, Alvaro Herrera wrote: >Andres Freund wrote: > >> A first version to address this problem can be found appended to this >> email. >> >> Basically it does: >> * Whenever more than MULTIXACT_MEMBER_SAFE_THRESHOLD are u

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On 2015-06-08 15:15:04 +0200, Andres Freund wrote: > 1) the autovacuum trigger logic isn't perfect yet. I.e. especially with > autovacuum=off you can get into situations where emergency vacuums > aren't started when necessary. This is particularly likely to happen >

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On 2015-06-05 16:56:18 -0400, Tom Lane wrote: > Andres Freund writes: > > On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas > > wrote: > >> I think we would be foolish to rush that part into the tree. We > >> probably got here in the first place by rushing

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-07 Thread Andres Freund
On 2015-06-05 20:47:33 +0200, Andres Freund wrote: > On 2015-06-05 14:33:12 -0400, Tom Lane wrote: > > Robert Haas writes: > > > 1. The problem that we might truncate an SLRU members page away when > > > it's in the buffers, but not drop it from the buffers, leadin

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Andres Freund
On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas wrote: >On Fri, Jun 5, 2015 at 2:47 PM, Andres Freund >wrote: >> On 2015-06-05 14:33:12 -0400, Tom Lane wrote: >>> Robert Haas writes: >>> > 1. The problem that we might truncate an SLRU members page away >

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Andres Freund
committing today to a release wrap on Monday, > I don't see it happening till after PGCon. I wonder if, with all the recent, err, training, we could wrap it on Tuesday instead. Independent of the truncation rework going in or not, an additional work day to go over all the changes and

Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Andres Freund
On 2015-06-05 11:43:45 -0400, Tom Lane wrote: > Robert Haas writes: > > On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch wrote: > >> I read through this version and found nothing to change. I encourage other > >> hackers to study the patch, though. The surrounding code is challenging. > > > Andres t

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-04 Thread Andres Freund
Hi, On 2015-06-04 12:57:42 -0400, Robert Haas wrote: > + /* > + * Do we need an emergency autovacuum? If we're not sure, assume yes. > + */ > + return !oldestOffsetKnown || > + (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD); I think without teaching a

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Andres Freund
On 2015-06-03 15:01:46 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > That's not necessarily the case though, given how the code currently > > works. In a bunch of places the SLRUs are accessed *before* having been > > made consistent by WAL replay. Especia

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Andres Freund
de is. We must be missing some crucial detail ... I might have missed it in this already long thread. Could you share a bunch of details about hte case? It'd be very interesting to see the contents of the backup label (to see where start/end are), the contents of the initial checkpoint (to se

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Andres Freund
y happen after a upgrade from < 9.3. And in that case we initialize nextOffset to 0. That ought to safe us? Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Andres Freund
ndary, right? I'm not sure that's actually ok; because the value at the beginning of the segment can very well end up being a 0, as MaybeExtendOffsetSlru() will have filled the page with zeros. I think that should be harmless, the worst that can happen is that oldestOffset errorneously is 0, which should be correct, even though possibly overly conservative, in these cases. Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Andres Freund
On 2015-06-02 11:49:56 -0400, Robert Haas wrote: > On Tue, Jun 2, 2015 at 11:44 AM, Andres Freund wrote: > > On 2015-06-02 11:37:02 -0400, Robert Haas wrote: > >> The exact circumstances under which we're willing to replace a > >> relminmxid with a newly-c

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Andres Freund
ems like somebody needs to do it. I'm willing to invest the time to develop an initial version, but I'll need help evaluating it. I don't have many testing resources available atm, and I'm not going to trust stuff I developed while travelling by just looking at the code. Greeti

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Andres Freund
On 2015-06-02 11:29:24 -0400, Robert Haas wrote: > On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund wrote: > > But what *definitely* looks wrong to me is that a TruncateMultiXact() in > > this scenario now (since a couple weeks ago) does a > > SimpleLruReadPage_ReadOnly() in

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Andres Freund
On 2015-06-02 11:16:22 -0400, Robert Haas wrote: > I'm having trouble figuring out what to do about this. I mean, the > essential principle of this patch is that if we can't count on > relminmxid, datminmxid, or the control file to be accurate, we can at > least look at what is present on the disk

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Andres Freund
On 2015-06-01 14:22:32 -0400, Robert Haas wrote: > On Mon, Jun 1, 2015 at 4:58 AM, Andres Freund wrote: > > The lack of WAL logging actually has caused problems in the 9.3.3 (?) > > era, where we didn't do any truncation during recovery... > > Right, but now we're

  1   2   3   >