WSL (Windows Subsystem for Linux) users will need to turn fsync off as of 11.2
Just in case this helps the next person who can't figure out why their postgres server won't start today:

If you are running Postgres inside Microsoft WSL (at least on Ubuntu, maybe on others too), and just picked up a software update to version 11.2, you will need to go into your /etc/postgresql.conf file and set fsync=off.

This took me a while to fix because the error message you get if you don't is the generic:

    terminating connection because of crash of another server process
    2015-07-15 20:18:37 UTC The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

I spent a long time trying to completely uninstall and reinstall, etc. to recover from the "crash", although I don't think there ever was one, and the message appears on first use of the create database command even on a completely clean install.

I don't know if this is possible/reasonable, but if the database code could automatically turn fsync off on WSL it might save the next users some trouble.
Re: WSL (Windows Subsystem for Linux) users will need to turn fsync off as of 11.2
Thanks Tom, I feel like I'm in a little over my head here but I'll try to help as I can.

With fsync off, everything appears to run as it did before on 11.1. With fsync default/on, the problem is easily reproducible by trying to create a database. I believe the very first time I saw it, it was with a routine query, but I'm not 100% sure.

    psql-11.2=> create database testdb;
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and repeat your command.
    SSL SYSCALL error: EOF detected
    The connection to the server was lost. Attempting reset: Failed.
    !>

Here are the entries from the log:

    2019-02-14 15:06:08.218 DST [8398] PANIC: could not flush dirty data: Function not implemented
    2019-02-14 15:06:08.218 DST [8396] LOG: checkpointer process (PID 8398) was terminated by signal 6: Aborted
    2019-02-14 15:06:08.218 DST [8396] LOG: terminating any other active server processes
    2019-02-14 15:06:08.218 DST [8422] homestead@homestead WARNING: terminating connection because of crash of another server process
    2019-02-14 15:06:08.218 DST [8422] homestead@homestead DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    2019-02-14 15:06:08.218 DST [8422] homestead@homestead HINT: In a moment you should be able to reconnect to the database and repeat your command.
    2019-02-14 15:06:08.218 DST [8401] WARNING: terminating connection because of crash of another server process
    2019-02-14 15:06:08.218 DST [8401] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    2019-02-14 15:06:08.218 DST [8401] HINT: In a moment you should be able to reconnect to the database and repeat your command.
    2019-02-14 15:06:08.241 DST [8396] LOG: all server processes terminated; reinitializing
    2019-02-14 15:06:08.259 DST [8433] LOG: database system was interrupted; last known up at 2019-02-14 15:05:30 DST
    2019-02-14 15:06:08.259 DST [8433] PANIC: could not flush dirty data: Function not implemented
    2019-02-14 15:06:08.264 DST [8396] LOG: startup process (PID 8433) was terminated by signal 6: Aborted
    2019-02-14 15:06:08.264 DST [8396] LOG: aborting startup due to startup process failure
    2019-02-14 15:06:08.266 DST [8434] homestead@homestead FATAL: the database system is in recovery mode
    2019-02-14 15:06:08.268 DST [8396] LOG: database system is shut down

As to why it worked before, I don't think fsync() ever worked on WSL, and there were places where you'd see warnings about it in 11.1; they just wouldn't crash the server.

As to the "running with scissors" risk, I'm going to guess the most common use case for WSL is as a personal dev box where all the data is disposable anyway. That's the case for me at least.

Best,
Bruce

On Thu, Feb 14, 2019 at 2:48 PM Tom Lane wrote:

> Bruce Klein writes:
> > If you are running Postgres inside Microsoft WSL (at least on Ubuntu,
> > maybe on others too), and just picked up a software update to version
> > 11.2, you will need to go into your /etc/postgresql.conf file and set
> > fsync=off.
>
> Hm. Probably this is some unexpected problem with the
> panic-on-fsync-failure change; although that still leaves some things
> unexplained, because if fsync is failing for you now, why didn't it fail
> before? Anyway, you might try experimenting with data_sync_retry,
> instead of running with scissors by turning off fsync altogether.
> See first item in the release notes:
>
> https://www.postgresql.org/docs/11/release-11-2.html
>
> Also, we'd quite like to hear more details; can you find any PANIC
> messages in the server log?
>
> regards, tom lane
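For anyone following up on the request for PANIC messages, a small filter can pull them out of a server log. This is only a minimal sketch, assuming log lines shaped like the ones in this message (timestamp, zone, [pid], optional user@db, severity). The name `find_panics` and the regular expression are illustrative and not part of PostgreSQL; a different log_line_prefix setting would need a different pattern.

```python
import re

# Pattern for the line shape seen in this thread. It tolerates an optional
# leading line number, since some pasted logs carry one. This is an
# assumption about the log_line_prefix in use, not a general parser.
LOG_LINE = re.compile(
    r"^(?:\d+\s+)?"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<tz>\S+) "
    r"\[(?P<pid>\d+)\] (?:(?P<who>\S+@\S+) )?"
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def find_panics(lines):
    """Return (timestamp, pid, message) for every PANIC entry in lines."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "PANIC":
            hits.append((m.group("ts"), int(m.group("pid")), m.group("msg")))
    return hits


if __name__ == "__main__":
    sample = [
        "2019-02-14 15:06:08.218 DST [8398] PANIC: could not flush dirty data: Function not implemented",
        "2019-02-14 15:06:08.218 DST [8422] homestead@homestead WARNING: terminating connection because of crash of another server process",
    ]
    for ts, pid, msg in find_panics(sample):
        print(ts, pid, msg)
```

In practice the `lines` argument would be an open log file; the path depends on the installation and is deliberately not guessed here.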
Re: WSL (Windows Subsystem for Linux) users will need to turn fsync off as of 11.2
> In 11.1 did you see the message "WARNING: could not flush dirty data: Function not implemented"

Yes.

Re: the discussions of O/S and filesystem in that thread: I am not qualified to describe the implementation of WSL, but I believe it is neither pure Ubuntu running on metal, nor a virtual machine hosted on Windows. I believe what the Microsoft folks have done is implement something around the driver/kernel layer that fools Ubuntu into thinking it is connected to hardware it expects, while it is ultimately still running on top of a Windows kernel and Windows filesystem. That includes stubbing out or otherwise presenting an appearance of implementing some functions, like perhaps fsync(), that it really doesn't.

Note I believe this is fundamentally different from the approach of the old Cygwin and similar projects, i.e. WSL does not involve recompiling on top of Windows-specific libraries etc.

If any of these details are important to anyone, you should verify them from a more credible source.

If it matters, the Ubuntu version I am running on WSL now is 16.04.5.

On Thu, Feb 14, 2019 at 3:44 PM Ravi Krishna wrote:

> Hi Bruce,
>
> Check my earlier thread on PG 10.5 on Ubuntu Bash with WSL.
>
> https://www.postgresql.org/message-id/1301077575.68539.1535929075959%40mail.yahoo.com
>
> In 11.1 did you see the message "WARNING: could not flush dirty data:
> Function not implemented"
>
> regards
Re: WSL (Windows Subsystem for Linux) users will need to turn fsync off as of 11.2
> I guess the OP is reporting about a .deb that was built on a real Linux system

Yes, I (OP) installed via:

    % wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    % sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -sc)-pgdg main" > /etc/apt/sources.list.d/PostgreSQL.list'
    % sudo apt update
    % sudo apt-get install postgresql-11

> no one bothered to complain about PostgreSQL spewing scary looking warnings on WSL for years

At least you weren't spamming a once-per-second(!) log entry about a missing function call like one of my other packages did (can't remember, maybe it was nginx?)

WSL still feels early and if you're going to try it, you get used to annoyances like that. I'm glad Microsoft is trying though, and I hope with time and support they get all the way there, because developers who have enterprise or other reasons to be on Windows instead of Mac desktops deserve to have decent unix tools too. Warts and all, I still find it overall more convenient and fluid than my previous VirtualBox / vagrant solution.

On Fri, Feb 15, 2019 at 11:20 AM Thomas Munro wrote:

> On Sat, Feb 16, 2019 at 6:50 AM Andres Freund wrote:
> > On February 15, 2019 9:44:50 AM PST, Tom Lane wrote:
> > >Andres Freund writes:
> > >> On February 15, 2019 9:13:10 AM PST, Tom Lane wrote:
> > >>> I'm of the opinion that we shouldn't be panicking for
> > >>> sync_file_range failure, period.
> > >
> > >> With some flags it's strictly required, it does "eat" errors
> > >> depending on the flags. So I'm not sure I understand?
> > >
> > >Really? The specification says that it starts I/O, not that it waits
> > >around for any to finish.
> >
> > That depends on the flags you pass in. By memory I don't think it eats
> > an error with our flags in recent kernels, but I'm not sure.
>
> Right, there was some discussion of that, and I didn't (and still
> don't) think it'd be wise to rely on undocumented knowledge about
> which flags can eat errors based on a drive-by reading of a particular
> snapshot of the Linux tree. The man page says it can return EIO; I
> think we should assume that it might actually do that.
>
> BTW I had a report from someone on IRC that PostgreSQL breaks in other
> ways (not yet understood) if you build it directly on WSL/Ubuntu. I
> guess the OP is reporting about a .deb that was built on a real Linux
> system. I'm vaguely familiar with these types of problems from other
> platforms (Linux syscall emulation on FreeBSD and Sun-ish systems, and
> also I'm old enough to remember people doing SCO SysV syscall
> emulation on Linux systems back before certain valuable software was
> available natively); it's possible that you get ENOSYS on other
> emulators too, considering that other kernels don't seem to have a
> sync_file_range()-like facility, but probably no one cares, since
> there is no reason to run PostgreSQL on a syscall emulator when you
> can run it natively. This is a bit different though: I guess people
> want to be able to develop Linux-stack stuff on company-issued Windows
> computers for later deployment on Linux servers; someone interested in
> this would ideally make it work and set up a build farm animal to tell
> us when we break it. It would probably require only minimal changes,
> but considering that no one bothered to complain about PostgreSQL
> spewing scary looking warnings on WSL for years, it's not too
> surprising that we didn't consider this case before. A bit like the
> nightjar case, the PANIC patch revealed a pre-existing problem that
> had gone unreported and needs some work, but it doesn't seem like a
> very good reason to roll back that part of the change completely IMHO.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Re: WSL (Windows Subsystem for Linux) users will need to turn fsync off as of 11.2
Sounds good to me. Thank you!

On Fri, Feb 22, 2019 at 11:47 AM Thomas Munro wrote:

> On Tue, Feb 19, 2019 at 5:31 PM Thomas Munro wrote:
> > On Tue, Feb 19, 2019 at 5:16 PM James Sewell wrote:
> > >> Here's a starter patch that shows one of the approaches discussed. It
> > >> gets WSL users to a better place than they were before, by suppressing
> > >> further warnings after the first one.
> > >
> > > This wasn't quite right, updated to check errno for ENOSYS (not rc)
> > >
> > > This compiles and stops the panic on WSL (with a single warning).
> > >
> > > I haven't tested if a version compiled on Linux will behave the same
> > > way - but based on the error messages in the top post it looks like
> > > the behavior is the same.
> >
> > Great. Thanks for testing, and for the fix! Well that all sounds
> > like good news: it corrects the behaviour from 11.2, and also improves
> > on the previous behaviour which I'd have accepted as a bug if anyone
> > had reported it. So the next problem is that we don't have a
> > consensus on whether this is the right approach, so I don't feel like
> > I can commit it yet. Does anyone want to make another concrete proposal?
>
> Ok, here's the version I'm planning to push soon if there are no
> objections.
> Re-adding Bruce to the thread, as I just noticed the CC list got
> pruned at some point in this thread.
>
> --
> Thomas Munro
> https://enterprisedb.com
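The behaviour described in the quoted exchange (check errno for ENOSYS, warn once, then stop trying) can be modelled in a few lines. This is a toy sketch in Python, not the actual patch, which is C code inside PostgreSQL. The class `FlushHint`, its `syscall` callable and the `log` list are all hypothetical names for illustration, and the handling of errors other than ENOSYS (PANIC unless data_sync_retry is on) is an assumption based on the 11.2 release notes mentioned earlier in the thread.

```python
import errno


class FlushHint:
    """Toy model of a writeback hint the kernel may not implement."""

    def __init__(self, syscall, data_sync_retry=False):
        # syscall: hypothetical callable standing in for sync_file_range();
        # it returns normally on success and raises OSError on failure.
        self.syscall = syscall
        self.data_sync_retry = data_sync_retry
        self.disabled = False   # set after the first ENOSYS
        self.log = []           # (level, message) pairs, for illustration

    def flush(self, fd, offset, nbytes):
        if self.disabled:
            return  # the facility is known to be missing; skip silently
        try:
            self.syscall(fd, offset, nbytes)
        except OSError as e:
            message = f"could not flush dirty data: {e.strerror}"
            if e.errno == errno.ENOSYS:
                # Not implemented (e.g. under syscall emulation): warn
                # once and never call it again.
                self.log.append(("WARNING", message))
                self.disabled = True
                return
            # Assumption: any other failure is treated as a real I/O
            # problem and escalates unless data_sync_retry is on.
            level = "WARNING" if self.data_sync_retry else "PANIC"
            self.log.append((level, message))
            if level == "PANIC":
                raise
```

The point of the design is that a missing system call is a property of the platform rather than a sign of lost data, so it is reported once and then ignored, while a genuine error such as EIO keeps its stricter handling.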
Re: Methods to quickly spin up copies of existing databases
Apologies for the low tech suggestion, but if this really is a clone of a previously existing template, could the clone operation just be done ahead of time? I.e., have the build server keep X copies ready for use and generate additional copies as those are consumed, so that the cloning is no longer on the critical path?

On Fri, Mar 1, 2019 at 11:09 AM Jerry Sievers wrote:

> Kenneth Marshall writes:
>
> > On Fri, Mar 01, 2019 at 11:57:30AM -0800, Kevin Wilkinson wrote:
> >
> >> if you are able/willing to use ZFS (rather than ext4, xfs, ...) to
> >> store your database, then it might work for you. ZFS is
> >> copy-on-write so it can very quickly clone a database.
> >>
> >> kevin
> >
> > Hi Arjun
> >
> > Redhat 7 does have LVM snapshots that do something similar. Kevin is
> > correct, COW is the secret.
>
> Going a bit further...
>
> Any sort of storage backend that can support *atomic* snapshots across
> *all* volumes (in case multiple tablespaces are involved), can be used to
> permit $instantaneous cloning where instantaneous relates to the actual
> snapshot time and crash recovery.
>
> Inability to make *atomic* snaps but perhaps separate snaps very
> quickly, combined with PITR can result in clones of high-churn systems
> sized in TBs (as in our use case) to be provisioned in about 1 minute.
>
> Nothing but the most trivial system can be cloned rapidly and perhaps
> any number of times in succession without employment of
> thin-provisioning, copy-on-write (as mentioned already), etc.
>
> Virtual copy is more and more compelling as physical
> size, or more precisely, *physical* copy time grows.
>
> HTH
>
> > Regards,
> > Ken
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
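The "keep X copies ready" suggestion at the top of this message could be sketched roughly as follows. This is a minimal illustration and not a tested provisioning tool: `ClonePool`, `make_clone` and `target` are hypothetical names, and `make_clone` stands in for whatever actually produces a copy (a CREATE DATABASE from a template, a ZFS or LVM snapshot, and so on).

```python
from collections import deque


class ClonePool:
    """Keep a fixed number of pre-built clones ready for consumers."""

    def __init__(self, make_clone, target=3):
        # make_clone: hypothetical callable that builds one clone and
        # returns a handle to it (for example a database name).
        self.make_clone = make_clone
        self.target = target
        self.ready = deque()
        self.refill()

    def refill(self):
        """Top the pool back up; meant to run off the critical path."""
        while len(self.ready) < self.target:
            self.ready.append(self.make_clone())

    def take(self):
        """Hand out a ready clone, building one on demand if empty."""
        if not self.ready:
            return self.make_clone()  # slow path: the pool ran dry
        return self.ready.popleft()


if __name__ == "__main__":
    import itertools

    counter = itertools.count()
    pool = ClonePool(lambda: f"clone_{next(counter)}", target=2)
    print(pool.take())   # served from the pre-built pool
    pool.refill()        # e.g. run by the build server between jobs
```

A consumer only pays the cloning cost when the pool is empty, so sizing `target` to the expected burst of requests keeps cloning out of the critical path most of the time.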