WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-14 Thread Bruce Klein
Just in case this helps the next person who can't figure out why their
postgres server won't start today:

If you are running Postgres inside Microsoft WSL (at least on Ubuntu, maybe
on others too), and just picked up a software update to version 11.2, you
will need to go into your /etc/postgresql.conf file and set fsync=off.

This took me a while to fix because the error message you get if you
don't is the generic:

terminating connection because of crash of another server process
2015-07-15 20:18:37 UTC The postmaster has commanded this server process to
roll back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.

I spent a long time trying to completely uninstall and reinstall, etc. to
recover from the "crash", although I don't think there ever was one; the
message appears on first use of the create database command even on a
completely clean install.

I don't know if this is possible/reasonable, but if the database code could
automatically turn fsync off on WSL it might save the next users some
trouble.


Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-14 Thread Bruce Klein
Thanks, Tom. I feel like I'm in a little over my head here, but I'll try to
help as I can.

With fsync off, everything appears to run as it did before on 11.1.

With fsync default/on, the problem is easily reproducible by trying to
create a database. I believe the very first time I saw it, it was with a
routine query, but I'm not 100% sure.

psql-11.2=> create database testdb;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Failed.
!>

Here are the entries from the log:
2019-02-14 15:06:08.218 DST [8398] PANIC:  could not flush dirty data: Function not implemented
2019-02-14 15:06:08.218 DST [8396] LOG:  checkpointer process (PID 8398) was terminated by signal 6: Aborted
2019-02-14 15:06:08.218 DST [8396] LOG:  terminating any other active server processes
2019-02-14 15:06:08.218 DST [8422] homestead@homestead WARNING:  terminating connection because of crash of another server process
2019-02-14 15:06:08.218 DST [8422] homestead@homestead DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-02-14 15:06:08.218 DST [8422] homestead@homestead HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-02-14 15:06:08.218 DST [8401] WARNING:  terminating connection because of crash of another server process
2019-02-14 15:06:08.218 DST [8401] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-02-14 15:06:08.218 DST [8401] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-02-14 15:06:08.241 DST [8396] LOG:  all server processes terminated; reinitializing
2019-02-14 15:06:08.259 DST [8433] LOG:  database system was interrupted; last known up at 2019-02-14 15:05:30 DST
2019-02-14 15:06:08.259 DST [8433] PANIC:  could not flush dirty data: Function not implemented
2019-02-14 15:06:08.264 DST [8396] LOG:  startup process (PID 8433) was terminated by signal 6: Aborted
2019-02-14 15:06:08.264 DST [8396] LOG:  aborting startup due to startup process failure
2019-02-14 15:06:08.266 DST [8434] homestead@homestead FATAL:  the database system is in recovery mode
2019-02-14 15:06:08.268 DST [8396] LOG:  database system is shut down
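
For anyone else asked to dig PANIC messages out of a server log, here is a
minimal Python sketch of one way to do it. It assumes each entry sits on a
single line in the raw log file and that the PID appears in square brackets
before the severity, as in the entries above; find_panics and the sample
text are illustrative names, not part of PostgreSQL.

```python
import re

# Assumption: entries look like "<timestamp> <tz> [<pid>] ... LEVEL:  message".
ENTRY = re.compile(
    r"\[(?P<pid>\d+)\].*?\b(?P<level>PANIC|FATAL|LOG|WARNING):\s+(?P<msg>.*)"
)

def find_panics(log_text):
    """Return a list of (pid, message) for every PANIC entry in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = ENTRY.search(line)
        if m and m.group("level") == "PANIC":
            hits.append((int(m.group("pid")), m.group("msg").strip()))
    return hits

# Three entries copied from the log in this message, one per line.
sample = (
    "2019-02-14 15:06:08.218 DST [8398] PANIC:  could not flush dirty data: Function not implemented\n"
    "2019-02-14 15:06:08.218 DST [8396] LOG:  checkpointer process (PID 8398) was terminated by signal 6: Aborted\n"
    "2019-02-14 15:06:08.259 DST [8433] PANIC:  could not flush dirty data: Function not implemented\n"
)

for pid, msg in find_panics(sample):
    print(pid, msg)
```

A different log_line_prefix setting would need a different pattern.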

As to why it worked before, I don't think fsync() ever worked on WSL, and
there were places where you'd see warnings about it in 11.1; they just
wouldn't crash the server.

As to the "running with scissors" risk, I'm going to guess the most common
use case for WSL is as a personal dev box where all the data is disposable
anyway. That's the case for me at least.

Best,
Bruce

On Thu, Feb 14, 2019 at 2:48 PM Tom Lane wrote:

> Bruce Klein writes:
> > If you are running Postgres inside Microsoft WSL (at least on Ubuntu,
> > maybe on others too), and just picked up a software update to version
> > 11.2, you will need to go into your /etc/postgresql.conf file and set
> > fsync=off.
>
> Hm.  Probably this is some unexpected problem with the
> panic-on-fsync-failure change; although that still leaves some things
> unexplained, because if fsync is failing for you now, why didn't it fail
> before?  Anyway, you might try experimenting with data_sync_retry,
> instead of running with scissors by turning off fsync altogether.
> See first item in the release notes:
>
> https://www.postgresql.org/docs/11/release-11-2.html
>
> Also, we'd quite like to hear more details; can you find any PANIC
> messages in the server log?
>
> regards, tom lane
>


Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-14 Thread Bruce Klein
> In 11.1 did you see the message "WARNING: could not flush dirty data:
> Function not implemented"

Yes.

re: the discussions of O/S and filesystem in that thread:
I am not qualified to describe the implementation of WSL, but I believe it
is neither pure Ubuntu running on metal nor a virtual machine hosted on
Windows. I believe what the Microsoft folks have done is implement
something around the driver/kernel layer that fools Ubuntu into thinking it
is connected to the hardware it expects, while it is ultimately still
running on top of a Windows kernel and Windows filesystem. That includes
stubbing out, or otherwise presenting an appearance of implementing, some
functions (perhaps fsync()) that it really doesn't implement. Note that I
believe this is fundamentally different from the approach of the old Cygwin
and similar projects, i.e. WSL does not involve recompiling on top of
Windows-specific libraries etc. If any of these details are important to
anyone, you should verify them from a more credible source.
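
For anyone who wants to check what their own kernel reports, here is a
minimal Python sketch that calls sync_file_range() once on a scratch file
through ctypes. It rests on assumptions: that sync_file_range() is the call
behind "could not flush dirty data" (as the replies elsewhere in this thread
suggest), that the flag value 2 is SYNC_FILE_RANGE_WRITE as defined in the
Linux headers, and that libc exposes the wrapper. probe_sync_file_range is a
made-up name.

```python
import ctypes
import ctypes.util
import errno
import os
import tempfile

SYNC_FILE_RANGE_WRITE = 2  # value from the Linux headers

def probe_sync_file_range():
    """Call sync_file_range() on a temp file; return 0 on success, else errno."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        fn = libc.sync_file_range
    except (OSError, AttributeError, TypeError):
        # No usable libc wrapper here (e.g. not a Linux userland).
        return errno.ENOSYS
    fn.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    fn.restype = ctypes.c_int
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 8192)
        f.flush()  # push Python's buffer into the kernel page cache
        ctypes.set_errno(0)
        rc = fn(f.fileno(), 0, 8192, SYNC_FILE_RANGE_WRITE)
        return 0 if rc == 0 else ctypes.get_errno()

code = probe_sync_file_range()
print("supported" if code == 0 else "failed: " + os.strerror(code))
```

A result of ENOSYS is the errno whose message text is "Function not
implemented", which is the wording in the PANIC line.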

If it matters, the Ubuntu version I am running on WSL now is 16.04.5.


On Thu, Feb 14, 2019 at 3:44 PM Ravi Krishna wrote:

> Hi Bruce,
>
> Check my earlier thread on PG 10.5 on Ubuntu Bash with WSL.
>
>
> https://www.postgresql.org/message-id/1301077575.68539.1535929075959%40mail.yahoo.com
>
> In 11.1 did you see the message "WARNING: could not flush dirty data:
> Function not implemented"
>
> regards
>


Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-15 Thread Bruce Klein
> I guess the OP is reporting about a .deb that was built on a real Linux
> system

Yes, I (OP) installed via:
  % wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
  % sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -sc)-pgdg main" > /etc/apt/sources.list.d/PostgreSQL.list'
  % sudo apt update
  % sudo apt-get install postgresql-11

> no one bothered to complain about PostgreSQL spewing scary looking
> warnings on WSL for years

At least you weren't spamming a once-per-second(!) log entry about a
missing function call like one of my other packages did (can't remember,
maybe it was nginx?)

WSL still feels early and if you're going to try it, you get used to
annoyances like that. I'm glad Microsoft is trying though and I hope with
time and support they get all the way there because developers who have
enterprise or other reasons to be on Windows instead of Mac desktops
deserve to have decent unix tools too. Warts and all I still find it
overall more convenient and fluid than my previous VirtualBox / vagrant
solution.

On Fri, Feb 15, 2019 at 11:20 AM Thomas Munro wrote:

> On Sat, Feb 16, 2019 at 6:50 AM Andres Freund wrote:
> > On February 15, 2019 9:44:50 AM PST, Tom Lane wrote:
> > >Andres Freund writes:
> > >> On February 15, 2019 9:13:10 AM PST, Tom Lane wrote:
> > >>> I'm of the opinion that we shouldn't be panicking for
> > >>> sync_file_range failure, period.
> > >
> > >> With some flags it's strictly required, it does "eat" errors
> > >> depending on the flags. So I'm not sure I understand?
> > >
> > >Really?  The specification says that it starts I/O, not that it waits
> > >around for any to finish.
> >
> > That depends on the flags you pass in. By memory I don't think it eats
> > an error with our flags in recent kernels, but I'm not sure.
>
> Right, there was some discussion of that, and I didn't (and still
> don't) think it'd be wise to rely on undocumented knowledge about
> which flags can eat errors based on a drive-by reading of a particular
> snapshot of the Linux tree.  The man page says it can return EIO; I
> think we should assume that it might actually do that.
>
> BTW I had a report from someone on IRC that PostgreSQL breaks in other
> ways (not yet understood) if you build it directly on WSL/Ubuntu.  I
> guess the OP is reporting about a .deb that was built on a real Linux
> system.  I'm vaguely familiar with these types of problems from other
> platforms (Linux syscall emulation on FreeBSD and Sun-ish systems, and
> also I'm old enough to remember people doing SCO SysV syscall
> emulation on Linux systems back before certain valuable software was
> available natively); it's possible that you get ENOSYS on other
> emulators too, considering that other kernels don't seem to have a
> sync_file_range()-like facility, but probably no one cares, since
> there is no reason to run PostgreSQL on a syscall emulator when you
> can run it natively.  This is a bit different though: I guess people
> want to be able to develop Linux-stack stuff on company-issued Windows
> computers for later deployment on Linux servers; someone interested in
> this would ideally make it work and set up a build farm animal to tell
> us when we break it.  It would probably require only minimal changes,
> but considering that no one bothered to complain about PostgreSQL
> spewing scary looking warnings on WSL for years, it's not too
> surprising that we didn't consider this case before.  A bit like the
> nightjar case, the PANIC patch revealed a pre-existing problem that
> had gone unreported and needs some work, but it doesn't seem like a
> very good reason to roll back that part of the change completely IMHO.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-23 Thread Bruce Klein
Sounds good to me. Thank you!

On Fri, Feb 22, 2019 at 11:47 AM Thomas Munro wrote:

> On Tue, Feb 19, 2019 at 5:31 PM Thomas Munro wrote:
> > On Tue, Feb 19, 2019 at 5:16 PM James Sewell wrote:
> > >> Here's a starter patch that shows one of the approaches discussed.  It
> > >> gets WSL users to a better place than they were before, by suppressing
> > >> further warnings after the first one.
> > >
> > > This wasn't quite right, updated to check errno for ENOSYS (not rc)
> > >
> > > This compiles and stops the panic on WSL (with a single warning).
> > >
> > > I haven't tested if a version compiled on Linux will behave the same
> > > way - but based on the error messages in the top post it looks like the
> > > behavior is the same.
> >
> > Great.  Thanks for testing, and for the fix!  Well that all sounds
> > like good news: it corrects the behaviour from 11.2, and also improves
> > on the previous behaviour which I'd have accepted as a bug if anyone
> > had reported it.  So the next problem is that we don't have a
> > consensus on whether this is the right approach, so I don't feel like
> > I can commit it yet.  Does anyone want to make another concrete proposal?
>
> Ok, here's the version I'm planning to push soon if there are no
> objections.
> Re-adding Bruce to the thread, as I just noticed the CC list got
> pruned at some point in this thread.
>
> --
> Thomas Munro
> https://enterprisedb.com
>
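
The approach described in the quoted messages (warn once when the flush call
reports ENOSYS, then stop calling it) can be sketched in Python as below.
This is only an illustration of the idea, not the patch itself, which is C
code inside PostgreSQL and is not reproduced in this thread. RangeFlusher
and the flush_call parameter are hypothetical names.

```python
import errno
import logging

log = logging.getLogger("flush_sketch")

class RangeFlusher:
    """Wraps a flush hint and stops using it once the kernel reports ENOSYS."""

    def __init__(self, flush_call):
        # Hypothetical callable(fd, offset, nbytes) that raises OSError on failure.
        self._flush_call = flush_call
        self._unsupported = False

    def flush(self, fd, offset, nbytes):
        """Return True if the hint was issued, False if it is unsupported."""
        if self._unsupported:
            return False  # already known to be missing: skip without warning again
        try:
            self._flush_call(fd, offset, nbytes)
        except OSError as e:
            if e.errno != errno.ENOSYS:  # check errno, not the return code
                raise  # any other failure is still treated as a real error
            self._unsupported = True
            log.warning("could not flush dirty data: %s", e.strerror)
            return False
        return True
```

Because the flag is remembered on the object, the log gets one warning per
RangeFlusher instance instead of one per call, while errors other than
ENOSYS still surface to the caller.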


Re: Methods to quickly spin up copies of an existing databases

2019-03-01 Thread Bruce Klein
Apologies for the low-tech suggestion, but if this really is a clone of a
previously existing template, could the clone operation just be done ahead
of time? I.e., have the build server keep X copies ready for use and
generate additional copies as those are consumed, so that the cloning is no
longer on the critical path?
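
A rough Python sketch of what I mean, in case it helps. ClonePool and
make_clone are made-up names; make_clone stands in for whatever slow copy
step is used today (for example, something that runs CREATE DATABASE with a
TEMPLATE clause), and the refill would be triggered by the build server
between jobs.

```python
from collections import deque
from itertools import count

class ClonePool:
    """Keeps `target` ready-made clones so callers do not wait on cloning."""

    def __init__(self, make_clone, target):
        if target < 1:
            raise ValueError("target must be at least 1")
        self._make_clone = make_clone  # hypothetical callable(name) doing the slow copy
        self._target = target
        self._ready = deque()
        self._ids = count(1)
        self.refill()

    def refill(self):
        """Top the pool back up; intended to run off the critical path."""
        while len(self._ready) < self._target:
            name = f"clone_{next(self._ids)}"
            self._make_clone(name)
            self._ready.append(name)

    def take(self):
        """Hand out a prepared clone, building new ones only if the pool is empty."""
        if not self._ready:
            self.refill()
        return self._ready.popleft()

# Usage with a stand-in clone step that just records the names it was given.
created = []
pool = ClonePool(created.append, target=2)
first = pool.take()   # returns immediately, the copy was made earlier
pool.refill()         # replace the consumed copy before the next job
```

The only time a caller waits is when clones are consumed faster than
refill() is run.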

On Fri, Mar 1, 2019 at 11:09 AM Jerry Sievers wrote:

> Kenneth Marshall writes:
>
> > On Fri, Mar 01, 2019 at 11:57:30AM -0800, Kevin Wilkinson wrote:
> >
> >> if you are able/willing to use ZFS (rather than ext4, xfs, ...) to
> >> store your database, then it might work for you. ZFS is
> >> copy-on-write so it can very quickly clone a database.
> >>
> >> kevin
> >
> > Hi Arjun
> >
> > Redhat 7 does have LVM snapshots that do something similar. Kevin is
> > correct, COW is the secret.
>
> Going a bit further...
>
> Any sort of storage backend that can support *atomic* snapshots across
> *all* volumes (in case multiple tablespaces are involved), can be used to
> permit $instantaneous cloning where instantaneous relates to the actual
> snapshot time and crash recovery.
>
> Inability to make *atomic* snaps but perhaps separate snaps very
> quickly, combined with PITR can result in clones of high-churn systems
> sized in TBs (as in our use case) to be provisioned in about 1 minute.
>
> Nothing but the most trivial system can be cloned rapidly and perhaps
> any number of times in succession without employment of
> thin-provisioning, copy-on-write (as mentioned already), etc.
>
> Virtual copy is more and more compelling as physical
> size, or more precisely, *physical* copy time grow.
>
> HTH
>
>
>
> >
> > Regards,
> > Ken
> >
> >
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
>
>