Guillaume: the cluster I try to start is the one used with pg_basebackup, not the result of the backup.
2015-02-09 12:51 GMT-05:00 Guillaume Lelarge <guilla...@lelarge.info>: > 2015-02-09 18:40 GMT+01:00 Guillaume Drolet <droletguilla...@gmail.com>: > >> I tried starting the cluster again. Once again everything looked fine at >> the start (first three lines of this log, in English this time): >> >> 2015-02-09 11:40:55 EST LOG: database system was shut down at 2015-02-06 >> 09:50:21 EST >> 2015-02-09 11:40:55 EST LOG: database system is ready to accept >> connections >> 2015-02-09 11:40:55 EST LOG: autovacuum launcher started >> >> Since it seemed to work, I opened the terminal and tried connecting to >> the database: >> >> C:\Users\admlocal>psql -U postgres -d mortalite >> >> So far so good, I got a connection: >> >> psql (9.3.5) >> Attention : l'encodage console (850) diffère de l'encodage Windows (1252). >> Les caractères 8 bits peuvent ne pas fonctionner correctement. >> Voir la section « Notes aux utilisateurs de Windows » de la >> page >> référence de psql pour les détails. >> Saisissez « help » pour l'aide. >> >> mortalite=# >> >> I tried the help command and it worked: >> >> mortalite=# help >> Vous utilisez psql, l'interface en ligne de commande de PostgreSQL. >> Saisissez: >> \copyright pour les termes de distribution >> \h pour l'aide-mémoire des commandes SQL >> \? pour l'aide-mémoire des commandes psql >> \g ou point-virgule en fin d'instruction pour exécuter la requête >> \q pour quitter >> >> >> But then when I tried to query the db, it crashed: >> >> mortalite=# \dt >> la connexion au serveur a été coupée de façon inattendue >> Le serveur s'est peut-être arrêté anormalement avant ou durant le >> traitement de la requête. >> La connexion au serveur a été perdue. Tentative de réinitialisation : >> Échec. >> !> >> >> And here's the rest of the log file after the crash: >> >> 2015-02-09 12:29:19 EST LOG: server process (PID 2240) was terminated by >> exception 0xC0000005 >> 2015-02-09 12:29:19 EST DETAIL: Failed process was running: SELECT >> n.nspname as "Schema", >> c.relname as "Name", >> CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' >> THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' >> WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type", >> pg_catalog.pg_get_userbyid(c.relowner) as "Owner" >> FROM pg_catalog.pg_class c >> LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace >> WHERE c.relkind IN ('r','') >> AND n.nspname <> 'pg_catalog' >> AND n.nspname <> 'information_schema' >> AND n.nspname !~ '^pg_toast' >> AND pg_catalog.pg_table_is_visible(c.oid) >> ORDER BY 1,2; >> 2015-02-09 12:29:19 EST HINT: See C include file "ntstatus.h" for a >> description of the hexadecimal value. >> 2015-02-09 12:29:19 EST LOG: terminating any other active server >> processes >> 2015-02-09 12:29:19 EST WARNING: terminating connection because of crash >> of another server process >> 2015-02-09 12:29:19 EST DETAIL: The postmaster has commanded this server >> process to roll back the current transaction and exit, because another >> server process exited abnormally and possibly corrupted shared memory. >> 2015-02-09 12:29:19 EST HINT: In a moment you should be able to >> reconnect to the database and repeat your command. >> 2015-02-09 12:29:19 EST LOG: archiver process (PID 4576) exited with >> exit code 1 >> 2015-02-09 12:29:19 EST LOG: all server processes terminated; >> reinitializing >> 2015-02-09 12:29:29 EST FATAL: pre-existing shared memory block is still >> in use >> 2015-02-09 12:29:29 EST HINT: Check if there are any old server >> processes still running, and terminate them. >> >> According to this page >> <https://msdn.microsoft.com/en-ca/library/cc704588.aspx>, exception >> 0xC0000005 means STATUS_ACCESS_VIOLATION - The instruction at 0x%08lx >> referenced memory at 0x%08lx. The memory could not be %s. This is not of >> much help to me. >> >> I hope these additional bits of information can help someone figuring out >> a solution to get my cluster up and running again. >> >> PS. I was thinking of reinstalling PGSQL over my current install but >> keeping my PGDATA. I've done it in the past for fixing problems with >> starting the service and it worked. What do you think? >> >> > The SQL query you see in your log is the result of your \dt. It should > work. I don't see why reinstalling PostgreSQL will fix anything here. > > You said you did a pg_basebackup. The cluster you try to start is the > cluster used with pg_basebackup or a restore done with the pg_basebackup? > > Cheers, >> >> Guillaume >> >> >> >> >> >> 2015-02-09 11:37 GMT-05:00 Guillaume Drolet <droletguilla...@gmail.com>: >> >> >>> >>> 2015-02-07 1:24 GMT-05:00 Guillaume Lelarge <guilla...@lelarge.info>: >>> >>>> Le 6 févr. 2015 17:31, "Adrian Klaver" <adrian.kla...@aklaver.com> a >>>> écrit : >>>> > >>>> > On 02/06/2015 05:03 AM, Guillaume Drolet wrote: >>>> >> >>>> >> Hi, >>>> >> >>>> >> Yesterday I ran a pg_basebackup of my cluster. Since it has >>>> completed, >>>> >> my cluster doesn't work properly. I tried restarting the computer (or >>>> >> service) a few times but I always get the same messages in my logs >>>> (it's >>>> >> in French. If someone is willing to help me I can try to translate >>>> the >>>> >> logs. Just ask): >>>> > >>>> > >>>> > Enter Google Translate:) >>>> > >>>> >>>> But first, Guillaume, do yourself and everyone else a favor: turn the >>>> dam log into English. Set lc_messages to 'C' in postgresql.conf. >>>> >>> Thanks for this! I didn't know about this great feature. >>> >>> >>>> > First some questions: >>>> > >>>> > 1) What Postgres version? >>>> > >>>> > 2) What OS(s)? I am assuming Windows from the log info below, but we >>>> all know what assuming gets you. >>>> > >>>> > 3) Where were you backing up from and to? >>>> > >>>> > 4) Which cluster does not start, the master or the child you created >>>> with pg_basebackup? >>>> > >>>> > >>>> >> >>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données a été >>>> >> interrompu ; dernier lancement connu à 2015-02-06 07:05:05 EST >>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données n'a pas >>>> été >>>> >> arrêté proprement ; restauration >>>> >> automatique en cours >>>> >> 2015-02-06 07:11:38 EST LOG: record with zero length at 24B/2C000160 >>>> >> 2015-02-06 07:11:38 EST LOG: la ré-exécution n'est pas nécessaire >>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données est prêt >>>> >> pour accepter les connexions >>>> >> 2015-02-06 07:11:38 EST LOG: lancement du processus autovacuum >>>> >> 2015-02-06 07:11:38 EST FATAL: le rôle « 208375PT$ » n'existe pas >>>> > >>>> > >>>> > So where is role 208375PT$ supposed to come from? >>>> > >>>> > >>>> >> >>>> >> Then if I start pgAdmin I get a series of pop-ups I have to click OK >>>> to >>>> >> to continue: >>>> >> >>>> >> An error has ocurred: Column not found in pgSet: "datlastsysoid" >>>> >> An error has ocurred: Column not found in pgSet: datlastsysoid >>>> >> An error has ocurred: Column not found in pgSet: oid >>>> >> An error has ocurred: Column not found in pgSet: encoding >>>> >> An error has ocurred: Column not found in pgSet: Connection to >>>> database >>>> >> broken >>>> > >>>> > >>>> > Not sure about that this, someone more versed in pgAdmin will have to >>>> answer. >>>> > >>>> >>>> Usually you see these messages when you're using a pgadmin major >>>> release older than a PostgreSQL make release. For a 9.3 release, that would >>>> mean a pgadmin older than 1.18. >>>> >>> >>> I'm running pgadmin 1.18.1 >>> >>>> > >>>> >> >>>> >> And after that, I went back to the log file and there's new >>>> information >>>> >> added: >>>> >> >>>> >> 2015-02-06 07:51:05 EST LOG: processus serveur (PID 184) a été >>>> arrêté >>>> >> par l'exception 0x80000004 >>>> >> 2015-02-06 07:51:05 EST DÉTAIL: Le processus qui a échoué exécutait >>>> : >>>> >> SELECT version(); >>>> >> 2015-02-06 07:51:05 EST ASTUCE : Voir le fichier d'en-tête C « >>>> >> ntstatus.h » pour une description de la valeur >>>> >> hexadécimale. >>>> > >>>> > >>>> > Well according to here: >>>> > >>>> > https://msdn.microsoft.com/en-us/library/cc704588.aspx >>>> > >>>> > 0x80000004 >>>> > STATUS_SINGLE_STEP >>>> > >>>> > >>>> > {EXCEPTION} Single Step A single step or trace operation has just >>>> been completed. >>>> > >>>> > A developer is going to have explain what that means. >>>> > >>>> > >>>> > >>>> >> 2015-02-06 07:51:05 EST LOG: arrêt des autres processus serveur >>>> actifs >>>> >> 2015-02-06 07:51:05 EST ATTENTION: arrêt de la connexion à cause de >>>> >> l'arrêt brutal d'un autre processus serveur >>>> >> 2015-02-06 07:51:05 EST DÉTAIL: Le postmaster a commandé à ce >>>> processus >>>> >> serveur d'annuler la transaction >>>> >> courante et de quitter car un autre processus serveur a quitté >>>> >> anormalement >>>> >> et qu'il existe probablement de la mémoire partagée corrompue. >>>> >> 2015-02-06 07:51:05 EST ASTUCE : Dans un moment, vous devriez être >>>> >> capable de vous reconnecter à la base de >>>> >> données et de relancer votre commande. >>>> >> 2015-02-06 07:51:05 EST LOG: processus d'archivage (PID 692) quitte >>>> >> avec le code de sortie 1 >>>> >> 2015-02-06 07:51:05 EST LOG: tous les processus serveur se sont >>>> >> arrêtés, réinitialisation >>>> >> 2015-02-06 07:51:15 EST FATAL: le bloc de mémoire partagé >>>> pré-existant >>>> >> est toujours en cours d'utilisation >>>> >> 2015-02-06 07:51:15 EST ASTUCE : Vérifier s'il n'y a pas de vieux >>>> >> processus serveur en cours d'exécution. Si c'est le >>>> >> cas, fermez-les. >>>> >> >>>> >> I was about to try restarting postgresql using the base backup I made >>>> >> yesterday but since this means I'll have to copy my database again >>>> (700 >>>> >> GB takes a while...) I am looking for a better solution from more >>>> >> experienced people. >>>> > >>>> > >>>> > >>>> > My suspicion is you copied at least partly over a running server. >>>> > >>>> >>> >>> >> > > > -- > Guillaume. > http://blog.guillaume.lelarge.info > http://www.dalibo.com >