Re: [GENERAL] Cluster seems broken after pg_basebackup

Guillaume Drolet Mon, 09 Feb 2015 10:41:17 -0800

Guillaume: the cluster I try to start is the one used with pg_basebackup,
not the result of the backup.


2015-02-09 12:51 GMT-05:00 Guillaume Lelarge <guilla...@lelarge.info>:

> 2015-02-09 18:40 GMT+01:00 Guillaume Drolet <droletguilla...@gmail.com>:
>
>> I tried starting the cluster again. Once again everything looked fine at
>> the start (first three lines of this log, in English this time):
>>
>> 2015-02-09 11:40:55 EST LOG:  database system was shut down at 2015-02-06
>> 09:50:21 EST
>> 2015-02-09 11:40:55 EST LOG:  database system is ready to accept
>> connections
>> 2015-02-09 11:40:55 EST LOG:  autovacuum launcher started
>>
>> Since it seemed to work, I opened the terminal and tried connecting to
>> the database:
>>
>> C:\Users\admlocal>psql -U postgres -d mortalite
>>
>> So far so good, I got a connection:
>>
>> psql (9.3.5)
>> Attention : l'encodage console (850) diffère de l'encodage Windows (1252).
>>             Les caractères 8 bits peuvent ne pas fonctionner correctement.
>>             Voir la section « Notes aux utilisateurs de Windows » de la
>> page
>>             référence de psql pour les détails.
>> Saisissez « help » pour l'aide.
>>
>> mortalite=#
>>
>> I tried the help command and it worked:
>>
>> mortalite=# help
>> Vous utilisez psql, l'interface en ligne de commande de PostgreSQL.
>> Saisissez:
>>     \copyright pour les termes de distribution
>>     \h pour l'aide-mémoire des commandes SQL
>>     \? pour l'aide-mémoire des commandes psql
>>     \g ou point-virgule en fin d'instruction pour exécuter la requête
>>     \q pour quitter
>>
>>
>> But then when I tried to query the db, it crashed:
>>
>> mortalite=# \dt
>> la connexion au serveur a été coupée de façon inattendue
>>         Le serveur s'est peut-être arrêté anormalement avant ou durant le
>>         traitement de la requête.
>> La connexion au serveur a été perdue. Tentative de réinitialisation :
>> Échec.
>> !>
>>
>> And here's the rest of the log file after the crash:
>>
>> 2015-02-09 12:29:19 EST LOG:  server process (PID 2240) was terminated by
>> exception 0xC0000005
>> 2015-02-09 12:29:19 EST DETAIL:  Failed process was running: SELECT
>> n.nspname as "Schema",
>>       c.relname as "Name",
>>       CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm'
>> THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence'
>> WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
>>       pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
>>     FROM pg_catalog.pg_class c
>>          LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
>>     WHERE c.relkind IN ('r','')
>>           AND n.nspname <> 'pg_catalog'
>>           AND n.nspname <> 'information_schema'
>>           AND n.nspname !~ '^pg_toast'
>>       AND pg_catalog.pg_table_is_visible(c.oid)
>>     ORDER BY 1,2;
>> 2015-02-09 12:29:19 EST HINT:  See C include file "ntstatus.h" for a
>> description of the hexadecimal value.
>> 2015-02-09 12:29:19 EST LOG:  terminating any other active server
>> processes
>> 2015-02-09 12:29:19 EST WARNING:  terminating connection because of crash
>> of another server process
>> 2015-02-09 12:29:19 EST DETAIL:  The postmaster has commanded this server
>> process to roll back the current transaction and exit, because another
>> server process exited abnormally and possibly corrupted shared memory.
>> 2015-02-09 12:29:19 EST HINT:  In a moment you should be able to
>> reconnect to the database and repeat your command.
>> 2015-02-09 12:29:19 EST LOG:  archiver process (PID 4576) exited with
>> exit code 1
>> 2015-02-09 12:29:19 EST LOG:  all server processes terminated;
>> reinitializing
>> 2015-02-09 12:29:29 EST FATAL:  pre-existing shared memory block is still
>> in use
>> 2015-02-09 12:29:29 EST HINT:  Check if there are any old server
>> processes still running, and terminate them.
>>
>> According to this page
>> <https://msdn.microsoft.com/en-ca/library/cc704588.aspx>, exception
>> 0xC0000005 means STATUS_ACCESS_VIOLATION - The instruction at 0x%08lx
>> referenced memory at 0x%08lx. The memory could not be %s. This is not of
>> much help to me.
>>
>> I hope these additional bits of information can help someone figuring out
>> a solution to get my cluster up and running again.
>>
>> PS. I was thinking of reinstalling PGSQL over my current install but
>> keeping my PGDATA. I've done it in the past for fixing problems with
>> starting the service and it worked. What do you think?
>>
>>
> The SQL query you see in your log is the result of your \dt. It should
> work. I don't see why reinstalling PostgreSQL will fix anything here.
>
> You said you did a pg_basebackup. The cluster you try to start is the
> cluster used with pg_basebackup or a restore done with the pg_basebackup?
>
> Cheers,
>>
>> Guillaume
>>
>>
>>
>>
>>
>> 2015-02-09 11:37 GMT-05:00 Guillaume Drolet <droletguilla...@gmail.com>:
>>
>>
>>>
>>> 2015-02-07 1:24 GMT-05:00 Guillaume Lelarge <guilla...@lelarge.info>:
>>>
>>>> Le 6 févr. 2015 17:31, "Adrian Klaver" <adrian.kla...@aklaver.com> a
>>>> écrit :
>>>> >
>>>> > On 02/06/2015 05:03 AM, Guillaume Drolet wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> Yesterday I ran a pg_basebackup of my cluster. Since it has
>>>> completed,
>>>> >> my cluster doesn't work properly. I tried restarting the computer (or
>>>> >> service) a few times but I always get the same messages in my logs
>>>> (it's
>>>> >> in French. If someone is willing to help me I can try to translate
>>>> the
>>>> >> logs. Just ask):
>>>> >
>>>> >
>>>> > Enter Google Translate:)
>>>> >
>>>>
>>>> But first, Guillaume, do yourself and everyone else a favor: turn the
>>>> dam log into English. Set lc_messages to 'C' in postgresql.conf.
>>>>
>>> Thanks for this! I didn't know about this great feature.
>>>
>>>
>>>> > First some questions:
>>>> >
>>>> > 1) What Postgres version?
>>>> >
>>>> > 2) What OS(s)? I am assuming Windows from the log info below, but we
>>>> all know what assuming gets you.
>>>> >
>>>> > 3) Where were you backing up from and to?
>>>> >
>>>> > 4) Which cluster does not start, the master or the child you created
>>>> with pg_basebackup?
>>>> >
>>>> >
>>>> >>
>>>> >> 2015-02-06 07:11:38 EST LOG:  le système de bases de données a été
>>>> >> interrompu ; dernier lancement connu à 2015-02-06 07:05:05 EST
>>>> >> 2015-02-06 07:11:38 EST LOG:  le système de bases de données n'a pas
>>>> été
>>>> >> arrêté proprement ; restauration
>>>> >>      automatique en cours
>>>> >> 2015-02-06 07:11:38 EST LOG:  record with zero length at 24B/2C000160
>>>> >> 2015-02-06 07:11:38 EST LOG:  la ré-exécution n'est pas nécessaire
>>>> >> 2015-02-06 07:11:38 EST LOG:  le système de bases de données est prêt
>>>> >> pour accepter les connexions
>>>> >> 2015-02-06 07:11:38 EST LOG:  lancement du processus autovacuum
>>>> >> 2015-02-06 07:11:38 EST FATAL:  le rôle « 208375PT$ » n'existe pas
>>>> >
>>>> >
>>>> > So where is role 208375PT$ supposed to come from?
>>>> >
>>>> >
>>>> >>
>>>> >> Then if I start pgAdmin I get a series of pop-ups I have to click OK
>>>> to
>>>> >> to continue:
>>>> >>
>>>> >> An error has ocurred: Column not found in pgSet: "datlastsysoid"
>>>> >> An error has ocurred: Column not found in pgSet: datlastsysoid
>>>> >> An error has ocurred: Column not found in pgSet: oid
>>>> >> An error has ocurred: Column not found in pgSet: encoding
>>>> >> An error has ocurred: Column not found in pgSet: Connection to
>>>> database
>>>> >> broken
>>>> >
>>>> >
>>>> > Not sure about that this, someone more versed in pgAdmin will have to
>>>> answer.
>>>> >
>>>>
>>>> Usually you see these messages when you're using a pgadmin major
>>>> release older than a PostgreSQL make release. For a 9.3 release, that would
>>>> mean a pgadmin older than 1.18.
>>>>
>>>
>>> I'm running pgadmin 1.18.1
>>>
>>>> >
>>>> >>
>>>> >> And after that, I went back to the log file and there's new
>>>> information
>>>> >> added:
>>>> >>
>>>> >> 2015-02-06 07:51:05 EST LOG:  processus serveur (PID 184) a été
>>>> arrêté
>>>> >> par l'exception 0x80000004
>>>> >> 2015-02-06 07:51:05 EST DÉTAIL:  Le processus qui a échoué exécutait
>>>> :
>>>> >> SELECT version();
>>>> >> 2015-02-06 07:51:05 EST ASTUCE :  Voir le fichier d'en-tête C «
>>>> >> ntstatus.h » pour une description de la valeur
>>>> >>      hexadécimale.
>>>> >
>>>> >
>>>> > Well according to here:
>>>> >
>>>> > https://msdn.microsoft.com/en-us/library/cc704588.aspx
>>>> >
>>>> > 0x80000004
>>>> > STATUS_SINGLE_STEP
>>>> >
>>>> >
>>>> > {EXCEPTION} Single Step A single step or trace operation has just
>>>> been completed.
>>>> >
>>>> > A developer is going to have explain what that means.
>>>> >
>>>> >
>>>> >
>>>> >> 2015-02-06 07:51:05 EST LOG:  arrêt des autres processus serveur
>>>> actifs
>>>> >> 2015-02-06 07:51:05 EST ATTENTION:  arrêt de la connexion à cause de
>>>> >> l'arrêt brutal d'un autre processus serveur
>>>> >> 2015-02-06 07:51:05 EST DÉTAIL:  Le postmaster a commandé à ce
>>>> processus
>>>> >> serveur d'annuler la transaction
>>>> >>      courante et de quitter car un autre processus serveur a quitté
>>>> >> anormalement
>>>> >>      et qu'il existe probablement de la mémoire partagée corrompue.
>>>> >> 2015-02-06 07:51:05 EST ASTUCE :  Dans un moment, vous devriez être
>>>> >> capable de vous reconnecter à la base de
>>>> >>      données et de relancer votre commande.
>>>> >> 2015-02-06 07:51:05 EST LOG:  processus d'archivage (PID 692) quitte
>>>> >> avec le code de sortie 1
>>>> >> 2015-02-06 07:51:05 EST LOG:  tous les processus serveur se sont
>>>> >> arrêtés, réinitialisation
>>>> >> 2015-02-06 07:51:15 EST FATAL:  le bloc de mémoire partagé
>>>> pré-existant
>>>> >> est toujours en cours d'utilisation
>>>> >> 2015-02-06 07:51:15 EST ASTUCE :  Vérifier s'il n'y a pas de vieux
>>>> >> processus serveur en cours d'exécution. Si c'est le
>>>> >>      cas, fermez-les.
>>>> >>
>>>> >> I was about to try restarting postgresql using the base backup I made
>>>> >> yesterday but since this means I'll have to copy my database again
>>>> (700
>>>> >> GB takes a while...) I am looking for a better solution from more
>>>> >> experienced people.
>>>> >
>>>> >
>>>> >
>>>> > My suspicion is you copied at least partly over a running server.
>>>> >
>>>>
>>>
>>>
>>
>
>
> --
> Guillaume.
>   http://blog.guillaume.lelarge.info
>   http://www.dalibo.com
>

Re: [GENERAL] Cluster seems broken after pg_basebackup

Reply via email to