[Cloud] Puppet failures?

2024-10-22 Thread Travis Briggs
Hi,

I woke up this morning to "Puppet failure on" alerts for every Cloud
VPS server I have access to. I basically ignored them because I assumed
that so many failures meant something systemic, and I would wait for an
announcement from the cloud services team.

Before I try to debug something this gnarly, *is* there something going on?
Or is it just coincidence that 8 machines across two projects got these
failures this morning?

Thanks,
-Travis
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: Puppet failures?

2024-10-22 Thread Bryan Davis
On Tue, Oct 22, 2024 at 3:33 PM Travis Briggs  wrote:
>
> Hi,
>
> I woke up this morning to "Puppet failure on" alerts for every Cloud VPS
> server I have access to. I basically ignored them because I assumed that
> so many failures meant something systemic, and I would wait for an
> announcement from the cloud services team.
>
> Before I try to debug something this gnarly, *is* there something going on? 
> Or is it just coincidence that 8 machines across two projects got these 
> failures this morning?

You can likely ignore any Puppet failure messages you got on
2024-10-22. There was an unexpected failure of the shared Puppet
server that caused all Puppet runs to fail until it was corrected. See
https://phabricator.wikimedia.org/T377803 for more details.

Bryan
-- 
Bryan Davis                   Wikimedia Foundation
Principal Software Engineer   Boise, ID USA
[[m:User:BDavis_(WMF)]]  irc: bd808
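Bryan's rule of thumb can be checked against the `last_run` value (a Unix timestamp) that the alert emails include under `time:` in their run summary. A minimal Python sketch, assuming the incident day is 2024-10-22 in UTC (the thread does not name a time zone); `last_run_utc` and `during_incident` are hypothetical helpers, not part of any Cloud VPS tooling:

```python
from datetime import date, datetime, timezone

# Assumption: the thread does not name a time zone for "2024-10-22", so this
# sketch compares calendar dates in UTC.
INCIDENT_DAY = date(2024, 10, 22)


def last_run_utc(last_run: int) -> datetime:
    """Convert the `last_run` Unix timestamp from a Puppet run summary to UTC."""
    return datetime.fromtimestamp(last_run, tz=timezone.utc)


def during_incident(last_run: int) -> bool:
    """Hypothetical helper: True if the failed run happened on the incident day."""
    return last_run_utc(last_run).date() == INCIDENT_DAY


if __name__ == "__main__":
    ts = 1729583430  # `last_run` from the wikispeech alert in this thread
    print(last_run_utc(ts).isoformat(), during_incident(ts))
```

For the wikispeech alert forwarded in this thread, this prints `2024-10-22T07:50:30+00:00 True`. A matching date is only a hint; the run log shows whether the catalog request itself failed.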


[Cloud] Re: Puppet failures?

2024-10-22 Thread Travis Briggs
Thanks Bryan!

-Travis

On Tue, Oct 22, 2024 at 5:01 PM Bryan Davis  wrote:

> You can likely ignore any Puppet failure messages you got on
> 2024-10-22. There was an unexpected failure of the shared Puppet
> server that caused all Puppet runs to fail until it was corrected. See
> https://phabricator.wikimedia.org/T377803 for more details.
>
> Bryan


[Cloud] Re: Fwd: [Cloud VPS alert][wikispeech] Puppet failure on producer.wikispeech.eqiad1.wikimedia.cloud (172.16.0.200)

2024-10-22 Thread Dreamy Jazz
I also got this error today for my instance.

Dreamy Jazz

On Tue, 22 Oct 2024 at 09:57, Zoran Dori  wrote:

> Hi,
> I would try running apt update/apt upgrade, then rebooting the VM.
>
> That usually fixed things for me before when I had a VM.
>
> Best regards,
> Zoran


[Cloud] Re: Fwd: [Cloud VPS alert][wikispeech] Puppet failure on producer.wikispeech.eqiad1.wikimedia.cloud (172.16.0.200)

2024-10-22 Thread Arturo Borrero Gonzalez

On 10/22/24 12:13, Dreamy Jazz wrote:
> I also got this error today for my instance.

Hi there,

I can confirm there was a problem today with the puppetservers because of a
Java upgrade.

See here for details:

https://phabricator.wikimedia.org/T377803

I think the problem should be solved now.

regards.


[Cloud] Fwd: [Cloud VPS alert][wikispeech] Puppet failure on producer.wikispeech.eqiad1.wikimedia.cloud (172.16.0.200)

2024-10-22 Thread Sebastian Berlin
I received this email for four instances in two different projects. Running
`sudo run-puppet-agent` gave the same error. Do I need to do anything?

*Sebastian Berlin*
Utvecklare/*Developer*
Wikimedia Sverige (WMSE)

E-post/*E-Mail*: sebastian.ber...@wikimedia.se
Telefon/*Phone*: (+46) 0707 - 92 03 84


-- Forwarded message -
From: root 
Date: Tue, 22 Oct 2024 at 10:16
Subject: [Cloud VPS alert][wikispeech] Puppet failure on
producer.wikispeech.eqiad1.wikimedia.cloud (172.16.0.200)
To: 



Puppet is having issues on the "producer.wikispeech.eqiad1.wikimedia.cloud
(172.16.0.200)" instance in project
wikispeech in Wikimedia Cloud VPS.

Puppet is running with failures.

Working Puppet runs are needed to maintain instance security and logins.
As long as Puppet continues to fail, this system is in danger of becoming
unreachable.

You are receiving this email because you are listed as a member of the
project that contains this instance.  Please take steps to repair
this instance or contact a Cloud VPS admin for assistance.

If your host is expected to fail puppet runs and you want to disable this
alert, you can create the file /.no-puppet-checks to skip the checks.

You might find some help here:

https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Cloud_VPS_alert_Puppet_failure_on

For further support, visit #wikimedia-cloud on libera.chat or


Some extra info follows:
 Last run summary:
application:
  converged_environment: production
  initial_environment: production
  run_mode: agent
changes:
  total: 0
events:
  failure: 0
  success: 0
  total: 0
resources:
  changed: 0
  corrective_change: 0
  failed: 0
  failed_to_restart: 0
  out_of_sync: 0
  restarted: 0
  scheduled: 0
  skipped: 0
  total: 0
time:
  fact_generation: 0.5404427628964186
  last_run: 1729583430
  plugin_sync: 0.7785281259566545
  startup_time: 0.652441437
  total: 2.206511162
version:
  config: null
  puppet: 7.23.0


 Failed resources if any:

  No failed resources.

--- Last run log:

ERR: Could not retrieve catalog from remote server: Error 500 on SERVER:
Server Error: Failed when searching for node
producer.wikispeech.eqiad1.wikimedia.cloud: Exception while executing
'/usr/local/bin/puppet-enc': Cannot run program "/usr/local/bin/puppet-enc"
(in directory "."): error=0, Failed to exec spawn helper: pid: 869822, exit
value: 1
WARNING: Not using cache on failed catalog
ERR: Could not retrieve catalog; skipping run

 Exceptions that happened when running the script if any:
  No exceptions happened.
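In this alert the resource and event counters are all 0 and the log says the agent could not retrieve its catalog, which points at the Puppet server rather than the instance. A minimal Python sketch of that triage, assuming the flat two-level summary layout shown in the alert (which appears to mirror Puppet's `last_run_summary.yaml`); `parse_summary` and `classify_run` are hypothetical helpers, not part of Puppet or the Cloud VPS alerting script, and the heuristic is an illustration rather than the runbook's procedure:

```python
import re
from typing import Dict, Optional

Summary = Dict[str, Dict[str, str]]

# An HTTP 5xx from the Puppet server while the agent requests its catalog.
CATALOG_5XX = re.compile(r"Could not retrieve catalog from remote server: Error 5\d\d")


def parse_summary(text: str) -> Summary:
    """Parse the flat `section:` / `  key: value` layout shown in the alert.

    Minimal sketch: this is not a YAML parser, and values stay strings.
    """
    summary: Summary = {}
    section: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        key, _, value = raw.strip().partition(":")
        if not raw.startswith(" "):
            # An unindented line opens a new section, e.g. "resources:".
            section = summary.setdefault(key, {})
        elif section is not None:
            # An indented line is a counter inside the current section.
            section[key] = value.strip()
    return summary


def classify_run(summary: Summary, log: str) -> str:
    """Hypothetical triage heuristic; not the Cloud VPS runbook's procedure."""
    if CATALOG_5XX.search(log):
        # The server failed the request, so the agent skipped the run and
        # applied nothing locally, matching the all-zero counters in the alert.
        return "server-side: could not retrieve catalog"
    if int(summary.get("resources", {}).get("failed", "0")) > 0:
        return "local: resources failed on this instance"
    if int(summary.get("events", {}).get("failure", "0")) > 0:
        return "local: failed events"
    return "unknown: check the full agent output"


SUMMARY_TEXT = """\
resources:
  changed: 0
  failed: 0
  total: 0
events:
  failure: 0
  total: 0
"""

LOG_TEXT = (
    "ERR: Could not retrieve catalog from remote server: Error 500 on SERVER: "
    "Server Error: Failed when searching for node "
    "producer.wikispeech.eqiad1.wikimedia.cloud\n"
    "ERR: Could not retrieve catalog; skipping run\n"
)

if __name__ == "__main__":
    print(classify_run(parse_summary(SUMMARY_TEXT), LOG_TEXT))
```

Run as a script, it prints `server-side: could not retrieve catalog` for the sample taken from this alert. On a real instance, a proper YAML parser is the safer way to read the agent's run summary file.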


[Cloud] Re: Fwd: [Cloud VPS alert][wikispeech] Puppet failure on producer.wikispeech.eqiad1.wikimedia.cloud (172.16.0.200)

2024-10-22 Thread Zoran Dori
Hi,
I would try running apt update/apt upgrade, then rebooting the VM.

That usually fixed things for me before when I had a VM.

Best regards,
Zoran

On Tue, 22 Oct 2024 at 10:54, Sebastian Berlin wrote:

> I received this email for four instances in two different projects.
> Running `sudo run-puppet-agent` gave the same error. Do I need to do
> anything?