[Cloud] Mono framework upgraded in Toolforge
Hi,

Earlier this month we found that we needed to upgrade the Mono framework in Toolforge to a newer version [0].

Maintainers of affected tools/bots were aware of the upcoming change, but if you are developing a new tool/bot, please note it. mono-complete was upgraded:

* from 3.2.8+dfsg-4ubuntu1.1
* to 5.12.0.226-0xamarin3+ubuntu1404b1

Users who were running their own Mono version (because ours was old) can now try ours.

If you detect any regression, please let me know; we can roll this back if required.

[0] https://phabricator.wikimedia.org/T194665
___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] [Cloud-announce] Heads up tools/bots using Mono/.NET in Toolforge/GridEngine
We upgraded the Mono/.NET framework in Toolforge/GridEngine from version 3.x to 5.x [0].

We discovered that some tweaking is required because of unexpected memory allocation behavior in the framework [1]. The first symptom you will see is your bot causing high CPU load (spinning).

The fix is easy: tell Mono that more memory is available when running the tool/bot. However, you need to cancel your job submissions and resubmit them.

Please refer to the Phabricator bug [1] for more details. Sorry for the inconvenience.

[0] https://phabricator.wikimedia.org/T194665
[1] https://phabricator.wikimedia.org/T195834
___
Wikimedia Cloud Services announce mailing list
cloud-annou...@lists.wikimedia.org (formerly labs-annou...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
[Cloud] [Cloud-announce] prometheus user issue
Hi!

We deleted the prometheus user from LDAP and created it locally [0]. This may cause Puppet failures, since there is a window during which /var/lib/prometheus still carries the old LDAP uid/gid.

We are running a massive, CloudVPS-wide deluser/adduser/chown operation to fix this.

[0] https://phabricator.wikimedia.org/T196137
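As a rough illustration of the uid/gid mismatch described above, the following Python sketch lists paths under a directory whose numeric owner differs from an expected uid/gid. The function name `find_stale_ownership` and the idea of scanning before fixing are assumptions made for illustration; this is not the procedure the WMCS team actually ran.

```python
import os


def find_stale_ownership(root, expected_uid, expected_gid):
    """Return paths under `root` (including `root`) whose owner is not the expected uid/gid.

    Hypothetical helper: it only reports mismatches, it does not change anything.
    """
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    stale = []
    for path in candidates:
        st = os.lstat(path)  # lstat inspects symlinks themselves instead of following them
        if st.st_uid != expected_uid or st.st_gid != expected_gid:
            stale.append(path)
    return stale
```

On an affected VM this could be called with the ids of the local account, for example the `pw_uid` and `pw_gid` fields returned by `pwd.getpwnam("prometheus")`, and `/var/lib/prometheus` as the root.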
[Cloud] [Cloud-announce] Operation on Cloud VPS next monday 13th Aug
Hi!

Next Monday (13th) we will be doing some maintenance on the main Cloud VPS deployment to merge the keystone services of the main and eqiad1 deployments (eqiad1 is the new one that we will eventually put into production).

Toolforge users will not be affected by this outage.

Day: Monday 13th August
Start time: 14:00 UTC
Finish time: 16:00 UTC or ASAP

Keystone is a central point in OpenStack, so most Horizon operations, such as login and creating/deleting VMs, could be affected. On the other hand, VMs will keep working and we don't expect any network outage.

This operation will allow a smooth transition later, when we move all projects and instances to the new eqiad1 deployment, and it is a prerequisite for multi-region support in our Cloud VPS service.

Please let us know if you have any questions or suggestions.

Best regards.
Re: [Cloud] [Cloud-announce] Operation on Cloud VPS next monday 13th Aug
On 07/08/18 18:24, Arturo Borrero Gonzalez wrote:
> [..]

Reminder, this is happening today in 30 minutes.
Re: [Cloud] [Cloud-announce] Operation on Cloud VPS next monday 13th Aug
On 13/08/18 15:30, Arturo Borrero Gonzalez wrote:
> [..]
> Reminder, this is happening today in 30 minutes.

The work has been done, and everything should be working again.

Please let us know about any issues you find.
[Cloud] [Cloud-announce] Ubuntu deprecation plans
Hi!

We would like to share some information about Wikimedia Cloud Services' plans for deprecating Ubuntu, especially Trusty.

Ubuntu Trusty reaches end-of-life in April 2019, and the WMF decided to consolidate on a single operating system, Debian.

In Cloud VPS, projects containing Ubuntu virtual machine instances have been contacted through a Phabricator task.

Toolforge users aren't affected by this right now, because Toolforge itself is currently running Trusty. But we are already working on the next, Debian-based, Toolforge version.

All this information, with more details and timelines, can be found on this Wikitech page:

https://wikitech.wikimedia.org/wiki/News/Trusty_deprecation

Please let us know if you have any questions or doubts.

Best regards.
[Cloud] [Cloud-announce] Brief service interruption next Monday 2018-11-19 at 13:00 UTC
Next Monday, 2018-11-19, we will be rebooting several Cloud VPS infrastructure servers [0] for maintenance and security updates.

This is just a simple reboot of servers and we don't expect any outage or major interruption, but some services may be down briefly:

* Horizon and Wikitech may misbehave
* instance creation/deletion/shutdown, etc.
* CI tests may stop running

Apologies in advance for any inconvenience, and please let us know about any issues you find after these operations.

[0] cloudcontrol1003, cloudservices1003, labcontrol1001, labservices1001
[Cloud] [Cloud-announce] OSM database reboot next Tuesday 2018-11-20 at 17:30 UTC
Hi,

Next Tuesday, 2018-11-20 at 17:30 UTC, we will be rebooting the OSM database (part of our data services) for maintenance and security updates.

Specifically, the labstore1006.eqiad.wmnet (osmdb.eqiad.wmnet) server will be rebooted. The other server in the cluster, labstore1007.eqiad.wmnet, has been rebooted already, but we won't be doing any pre-failover, for operational reasons.

Apologies in advance for any inconvenience, and please let us know about any issues you find after these operations.
Re: [Cloud] [Cloud-announce] Brief service interruption next Monday 2018-11-19 at 13:00 UTC
On 11/15/18 2:03 PM, Arturo Borrero Gonzalez wrote:
> [..]

Remember, this is happening right now.
Re: [Cloud] [Cloud-announce] OSM database reboot next Tuesday 2018-11-20 at 17:30 UTC
On 11/15/18 5:58 PM, Arturo Borrero Gonzalez wrote:
> [..]

Reminder: this is happening in ~10 minutes.
Re: [Cloud] [Cloud-announce] OSM database reboot next Tuesday 2018-11-20 at 17:30 UTC
On 11/20/18 6:19 PM, Arturo Borrero Gonzalez wrote:
> [..]
> Reminder: this is happening in ~10 minutes.

We are done! Please report any issue you may find.

Thanks, best regards.
[Cloud] [Cloud-announce] CloudVPS network maintenance 2018-11-27 @ 17:30 UTC
Hi,

Next Tuesday, 2018-11-27 @ 17:30 UTC, we will reboot the labnet1001.eqiad.wmnet server for maintenance and security updates.

This server provides virtual networking services for CloudVPS in the main deployment (the old one, different from the eqiad1 deployment). We won't be doing any failover prior to the reboot, for operational reasons (we measured that the failover downtime is longer than the actual reboot time).

The impact of this brief reboot downtime will be:

* all VMs in the main CloudVPS deployment won't have network connectivity
* ongoing network connections (downloads, uploads) will fail and will have to be restarted
* cross connectivity between VM instances in the main and eqiad1 deployments won't be possible

Thanks for your understanding, and let us know about any issues you find after the reboot next week.
Re: [Cloud] [Cloud-announce] CloudVPS network maintenance 2018-11-27 @ 17:30 UTC
On 11/21/18 10:54 AM, Arturo Borrero Gonzalez wrote:
> [..]

Reminder, this is happening in ~10 minutes.
[Cloud] [Cloud-announce] CloudVPS network maintenance tomorrow 2018-12-20 @ 17:00 UTC
Hi!

Tomorrow, 2018-12-20 @ 17:00 UTC (~24h from now), we will be conducting some network maintenance in Cloud VPS (OpenStack).

We will be doing some work on the transport network that connects the Neutron server to the rest of the internet. Running CloudVPS instances will see a brief connection problem if they are connected to any external service (outside CloudVPS).

If everything goes well, and according to our tests it should, all operations will be finished in just a couple of minutes.

Let us know about any issues you find. Thanks.
Re: [Cloud] [Cloud-announce] CloudVPS network maintenance tomorrow 2018-12-20 @ 17:00 UTC
On 12/19/18 6:16 PM, Arturo Borrero Gonzalez wrote:
> [..]

Reminder, this is happening now.
[Cloud] CloudVPS Trusty instances shutdown
(list cross-posting on purpose, sorry for that)

Hi!

Today is the deadline for Ubuntu Trusty instances running in CloudVPS [0]. We will shut down the remaining instances next Monday (2019-01-21) to avoid having the weekend in between.

This has been communicated to the people involved in the corresponding Phabricator tasks. The only exception to this deadline is for projects actively working on migrating to Debian.

For the record, affected projects are:

* queryrapi https://phabricator.wikimedia.org/T204683
* telnet https://phabricator.wikimedia.org/T204694
* wikidataconcepts https://phabricator.wikimedia.org/T204695 [1]
* wildcat https://phabricator.wikimedia.org/T204703
* design https://phabricator.wikimedia.org/T204502
* dumps https://phabricator.wikimedia.org/T204503
* maps https://phabricator.wikimedia.org/T204506 [1]
* getstarted https://phabricator.wikimedia.org/T204508
* tools/toolsbeta https://phabricator.wikimedia.org/T204530 [1]

(please check the individual Phabricator tasks to see which specific VMs are affected)

Toolforge gridengine users have a separate deprecation process; you can find additional information on Wikitech [2].

[0] https://wikitech.wikimedia.org/wiki/News/Trusty_deprecation
[1] The project seems to be actively working on a replacement; we will grant an exception.
[2] https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation
[Cloud] Current status of Toolforge and Cloud VPS (2019-02-16)
Hi,

Here is a brief update on the status of Toolforge and CloudVPS as of today, 2019-02-16, along with some rough estimates and what to expect in the following days.

Keeping track of all the events we had this week may be complex, because there were several of them, heavily intermixed.

* CloudVPS suffered severe hardware issues this week [0]. We solved most of the problems and added spare hardware [1] because our server capacity was greatly reduced. This service should be mostly stable right now.

* Toolsdb (tools.db.svc.eqiad.wmflabs) is currently overloaded and suffering from hardware errors. We are already working on a replacement for this service [2]. Services depending on this database (like PAWS) aren't working properly, and Toolforge tools that use it are also affected.

An honest estimate is that services (especially Toolsdb) won't be fully recovered until at least next Tuesday (2019-02-26).

Our current plans involve replacing the Toolsdb hardware with virtual machines inside CloudVPS [3]. We are trying to be extra cautious to prevent data loss and other problems usually associated with doing things in a rush.

Finally, I would like to mention that we are all well aware of the importance of these services for the community and we are doing our best to get things fixed.

Thanks for your understanding and patience.

regards

[0] https://wikitech.wikimedia.org/wiki/Incident_documentation/20190213-cloudvps
[1] CloudVPS: drain and rebuild labvirt1009 as cloudvirt1009 https://phabricator.wikimedia.org/T216239
[2] ToolsDB overload and cleanup https://phabricator.wikimedia.org/T216208
[3] Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020 https://phabricator.wikimedia.org/T193264

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] Horizon and Toolsadmin issues
Hi,

Following some vandalism attempts, both Horizon and Toolsadmin are affected by a general OAuth issue in Wikitech that prevents proper user authentication.

Affected URLs are:

* https://horizon.wikimedia.org/
* https://toolsadmin.wikimedia.org/auth/login

Horizon is the web UI used to create and manage Cloud VPS. Toolsadmin (also known as striker) is the web UI used to create and maintain Toolforge accounts.

We have no estimate right now of when a fix will be available, but several people are actively working to get things back to normal.

regards

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
Re: [Cloud] Fwd: Cron jlocal rm -f /data/project/map-of-monuments/generate.*; /usr/bin/jsub -N generate -once -quiet bash /data/project/map-of-monuments/suppo
On 5/3/19 9:59 AM, Martin Urbanec wrote:
> Hi everyone,
>
> I get those mails from time to time. Is there a way to prevent them?
>

That error doesn't sound familiar to me. Please open a Phabricator ticket with all the information you have and we will investigate further.

regards

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] Electric maintenance on 2019-05-16
Hi!

On 2019-05-16 at 13:00 UTC there will be a maintenance operation in one of the Wikimedia Foundation datacenter racks that affects 2 of our servers running virtual machines [0]. There is a risk that this maintenance operation results in power loss on the servers, affecting the virtual machines running on them. However, there is no way to know for sure whether there will be any outage at all.

If you are an admin of any of the VMs in the list and you want the VM to be moved to another server before the operation, please get in touch with us as soon as possible. Remember that, right now, moving a VM to another server means shutting the VM down briefly.

Here is a list of affected virtual machines:

cloudvirt1028.eqiad.wmnet:
af-puppetdb01.automation-framework.eqiad.wmflabs
bastion-eqiad1-02.bastion.eqiad.wmflabs
fridolin.catgraph.eqiad.wmflabs
cloud-puppetmaster-02.cloudinfra.eqiad.wmflabs
cloudstore-dev-01.cloudstore.eqiad.wmflabs
commtech-nsfw.commtech.eqiad.wmflabs
clm-test-01.community-labs-monitoring.eqiad.wmflabs
cyberbot-exec-iabot-01.cyberbot.eqiad.wmflabs
deployment-db05.deployment-prep.eqiad.wmflabs
deployment-memc05.deployment-prep.eqiad.wmflabs
deployment-sca01.deployment-prep.eqiad.wmflabs
deployment-pdfrender02.deployment-prep.eqiad.wmflabs
ign.ign2commons.eqiad.wmflabs
integration-slave-docker-1050.integration.eqiad.wmflabs
integration-castor03.integration.eqiad.wmflabs
api.openocr.eqiad.wmflabs
osmit-umap.osmit.eqiad.wmflabs
builder-envoy.packaging.eqiad.wmflabs
jmm-buster.puppet.eqiad.wmflabs
a11y.reading-web-staging.eqiad.wmflabs
adhoc-utils01.security-tools.eqiad.wmflabs
util-abogott-stretch.testlabs.eqiad.wmflabs
canary1028-01.testlabs.eqiad.wmflabs
stretch.thumbor.eqiad.wmflabs
tools-worker-1023.tools.eqiad.wmflabs
tools-proxy-04.tools.eqiad.wmflabs
tools-docker-builder-06.tools.eqiad.wmflabs
tools-sgewebgrid-generic-0904.tools.eqiad.wmflabs
tools-sgeexec-0942.tools.eqiad.wmflabs
tools-sgeexec-0941.tools.eqiad.wmflabs
tools-sgeexec-0940.tools.eqiad.wmflabs
tools-sgeexec-0939.tools.eqiad.wmflabs
tools-sgeexec-0937.tools.eqiad.wmflabs
tools-sgeexec-0929.tools.eqiad.wmflabs
tools-sgeexec-0921.tools.eqiad.wmflabs
tools-sgeexec-0920.tools.eqiad.wmflabs
tools-sgeexec-0911.tools.eqiad.wmflabs
tools-sgeexec-0909.tools.eqiad.wmflabs
toolsbeta-proxy-01.toolsbeta.eqiad.wmflabs
vconverter-instance.videowiki.eqiad.wmflabs
perfbot.webperf.eqiad.wmflabs
wdhqs-1.wikidata-history-query-service.eqiad.wmflabs

cloudvirt1014.eqiad.wmnet:
commonsarchive-prod.commonsarchive.eqiad.wmflabs
deployment-imagescaler03.deployment-prep.eqiad.wmflabs
dumps-5.dumps.eqiad.wmflabs
dumps-4.dumps.eqiad.wmflabs
incubator-mw.incubator.eqiad.wmflabs
webperformance.integration.eqiad.wmflabs
saucelabs-01.integration.eqiad.wmflabs
integration-puppetmaster01.integration.eqiad.wmflabs
maps-puppetmaster.maps.eqiad.wmflabs
maps-wma.maps.eqiad.wmflabs
mwoffliner3.mwoffliner.eqiad.wmflabs
mwoffliner1.mwoffliner.eqiad.wmflabs
phlogiston-5.phlogiston.eqiad.wmflabs
discovery-testing-01.shiny-r.eqiad.wmflabs
snuggle-enwiki-01.snuggle.eqiad.wmflabs
canary-1014-01.testlabs.eqiad.wmflabs
tools-sgeexec-0901.tools.eqiad.wmflabs
wdqs-test.wikidata-query.eqiad.wmflabs

Toolforge won't be affected by this operation.

You can read more details about the datacenter operation itself in Phabricator [1].

Sorry for the short notice, regards.

[0] Cloud Services: reallocate workload from rack B5-eqiad https://phabricator.wikimedia.org/T223148
[1] Install new PDUs into b5-eqiad https://phabricator.wikimedia.org/T223126

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
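To check quickly whether one of your projects appears in a list like the one above, the names can be grouped by project. The sketch below assumes the `<instance>.<project>.eqiad.wmflabs` pattern that the listed names follow; the function name `group_by_project` is hypothetical, and the sample input is a small excerpt of the list.

```python
from collections import defaultdict


def group_by_project(fqdns):
    """Group `<instance>.<project>.eqiad.wmflabs` names by project (assumed pattern)."""
    projects = defaultdict(list)
    for fqdn in fqdns:
        parts = fqdn.strip().split(".")
        if len(parts) != 4 or parts[2:] != ["eqiad", "wmflabs"]:
            continue  # skip hypervisor headings and anything else that does not match
        instance, project = parts[0], parts[1]
        projects[project].append(instance)
    return dict(projects)


affected = [
    "tools-proxy-04.tools.eqiad.wmflabs",
    "tools-sgeexec-0942.tools.eqiad.wmflabs",
    "dumps-5.dumps.eqiad.wmflabs",
    "cloudvirt1028.eqiad.wmnet:",  # hypervisor heading, ignored
]
print(group_by_project(affected))
# {'tools': ['tools-proxy-04', 'tools-sgeexec-0942'], 'dumps': ['dumps-5']}
```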
Re: [Cloud] Electric maintenance on 2019-05-16
On 5/14/19 2:16 PM, Arturo Borrero Gonzalez wrote:
> [..]

Hi!

This has been done with no issues detected. All clear.

regards.

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
Re: [Cloud] Where is webchat for wikimedia-labs?
On 5/24/19 12:44 AM, Thomas Stieve wrote:
> Hello all,
>
> Could someone tell me where webchat for wikimedia-labs is now?
> https://webchat.freenode.net/?channels=wikimedia-labs
>

It has been a while since we migrated to #wikimedia-cloud, and we may need to update some leftover references in some docs.

Where did you find a reference to this channel?

regards.

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] The sql command (Was: Re: Where is webchat for wikimedia-labs?)
On 5/24/19 8:37 AM, Valerio Bozzolan via Cloud wrote:
> Hello Thomas,
>
> Just try to type 'mysql' instead of 'sql'. I don't know any 'sql' command.
>
> Regards
>
> On May 24, 2019 12:59:33 AM GMT+02:00, Maximilian Doerr wrote:
>> You may need to point the command to the location by calling an
>> absolute path. Use “which sql” to figure out where the command is
>> located.
>>
>> Cyberpower678
>> English Wikipedia Account Creation Team
>> English Wikipedia Administrator
>> Global User Renamer
>>
>>> On May 23, 2019, at 18:57, Thomas Stieve wrote:
>>>
>>> Also, my question for the webchat was about how to run commands using
>>> a bash file. I used to be able to run:
>>>
>>> sql enwiki_p 'select * from logging where log_title = "A.S._Roma" and
>>> log_namespace = 0 and log_timestamp > 20160101000 and log_action =
>>> "move"' > A.S._Roma.txt;
>>>
>>> Now, I just get command not found.
>>>

You can read more about the `sql` command here:

https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Connecting_to_the_database_replicas

It's a custom wrapper that aims to ease interaction with the DB.

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
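A side note on the quoted query: MediaWiki stores `log_timestamp` as a 14-digit `YYYYMMDDHHMMSS` value, while the literal in the query has 11 digits, so the comparison may not filter as intended. The following Python sketch shows one way to build a full-length timestamp for such a query; the helper name `mw_timestamp` is hypothetical and not part of the `sql` wrapper.

```python
from datetime import datetime, timezone


def mw_timestamp(dt):
    """Format a timezone-aware datetime as a 14-digit MediaWiki timestamp in UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


print(mw_timestamp(datetime(2016, 1, 1, tzinfo=timezone.utc)))  # 20160101000000
```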
[Cloud] [Cloud-announce] cloudservices1003 rebuild on 2019-06-03
Hi!

On 2019-06-03 at 14:00 UTC+2 (next Monday) we will be rebuilding the cloudservices1003 server, which hosts the designate service that serves DNS requests for CloudVPS and Toolforge.

We have a backup server, cloudservices1004, so we don't expect a lot of downtime. But DNS queries are really fast, and a lot of them may fail while we stabilize the DNS service.

Please reach out to the WMCS team if you need more details or have any doubts.

regards.

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
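Most other announcements on this list give times in UTC, so it may help to convert this one. The sketch below assumes the announcement means 14:00 local time at a fixed UTC+2 offset; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "2019-06-03 UTC+2 14:00" means 14:00 at a fixed +02:00 offset.
local_start = datetime(2019, 6, 3, 14, 0, tzinfo=timezone(timedelta(hours=2)))
start_utc = local_start.astimezone(timezone.utc)

print(start_utc.isoformat())  # 2019-06-03T12:00:00+00:00
```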
Re: [Cloud] cloudservices1003 rebuild on 2019-06-03
On 5/28/19 8:11 PM, Arturo Borrero Gonzalez wrote:
> [..]

Just a heads up, this is starting now.

-- 
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] PDU upgrades in the eqiad datacenter affect CloudVPS hypervisors
Hi there!

There is ongoing maintenance in the eqiad datacenter that involves changing the power connectors of the servers. More info in this Phabricator task: T226778 [0].

The PDU upgrade could potentially leave our hypervisors without power briefly. For some hypervisors, we plan to accept the risk and leave them running. For others (those running important DBs in the form of virtual machines) we will probably do a controlled shutdown before the operation to ensure no data corruption happens in the databases.

The PDU upgrades will happen this very week (see Phabricator task [0]) and could potentially affect every virtual machine we run in CloudVPS. This includes Toolforge.

In the case of power loss, we expect any disruption to be very brief and not to cause extended downtime.

Please let us know about any issues you find related to this operation.

regards.

[0] https://phabricator.wikimedia.org/T226778

-- 
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
Re: [Cloud] PDU upgrades in the eqiad datacenter affects CloudVPS hypervisor
On 7/23/19 8:31 PM, Arturo Borrero Gonzalez wrote:
> Hi there!
>
> There is an ongoing maintenance in the eqiad datacenter that involves changing
> power connectors of the servers. More info in this phabricator task: T226778
> [0].
[..]
>
> [0] https://phabricator.wikimedia.org/T226778
>

This has been delayed and will likely be scheduled to start in 2 weeks.

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] CloudVPS reboots for security updates
Hi there!

Unrelated to other operations that were communicated recently (datacenter PDU upgrades, operating system upgrades, etc.) we need to reboot all the cloudvirt servers to apply some security updates for CPU vulnerabilities. Along with the physical hardware reboot we also need to reboot all the virtual machines running in CloudVPS.

This operation is a bit disruptive but very quick and should not lead to any unexpected errors (it is just a reboot). We have already tried the same upgrades on some other servers.

We will be doing the reboots during this week (starting 2019-07-29).

If you see any problems related to this, please contact us.

regards.

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
Re: [Cloud] CloudVPS reboots for security updates
On 7/29/19 3:22 PM, Maximilian Doerr wrote:
> Aww man. I was hoping to push 365 days of continuous up time for my VMs.
>
> Cyberpower678
> English Wikipedia Account Creation Team
> English Wikipedia Administrator
> Global User Renamer
>

Well, a reboot once in a while prevents other major issues :-P

Bonus: some people say that the industry standard is 100 days as the maximum uptime you should have on your servers. Some unix tools (like htop) will warn you if the uptime is >100 days (only an asterisk though).

regards.

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] Networking incident today in CloudVPS (ferm update)
Hi,

today, 2019-09-30, we were running an operation on all CloudVPS virtual machines to update ferm and fix a bug [0]. Ferm is a firewalling utility.

The fleet-wide operation resulted in ferm being installed on every VM, even on those VMs not requiring it. This caused a network outage for most of the virtual machines and projects that were not previously configured to use ferm.

Many Toolforge tools (webservices, grid jobs, etc.) stopped working, database connections were lost, the proxy reported bad gateway errors, etc.

To resolve the issue, we quickly removed ferm from every VM and ran the puppet agent to install it only on the VMs that had ferm in their puppet manifests. As soon as we did this, everything went back to normal.

This incident lasted 1h, give or take.

Please get in contact if you see any issue or have any doubts about this incident.

regards.

[0] https://phabricator.wikimedia.org/T153468

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] [Cloud-announce] CloudVPS maintenance on Wednesday 2019-10-09 (round of cloudvirt reboots)
Hi there, Next Wednesday 2019-10-09 at 09:00 UTC we will be doing a maintenance operation on some of our cloudvirt servers (the hypervisor servers) that involves rebooting both the physical servers and the virtual machines running on them. The reason is that we need to update the running linux kernel version they have. In this window we will reboot 4 hypervisors: * cloudvirt1008 * cloudvirt1009 * cloudvirt1012 * cloudvirt1013 The procedure will be to reboot a server, wait for it to come back online (could take up to 5 minutes) and wait for all the VMs to come back online. Then move to the next server. Toolforge users may see their tools and webservices briefly disrupted due to several components of the Toolforge infrastructure being rebooted in this operation. If nothing changes (reallocated or new virtual machine, etc) this is the list of affected VM instances in each hypervisor: * cloudvirt1008: VM: tools-sgebastion-09 PROJECT: tools VM: tools-k8s-master-01 PROJECT: tools VM: deployment-cache-upload05 PROJECT: deployment-prep VM: toolsbeta-paws-worker-1002 PROJECT: toolsbeta VM: toolsbeta-puppetmaster-02 PROJECT: toolsbeta VM: tools-mail-02 PROJECT: tools VM: tools-prometheus-02 PROJECT: tools VM: tools-elastic-01 PROJECT: tools VM: tracker1 PROJECT: lta-tracker VM: tools-clushmaster-02 PROJECT: tools VM: tools-worker-1020 PROJECT: tools VM: tools-k8s-etcd-01 PROJECT: tools VM: tools-worker-1010 PROJECT: tools VM: tools-worker-1008 PROJECT: tools VM: tools-worker-1007 PROJECT: tools VM: tools-worker-1003 PROJECT: tools VM: tools-sgeexec-0937 PROJECT: tools * cloudvirt1009: VM: toolsbeta-paws-master-01 PROJECT: toolsbeta VM: tools-elastic-02 PROJECT: tools VM: tools-paws-worker-1005 PROJECT: tools VM: tools-prometheus-01 PROJECT: tools VM: tools-paws-worker-1002 PROJECT: tools VM: puppet-lta PROJECT: lta-tracker VM: tools-flannel-etcd-03 PROJECT: tools VM: tools-worker-1017 PROJECT: tools VM: tools-k8s-etcd-02 PROJECT: tools VM: tools-worker-1013 PROJECT: tools 
VM: tools-worker-1012 PROJECT: tools VM: tools-worker-1009 PROJECT: tools VM: tools-worker-1006 PROJECT: tools VM: tools-worker-1004 PROJECT: tools * cloudvirt1012: VM: tools-paws-master-01 PROJECT: tools VM: deployment-ms-be06 PROJECT: deployment-prep VM: toolsbeta-worker-1001 PROJECT: toolsbeta VM: deployment-cumin02 PROJECT: deployment-prep VM: toolsbeta-k8s-master-01 PROJECT: toolsbeta VM: toolsbeta-k8s-etcd-01 PROJECT: toolsbeta VM: toolsbeta-puppetdb-01 PROJECT: toolsbeta VM: tools-redis-1002 PROJECT: tools VM: tools-paws-worker-1003 PROJECT: tools VM: tools-paws-worker-1001 PROJECT: tools VM: tools-elastic-03 PROJECT: tools VM: tools-worker-1025 PROJECT: tools VM: tools-worker-1026 PROJECT: tools VM: tools-worker-1022 PROJECT: tools VM: tools-worker-1019 PROJECT: tools VM: tools-worker-1018 PROJECT: tools VM: tools-k8s-etcd-03 PROJECT: tools VM: tools-worker-1016 PROJECT: tools VM: tools-flannel-etcd-01 PROJECT: tools VM: tools-worker-1014 PROJECT: tools VM: phlogiston-5 PROJECT: phlogiston VM: dumps-3 PROJECT: dumps VM: codesearch4 PROJECT: codesearch VM: wikispeech-wiki-stretch PROJECT: wikispeech VM: ores-worker-01 PROJECT: ores VM: puppet-jmm-kernel-stretch2 PROJECT: puppet VM: mcr-base PROJECT: mcr-dev VM: rel2 PROJECT: search VM: mc-clusterA-2 PROJECT: test-twemproxy VM: wikibrain-embeddings-02 PROJECT: wikibrain VM: qube-node1 PROJECT: k8splay VM: cindy PROJECT: pluggableauth VM: cvn-apache9 PROJECT: cvn VM: zk1-2 PROJECT: analytics * cloudvirt1013: VM: tools-flannel-etcd-02 PROJECT: tools VM: paws-ext-lb-01 PROJECT: paws VM: abogott-puppetclient PROJECT: testlabs VM: tools-worker-1028 PROJECT: tools VM: tools-worker-1005 PROJECT: tools VM: cloudstore-dev-02 PROJECT: cloudstore VM: cloudstore-puppetmaster-01 PROJECT: cloudstore VM: deployment-aqs03 PROJECT: deployment-prep VM: osmit-test PROJECT: osmit VM: tools-sgewebgrid-lighttpd-0927 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0926 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0925 PROJECT: tools 
VM: tools-sgewebgrid-lighttpd-0924 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0923 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0922 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0920 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0917 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0909 PROJECT: tools VM: tools-sgeexec-0925 PROJECT: tools VM: tools-sgeexec-0923 PROJECT: tools VM: tools-sgeexec-0910 PROJECT: tools VM: cyberbot-db-01 PROJECT: cyberbot regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services announce mailing list cloud-annou...@lists.wikimedia.org (formerly labs-annou...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
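The procedure described in this announcement (reboot one hypervisor, wait for it and its VMs to come back online, then move to the next) could be sketched roughly as follows. This is only an illustration, not the tooling the WMCS team actually used: the `reboot` and `is_online` callables are hypothetical placeholders for whatever performs the reboot and the health check, and the 300-second default timeout is an assumption taken from the "up to 5 minutes" estimate.

```python
import time
from typing import Callable, Dict, List


def rolling_reboot(
    hypervisors: Dict[str, List[str]],
    reboot: Callable[[str], None],
    is_online: Callable[[str], bool],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Reboot hypervisors one at a time and return those completed.

    `hypervisors` maps a hypervisor name to the VM names it hosts.
    `reboot` and `is_online` are hypothetical hooks supplied by the caller.
    """
    done: List[str] = []
    for host, vms in hypervisors.items():
        reboot(host)
        # Wait for the hypervisor first, then for every VM it hosts,
        # before touching the next hypervisor.
        for name in [host, *vms]:
            waited = 0.0
            while not is_online(name):
                if waited >= timeout_s:
                    raise TimeoutError(f"{name} did not come back after {timeout_s}s")
                sleep(poll_s)
                waited += poll_s
        done.append(host)
    return done
```

Handling one hypervisor at a time keeps the blast radius small: if a server or one of its VMs fails to return, the loop stops before the next hypervisor is rebooted.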
Re: [Cloud] [Cloud-announce] CloudVPS maintenance on Wednesday 2019-10-09 (round of cloudvirt reboots)
Hi, a reminder: this is happening now!

On 10/2/19 11:02 AM, Arturo Borrero Gonzalez wrote:
> Hi there,
>
> Next Wednesday 2019-10-09 at 09:00 UTC we will be doing a maintenance operation
> on some of our cloudvirt servers (the hypervisor servers) that involves
> rebooting both the physical servers and the virtual machines running on them.
> The reason is that we need to update the running linux kernel version they have.
>
> In this window we will reboot 4 hypervisors:
> * cloudvirt1008
> * cloudvirt1009
> * cloudvirt1012
> * cloudvirt1013
>
[..]
[Cloud] CloudVPS maintenance on Wednesday 2019-10-16 (round 2 of cloudvirt reboots)
VM: wikidata-misc PROJECT: wikidata-dev VM: packaging PROJECT: thumbor VM: neon PROJECT: rcm VM: oxygen PROJECT: rcm VM: hafnium PROJECT: rcm VM: hound-app-01 PROJECT: hound VM: mediawiki2latex PROJECT: collection-alt-renderer VM: deployment-sca02 PROJECT: deployment-prep VM: deployment-memc04 PROJECT: deployment-prep VM: deployment-fluorine02 PROJECT: deployment-prep VM: deployment-mcs01 PROJECT: deployment-prep VM: deployment-parsoid09 PROJECT: deployment-prep VM: deployment-sca04 PROJECT: deployment-prep VM: deployment-kafka-jumbo-2 PROJECT: deployment-prep VM: deployment-kafka-main-1 PROJECT: deployment-prep VM: deployment-mediawiki-09 PROJECT: deployment-prep VM: deployment-webperf12 PROJECT: deployment-prep VM: deployment-deploy02 PROJECT: deployment-prep VM: deployment-deploy01 PROJECT: deployment-prep VM: deployment-maps04 PROJECT: deployment-prep VM: twlight-tracker PROJECT: twl VM: encoding02 PROJECT: video VM: encoding03 PROJECT: video VM: wikispeech-tts-dev PROJECT: wikispeech VM: pub2 PROJECT: wikiapiary VM: integration-slave-jessie-1001 PROJECT: integration VM: ores-staging-01 PROJECT: ores-staging VM: ve-font PROJECT: design VM: visualeditor-test2 PROJECT: visualeditor VM: ores-redis-02 PROJECT: ores VM: quarry-worker-01 PROJECT: quarry VM: fastcci-new-master PROJECT: fastcci VM: cvn-app8 PROJECT: cvn regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] CloudVPS maintenance on Wednesday 2019-10-16 (round 2 of cloudvirt reboots)
On 10/9/19 1:45 PM, Arturo Borrero Gonzalez wrote:
> Hello!
>
> Next Wednesday 2019-10-16 at 09:00 UTC we will be doing another maintenance
> operation on some of our cloudvirt servers (the hypervisor servers) that
> involves rebooting both the physical servers and the virtual machines running
> on them.
> The reason is that we need to update the running linux kernel version they
> have.
>
> In this window we will reboot 4 hypervisors:
> * cloudvirt1028
> * cloudvirt1029
> * cloudvirt1030
>
> The procedure will be to reboot a server, wait for it to come back online
> (could take up to 5 minutes) and wait for all the VMs to come back online.
> Then move to the next server.
>
> Toolforge users may see their tools and webservices briefly disrupted due to
> several components of the Toolforge infrastructure being rebooted in this
> operation.
>

Reminder, this is happening today in about 10 minutes!

regards

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
[Cloud] CloudVPS maintenance on Wednesday 2019-10-23 (round 3 of cloudvirt reboots)
PROJECT: tools VM: canary1025-01 PROJECT: testlabs VM: mathosphere PROJECT: math VM: social-tools3 PROJECT: social-tools VM: togetherjs PROJECT: visualeditor VM: language-mleb-legacy PROJECT: language VM: women-in-red PROJECT: globaleducation VM: ntp-01 PROJECT: cloudinfra VM: mc-clusterA-1 PROJECT: test-twemproxy VM: wikifarm PROJECT: pluggableauth VM: login-test PROJECT: catgraph VM: puppenmeister PROJECT: planet * cloudvirt1026: VM: integration-agent-docker-1016 PROJECT: integration VM: wikidata-new-wbterm PROJECT: wikidata-dev VM: incubator-test PROJECT: incubator VM: cloudinfra-internal-puppetmaster01 PROJECT: cloudinfra VM: cloudinfra-db01 PROJECT: cloudinfra VM: tools-checker-03 PROJECT: tools VM: tools-static-13 PROJECT: tools VM: wp1 PROJECT: mwoffliner VM: pk8s PROJECT: planet VM: arturo-k8s-test-4-1 PROJECT: openstack VM: banner PROJECT: wikidumpparse VM: packager01 PROJECT: packaging VM: tools-package-builder-02 PROJECT: tools VM: canary1026-02 PROJECT: testlabs VM: security-checker1 PROJECT: packagist-mirror VM: logstack02 PROJECT: security-tools VM: logstack01 PROJECT: security-tools VM: mediawiki2latex-large PROJECT: collection-alt-renderer VM: tools-sge-services-03 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0928 PROJECT: tools VM: tools-sgewebgrid-lighttpd-0921 PROJECT: tools VM: tools-sgewebgrid-generic-0903 PROJECT: tools VM: tools-sgeexec-0938 PROJECT: tools VM: tools-sgeexec-0936 PROJECT: tools VM: tools-sgeexec-0935 PROJECT: tools VM: tools-sgeexec-0919 PROJECT: tools VM: tools-sgeexec-0917 PROJECT: tools VM: tools-sgeexec-0916 PROJECT: tools VM: tools-sgeexec-0915 PROJECT: tools VM: tools-sgeexec-0914 PROJECT: tools VM: tools-paws-worker-1010 PROJECT: tools VM: tools-paws-worker-1019 PROJECT: tools VM: openstack-puppetmaster-01 PROJECT: openstack VM: web1 PROJECT: graphql VM: etytree-b PROJECT: etytree VM: canary1026-01 PROJECT: testlabs VM: db-instance PROJECT: videowiki VM: tools-sgeexec-0906 PROJECT: tools VM: mwoffliner5 PROJECT: 
mwoffliner regards -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] CloudVPS maintenance on Wednesday 2019-10-16 (round 2 of cloudvirt reboots)
On 10/16/19 12:42 PM, Zoran Dori wrote: > Hi, > you said 4 servers but also you said cloudvirt1028, cloudvirt1029 and > cloudvirt1030. Where is fourth? > That's a typo. Sorry for that. We are rebooting *3* cloudvirts. Good catch! :-P regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] CloudVPS maintenance on Wednesday 2019-10-23 (round 3 of cloudvirt reboots)
Follow-up: We just discovered cloudvirt1014 doesn't require reboot, so this operation is only for cloudvirt1025 and cloudvirt1026. regards. On 10/16/19 1:10 PM, Arturo Borrero Gonzalez wrote: > Hello! > > Next Wednesday 2019-10-23 at 09:00 UTC we will be doing another maintenance > operation on some of our cloudvirts servers (the hypervisor servers) that > involves rebooting both the physical servers and the virtual machines running > on > them. > The reasons is that we ned to update the running linux kernel version they > have. > > In this window we will reboot 2 hypervisors: > * cloudvirt1025 > * cloudvirt1026 > > The procedure will be to reboot a server, wait for it to come back online > (could > take up to 5 minutes) and wait for all the VMs to come back online. Then move > to > the next server. > > Toolforge users may see their tools and webservices briefly disrupted due to > several components of the Toolforge infrastructure being rebooted in this > operation. > > If nothing changes (reallocated or new virtual machine, etc) this is the list > of > affected VM instances in each hypervisor: > > * cloudvirt1025: > > VM: integration-agent-docker-1006 PROJECT: integration > VM: striker-deploy04 PROJECT: striker > VM: rec-wiki-2 PROJECT: recommendation-api > VM: deployment-ms-fe03 PROJECT: deployment-prep > VM: deployment-poolcounter05 PROJECT: deployment-prep > VM: deployment-ms-be05 PROJECT: deployment-prep > VM: readers-web-stephen PROJECT: reading-web-staging > VM: traffic-upload-stretch PROJECT: traffic > VM: traffic-recdns-anycast PROJECT: traffic > VM: deployment-maps05 PROJECT: deployment-prep > VM: gerrit-sizzle PROJECT: security-tools > VM: tools-sgewebgrid-generic-0901 PROJECT: tools > VM: shinken-puppetmaster-01 PROJECT: shinken > VM: osmit-due PROJECT: osmit > VM: deployment-acme-chief03 PROJECT: deployment-prep > VM: meza-cindy PROJECT: pluggableauth > VM: accounts-db4 PROJECT: account-creation-assistance > VM: krenair-clientpackages-py3-jessie 
PROJECT: testlabs > VM: deployment-sessionstore01 PROJECT: deployment-prep > VM: paws-worker-04 PROJECT: paws > VM: paws-ext-lb-02 PROJECT: paws > VM: paws-int-lb-01 PROJECT: paws > VM: paws-master-03 PROJECT: paws > VM: paws-master-01 PROJECT: paws > VM: language-readership PROJECT: language > VM: wmde-wikidiff2-patched-stretch PROJECT: wikidiff2-wmde-dev > VM: tools-sgebastion-08 PROJECT: tools > VM: compiler1002 PROJECT: puppet-diffs > VM: phragile-db PROJECT: phragile > VM: cloud-puppetmaster-01 PROJECT: cloudinfra > VM: chicotest-cappy01 PROJECT: chicotestproject > VM: visualeditor-prototype2 PROJECT: visualeditor > VM: programs-and-events-dashboard PROJECT: globaleducation > VM: osmit-uno PROJECT: osmit > VM: tools-sgewebgrid-lighttpd-0904 PROJECT: tools > VM: canary1025-01 PROJECT: testlabs > VM: mathosphere PROJECT: math > VM: social-tools3 PROJECT: social-tools > VM: togetherjs PROJECT: visualeditor > VM: language-mleb-legacy PROJECT: language > VM: women-in-red PROJECT: globaleducation > VM: ntp-01 PROJECT: cloudinfra > VM: mc-clusterA-1 PROJECT: test-twemproxy > VM: wikifarm PROJECT: pluggableauth > VM: login-test PROJECT: catgraph > VM: puppenmeister PROJECT: planet > > * cloudvirt1026: > > VM: integration-agent-docker-1016 PROJECT: integration > VM: wikidata-new-wbterm PROJECT: wikidata-dev > VM: incubator-test PROJECT: incubator > VM: cloudinfra-internal-puppetmaster01 PROJECT: cloudinfra > VM: cloudinfra-db01 PROJECT: cloudinfra > VM: tools-checker-03 PROJECT: tools > VM: tools-static-13 PROJECT: tools > VM: wp1 PROJECT: mwoffliner > VM: pk8s PROJECT: planet > VM: arturo-k8s-test-4-1 PROJECT: openstack > VM: banner PROJECT: wikidumpparse > VM: packager01 PROJECT: packaging > VM: tools-package-builder-02 PROJECT: tools > VM: canary1026-02 PROJECT: testlabs > VM: security-checker1 PROJECT: packagist-mirror > VM: logstack02 PROJECT: security-tools > VM: logstack01 PROJECT: security-tools > VM: mediawiki2latex-large PROJECT: collection-alt-renderer > VM: 
tools-sge-services-03 PROJECT: tools > VM: tools-sgewebgrid-lighttpd-0928 PROJECT: tools > VM: tools-sgewebgrid-lighttpd-0921 PROJECT: tools > VM: tools-sgewebgrid-generic-0903 PROJECT: tools > VM: tools-sgeexec-0938 PROJECT: tools > VM: tools-sgeexec-0936 PROJECT: tools > VM: tools-sgeexec-0935 PROJECT: tools > VM: tools-sgeexec-0919 PROJECT: tools > VM: tools-sgeexec-0917 PROJECT: tools > VM: tools-sgeexec-0916 PROJECT: tools > VM: tools-sgeexec-0915 PROJECT: tools > VM: tools-sgeexec-0914 PROJECT: tools > VM: tools-paws-worker-1010 PROJECT: tools > VM: tools-paws-worker-1019 PROJECT: tools > VM: openstack-pup
[Cloud] [Toolforge] Proxy maintenance operation next Monday 2019-10-28 @ 14:30 UTC
Hi there!

Next Monday 2019-10-28 @ 14:30 UTC we will do a maintenance operation on Toolforge which consists of rebuilding the main front proxy [0] used to serve webservices. We expect this to be done within a 30-minute window.

The operation consists of replacing the old virtual machines supporting the proxy (currently running Debian Jessie) with more modern instances running Debian Buster. Both Grid/Kubernetes backends are affected by this change. We don't expect a lot of service downtime, but there is a key point in the operation, migrating the data stored in Redis, which can be tricky. The o

Examples of things affected by this change:

* Browsing Toolforge webservices
* Browsing to https://tools.wmflabs.org/
* Browsing to https://tools.wmflabs.org/admin/ (Toolforge landing page)
* Browsing PAWS (to some extent, since it shares part of the Toolforge proxy)

Examples of things not affected by this change:

* webservices backend operations
* SSH bastions
* grid queues, grid jobs
* wiki-replicas, toolsdb
* other CloudVPS projects

regards.

[0] https://phabricator.wikimedia.org/T235627

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
Re: [Cloud] CloudVPS maintenance on Wednesday 2019-10-23 (round 3 of cloudvirt reboots)
On 10/16/19 3:34 PM, Arturo Borrero Gonzalez wrote: > Follow-up: > > We just discovered cloudvirt1014 doesn't require reboot, so this operation is > only for cloudvirt1025 and cloudvirt1026. > Reminder: this is happening in a few minutes! regards -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Cloud-announce] Brief ToolsDB Outage - Thursday 10/24 @11am UTC
On 10/21/19 9:49 PM, Brooke Storm wrote: > With a redundant power supply upgrade going on this week in the datacenter > that > could affect the VM that Toolsdb runs on, we anticipate a brief outage > Thursday > 10/24 @11am UTC of the mysql service to protect data in case anything goes > wrong. This may require a restart of a tool to reconnect to the database. We > do > not anticipate any worse disruptions, but if there is any disruption beyond > what > is planned, a failover may be necessary, which will not include the > non-replicated tables mentioned > here > https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups_and_Replication > > > The maintenance requiring this notice and action is detailed > here https://phabricator.wikimedia.org/T227540. The VM resides on the > cloudvirt1019 hypervisor, which is why it is in scope. > > We sincerely apologize for the short notice. > Reminder, this is happening in a few minutes! -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Toolforge] Proxy maintenance operation next Monday 2019-10-28 @ 14:30 UTC
On 10/21/19 7:56 PM, Martin Urbanec wrote: > Is there something you missed to say? > > "operation which is migrating data stored in Redis which can be tricky. The o" That's a typo/leftover from me rewording that sentence. Sorry for that :-) -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Toolforge] Proxy maintenance operation next Monday 2019-10-28 @ 14:30 UTC
On 10/21/19 12:16 PM, Arturo Borrero Gonzalez wrote: > Hi there! > > Next Monday 2019-10-28 @ 14:30 UTC we will do a maintenance operation on > Toolforge which consists in rebuilding the main front proxy [0] used to serve > webservices. We expect this to be done within a 30 minutes window. > > The operation consists on replacing the old virtual machines supporting the > proxy (currently running Debian Jessie) with more modern instances running > Debian Buster. Both Grid/Kubernetes backends are affected by this change. We > don't expect a lot of service downtime, but there is a key point in the > operation which is migrating data stored in Redis which can be tricky. The o > > Examples of things affected by this change: > > * Browsing Toolforge webservices > * Browsing to https://tools.wmflabs.org/ > * Browsing to https://tools.wmflabs.org/admin/ (Toolforge landing page) > * Browsing PAWS (to some extent, since it shares part of the toolforge proxy) > > Example of things not affected by this change: > > * webservices backend operations > * SSH bastions > * grid queues, grid jobs > * wiki-replicas, toolsdb > * other CloudVPS projects > > regards. > > [0] https://phabricator.wikimedia.org/T235627 > Reminder, this is happening now. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Cloud-announce] MOSTLY COMPLETE cloud-vps maintenance Thursday, 2019-12-12
On 12/12/19 11:50 AM, Andrew Bogott wrote:
> We are still chasing down stray issues (in particular, some of the dump and
> scratch mounts on toolforge are now wrong) but for almost all use cases things
> should be back to normal.
>
> -Andrew
>

We consider the operations finished. Everything has been done.

NFS, being one of the weakest components of our infra, suffered during today's operation. We were forced to reboot most of the Toolforge servers, so grid jobs and webservices in both the web grid and Kubernetes have most likely been restarted and may show error log entries corresponding to the window of this operation.

Other CloudVPS projects that use NFS (dumps shares, maps, etc.) might also require some checking. Please get in touch if you are a project admin of such a project.

regards.

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
Re: [Cloud] [Cloud-announce] cloud-vps maintenance Tuesday, 2020-01-14
On 1/7/20 6:12 AM, Andrew Bogott wrote: > We'll be upgrading the cloud services OpenStack install next Tuesday, > beginning > at 12:00 noon UTC > > The entire upgrade process may take an hour or two. Early on in the process, > Horizon (and associated OpenStack APIs) will be disabled (probably for 20 to > 30 > minutes.) There may also be brief network interruptions during the upgrade. > > Toolforge and existing VMs should be largely unaffected apart from possible > network hiccups. > Reminder, this will be happening in about 30 minutes! regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] Webservice down
On 1/22/20 11:32 AM, David Richfield wrote:
> Hi all,
>
> The parliament diagram tool (
> https://tools.wmflabs.org/parliamentdiagram/parlitest.php ) is down.
> Last time it happened was a week ago: I just restarted the webservice
> like Alex did, but now it's down again and I'm at work, so I can't log
> in for the next six hours or so. Can someone restart it for me?
>

here you go!

tools.parliamentdiagram@tools-sgebastion-07:~$ webservice status
Your webservice is not running
tools.parliamentdiagram@tools-sgebastion-07:~$ webservice start
Starting webservice...
tools.parliamentdiagram@tools-sgebastion-07:~$ webservice status
Your webservice of type lighttpd is running

> Also, how can I find out why it keeps going down?
>

Try inspecting log files. For example:

tools.parliamentdiagram@tools-sgebastion-07:~$ wc -l error.log
283822 error.log

You have plenty of information there, including some "funny" things like:

Traceback (most recent call last):
  File "/data/project/parliamentdiagram/public_html//westminster.py", line 81, in
    sumdelegates['left'], sumdelegates['right'])/float(optionlist['wingrows']['left'])))
ZeroDivisionError: float division by zero

Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
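With an error.log of this size, a small script can help spot which exceptions dominate. The following is a minimal sketch, not an official Toolforge tool: the function name `count_exceptions` is made up for illustration, and it assumes the final line of each Python traceback appears in the log on its own line, in the usual `ExceptionName: message` form, without any web server prefix.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the final line of a Python traceback, e.g.
# "ZeroDivisionError: float division by zero".
# Assumption: the exception name ends in "Error" or "Exception".
EXC_LINE = re.compile(r"^(\w+(?:\.\w+)*(?:Error|Exception)): ")


def count_exceptions(lines: Iterable[str]) -> Counter:
    """Count how often each exception type appears in the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = EXC_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it, open the log (for instance with `open("error.log", errors="replace")`) and pass the file object to `count_exceptions`; calling `.most_common(5)` on the result lists the five most frequent exception types.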
Re: [Cloud] [Cloud-announce] [Toolforge] 2020 Kubernetes cluster automatic migration phase beginning
On 2/21/20 5:14 PM, Arthur Smith wrote: > One question - I seem to be getting some more timeout-related 500 server > errors. > Was there a change in how that is handled somehow (i.e. reduced time limit for > response from the server)? I realize it's good practice to respond quickly, > just > some of the existing cases don't at the moment and I'm hitting them > occasionally. > There are at least 3 proxies involved in serving Toolforge webservice requests: 1) tool main front proxy (dynamicproxy) (http) 2) kubernetes front haproxy (tcp) 3) kubernetes nginx-ingress (http) and perhaps kube-proxy (tcp) More information here: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Networking_and_ingress This is to say, yes, serving your request as quickly as possible should help keep the different proxy connections alive and working smoothly. As of this email, we don't have any particular metrics or insights on proxy performance; this is something we could explore in the near future (for example, by creating a specific grafana dashboard). regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
Re: [Cloud] [Cloud-announce] [Toolforge] 2020 Kubernetes cluster automatic migration phase beginning
On 2/23/20 8:51 PM, Arthur Smith wrote: > Actually I am beginning to suspect the 500 server errors are caused by an > out-of-memory condition. Do the new kubernetes containers have lower memory > usage limits than the old ones? > Yes, you are right: https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration#Lower_default_resource_limits_for_webservice hope that helps. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] Changes to CloudVPS web proxy (XFF) on 2020-04-15
Hi there! If you use a CloudVPS web proxy, this email is for you. Toolforge developers/users can ignore this email. We are introducing a change to eliminate the 'X-Forwarded-For' HTTP header that the CloudVPS web proxy adds when forwarding the HTTP request to your instance. This header contains the original IP address of the internet client that sent the request. This is private information that we would like to reduce in our environment [0]. You use the web proxy if you have a public web endpoint hosted in CloudVPS under the wmflabs.org domain. These are generally configured using Horizon in the DNS > Web Proxies section. Examples of web proxy names: * accounts.wmflabs.org * glampipe.wmflabs.org * incubator.wmflabs.org Full list can be seen in the Openstack Browser tool [1]. We are ready to introduce this change [2], but wanted to give some heads up for projects that do require this information for whatever reason. We would like to hear from you in the next couple of weeks. Please contact us in the phabricator task [0] and include some rationale why you need the XFF header. This is the timeline this change will follow: * 2020-04-01: this email, start collecting list of things that require XFF * 2020-04-07: start evaluating list of things that require XFF * 2020-04-15: introduce the change, with proper case whitelisting When the change is introduced, in two weeks from now, proxy backends that were not whitelisted will stop receiving the XFF header. Please reach out for any questions or comments. regards. [0] https://phabricator.wikimedia.org/T135046 [1] https://openstack-browser.toolforge.org/project/project-proxy [2] https://gerrit.wikimedia.org/r/c/operations/puppet/+/583098 -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
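For backends that currently read the XFF header, one way to prepare is to make sure the code degrades gracefully when the header is absent. The sketch below is only an illustration: client_address is a hypothetical helper, the headers argument is assumed to be a mapping with lower-cased header names, and trusting the header at all only makes sense when every request arrives through a proxy you trust.

```python
def client_address(headers, remote_addr):
    """Return the best-known client address for a request.

    headers: mapping of lower-cased header names to values (assumption).
    remote_addr: address of the directly connected peer, which will be
    the web proxy once X-Forwarded-For is no longer forwarded.
    """
    forwarded = headers.get("x-forwarded-for", "")
    # The header may carry a comma-separated chain; the first entry is
    # the address reported for the original client.
    first_hop = forwarded.split(",")[0].strip()
    return first_hop or remote_addr
```

After the change, a backend that is not whitelisted would always take the fallback branch, so anything keyed on this value (logs, per-client limits) would see the proxy address instead.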
[Cloud] CloudVPS network change (routing source IP) on 2020-04-13
Hi there! In a few days from now (2020-04-13), the CloudVPS network will see a change happening that will likely go unnoticed, but it is important enough to share it with you beforehand. We will be changing the IPv4 address that we use as the main source NAT for egress connections (initiated in the VM instances). This change won't affect VM instances using floating IPs. Old IP address: 185.15.56.1 New IP address: 208.80.155.92 If you know of anywhere (a firewall, ACL or any other mechanism) that had this address hardcoded, you will need to update it. See this wikitech page for more details: https://wikitech.wikimedia.org/wiki/News/CloudVPS_NAT_change Please reach out if you have any doubts, questions, or any other issue. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] CloudVPS network change (routing source IP) on 2020-04-13
On 4/6/20 8:00 PM, Arturo Borrero Gonzalez wrote: > Hi there! > > In a few days from now (2020-04-13), the CloudVPS network will see a change > happening that will likely go unnoticed, but it is important enough to share > it > with you beforehand. > > We will be changing the IPv4 address that we use as the main source NAT for > egress connections (initiated in the VM instances). This change won't affect > VM > instances using floating IPs. > > Old IP address: 185.15.56.1 > New IP address: 208.80.155.92 > > If you know of anywhere (a firewall, ACL or any other mechanism) that had this > address hardcoded, you will need to update it. > > See this wikitech page for more details: > > https://wikitech.wikimedia.org/wiki/News/CloudVPS_NAT_change > Finally, it has been decided this change will not happen. You can safely ignore the information that was initially shared. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] Toolforge: new domain toolforge.org
Hi! We are happy to announce the new domain 'toolforge.org' is now ready to be adopted by our Toolforge community. There is a lot of information related to this change in a wikitech page we have for this: https://wikitech.wikimedia.org/wiki/News/Toolforge.org The most important change you will see happening is a new domain/scheme for Toolforge-hosted webservices: * from https://tools.wmflabs.org/<toolname>/ * to https://<toolname>.toolforge.org/ A live example of this change can be found in our internal openstack-browser webservice tool: * legacy URL: https://tools.wmflabs.org/openstack-browser/ * new URL: https://openstack-browser.toolforge.org This domain change is something we have been working on for months prior to this announcement. Part of our work has been to ensure we have a smooth transition from the old domain (and URL scheme) to the new canonical one. However, we acknowledge the ride might be bumpy for some folks, due to technical challenges or cases we didn't consider when planning this migration. Please reach out immediately if you find any limitation or failure anywhere related to this change. The wikitech page also contains a section with information on common problems. You can check now if your webservice needs any specific change by creating a temporary redirect to the new canonical URL: $ webservice --canonical --backend=kubernetes start [..] $ webservice --canonical --backend=gridengine start [..] The --canonical switch will create a temporary redirect that you can turn on/off. Please use this to check how your webservice behaves with the new domain/URL scheme. If you start the webservice without --canonical, the temporary redirect will be removed. We aim to introduce permanent redirects for the legacy URLs on 2020-06-15. We expect to keep serving legacy URLs forever, by means of redirects to the new URLs. More information on the redirects can also be found in the wikitech page. The toolforge.org domain is finally here!
<3 -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
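Tools that store or generate links to other tools may want to rewrite them to the new scheme. The following is a minimal sketch based only on the openstack-browser example in the announcement (tools.wmflabs.org/openstack-browser/ becoming openstack-browser.toolforge.org); the function name to_canonical is hypothetical and the real redirects may handle cases this sketch does not.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_HOST = "tools.wmflabs.org"
NEW_DOMAIN = "toolforge.org"


def to_canonical(url):
    """Rewrite a legacy tools.wmflabs.org URL to the per-tool domain.

    URLs on any other host, or without a tool name in the path, are
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != LEGACY_HOST:
        return url
    # The first path segment is the tool name; the rest is the tool's own path.
    tool, _, rest = parts.path.lstrip("/").partition("/")
    if not tool:
        return url
    return urlunsplit(
        ("https", f"{tool}.{NEW_DOMAIN}", "/" + rest, parts.query, parts.fragment)
    )
```

The query string and fragment are carried over as-is, so links that pass parameters to a tool keep working in this sketch.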
Re: [Cloud] Toolforge: new domain toolforge.org
On 4/13/20 4:18 PM, Maarten Dammers wrote: > We sure like to rename and move around. I hope Toolforge.org lasts a lot > longer! > Hi Maarten, I think I understand your concern. Sometimes, naming things is hard :-) However, let me point out that toolserver and toolforge, while similar in spirit and scope, are different services, with different technologies involved, and more things to offer to the users (developers). The new domain, for me, means the service is evolving even more, in the good sense. Also, please note the change is not only a cosmetic one. It involves a more secure approach to host each tool webservice, from an all-shared domain to a domain per tool. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Cloud-announce] Changes to CloudVPS web proxy (XFF) on 2020-04-15
On 4/14/20 6:25 PM, Jason Sherman wrote: > Hi there, > > I was wondering if you were planning on exposing some kind of rate-limiting > option for the web proxies in horizon? I'm thinking this will effectively mean > no more rate-limiting per remote address at the instance level. Every once in > a > while, our project gets hammered by script kiddies and our application service > gets brought down. I've gone ahead and implemented rate limiting in nginx that > has a very high limit set across all ip addresses that should basically work, > but typically I would set the limits to be per-client-ip to the extent allowed > by the practicalities of NAT. This is not a blocker in any way for us, and I'd > rather make do with less user info wherever possible. > Hi there! What you did seems correct to me, that is, implementing the controls on your own servers. That being said, I understand your concern. We have mechanisms in place for banning specific abusers. If we detected more widespread problems, we could introduce other mechanisms and controls to ensure service availability. Should you detect someone is hammering your servers in CloudVPS, please contact us. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
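A single limit shared across all clients, as described in the quoted message, can also be enforced inside the application itself. The following token bucket is a sketch of that idea and not the nginx configuration mentioned above; the class name and parameters are hypothetical, and the clock argument exists only so the behaviour can be checked without waiting.

```python
import time


class TokenBucket:
    """Global rate limiter: allows bursts up to `capacity`, refills at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True and consume one token if a request may proceed."""
        now = self.clock()
        # Refill according to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket is not keyed by client address, it needs no information about the original client, at the cost of one abusive client being able to exhaust the budget for everyone.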
Re: [Cloud] Changes to CloudVPS web proxy (XFF) on 2020-04-15
On 4/1/20 2:16 PM, Arturo Borrero Gonzalez wrote: > Hi there! > > If you use a CloudVPS web proxy, this email is for you. Toolforge > developers/users can ignore this email. > > We are introducing a change to eliminate the 'X-Forwarded-For' HTTP header > that > the CloudVPS web proxy adds when forwarding the HTTP request to your instance. > This header contains the original IP address of the internet client that sent > the request. This is private information that we would like to reduce in our > environment [0]. > > You use the web proxy if you have a public web endpoint hosted in CloudVPS > under > the wmflabs.org domain. These are generally configured using Horizon in the > DNS >> Web Proxies section. > > Examples of web proxy names: > * accounts.wmflabs.org > * glampipe.wmflabs.org > * incubator.wmflabs.org > > Full list can be seen in the Openstack Browser tool [1]. > > We are ready to introduce this change [2], but wanted to give some heads up > for > projects that do require this information for whatever reason. We would like > to > hear from you in the next couple of weeks. Please contact us in the > phabricator > task [0] and include some rationale why you need the XFF header. > > This is the timeline this change will follow: > > * 2020-04-01: this email, start collecting list of things that require XFF > * 2020-04-07: start evaluating list of things that require XFF > * 2020-04-15: introduce the change, with proper case whitelisting > > When the change is introduced, in two weeks from now, proxy backends that were > not whitelisted will stop receiving the XFF header. > > Please reach out for any questions or comments. > > regards. > > [0] https://phabricator.wikimedia.org/T135046 > [1] https://openstack-browser.toolforge.org/project/project-proxy > [2] https://gerrit.wikimedia.org/r/c/operations/puppet/+/583098 > Hi there! This change is being applied now! regards. 
-- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
Re: [Cloud] Toolforge: new domain toolforge.org
Hi there! This is a reminder about the ongoing migration to the new domain and URL/path scheme for webservices running in Toolforge. The soft deadline for this migration period is 2020-05-31. To try the change you only need to run the webservice command with the --canonical argument. Please review the documentation here: https://wikitech.wikimedia.org/wiki/News/Toolforge.org Adopting this change early is worthwhile for many reasons, especially the more secure environment provided by proper domain isolation. Also, the new domain better reflects the identity of the Toolforge service :-) Moreover, during the compatibility period, we would like to collect feedback and bug reports from our users before the soft deadline. As of today, we have 40 tool webservices that are running in the new domain and using the new path scheme. Find them here: https://replag.toolforge.org https://testwikis.toolforge.org https://meetingtimes.toolforge.org https://speedpatrolling.toolforge.org https://wiki-tennis.toolforge.org https://xslack.toolforge.org https://urbanecm-test-1.toolforge.org https://wdmm.toolforge.org https://quickcategories.toolforge.org https://wikiportretdev.toolforge.org https://zppixbot-test.toolforge.org https://anticompositetools.toolforge.org https://james.toolforge.org https://bd808-ruby.toolforge.org https://ytcleaner.toolforge.org https://wd-shex-infer.toolforge.org https://github-pr-closer.toolforge.org https://wikistream.toolforge.org https://secwatch.toolforge.org https://giftbot.toolforge.org https://pagepile-visual-filter.toolforge.org https://zppixbot.toolforge.org https://stashbot.toolforge.org https://moedata.toolforge.org https://sal.toolforge.org https://covid-obit.toolforge.org https://docker-registry.toolforge.org https://wb2rdf.toolforge.org https://massmailer.toolforge.org https://wd-image-positions.toolforge.org https://versions.toolforge.org https://lexeme-forms.toolforge.org https://ukbot.toolforge.org https://machtsinn.toolforge.org 
https://bikeshed.toolforge.org https://gmt.toolforge.org https://ipcheck.toolforge.org https://signatures.toolforge.org https://templatedata-filler.toolforge.org https://wordcount.toolforge.org Please reach out for any comments, doubts or questions. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] Toolforge grid now using tesseract-ocr 4.1.1
Hi there! We just deployed tesseract-ocr v4.1.1 in the Toolforge grid. The context of this update is the phabricator task T247422 [0]. Please report any issue you may find. regards! [0] https://phabricator.wikimedia.org/T247422 -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] Toolforge: new domain toolforge.org
Hi there! The soft deadline for migrating to the toolforge.org domain was two days ago, on 2020-05-31. For adopting the change in a controlled way you only need to run the webservice command with the --canonical argument. Please review the documentation here: https://wikitech.wikimedia.org/wiki/News/Toolforge.org Next event is the hard deadline. In about 2 weeks (on 2020-06-15) we will introduce forced redirection from the legacy URL to the new one. Please contact your fellow tool developers if you think they aren't aware of this migration. And please reach out to us if you have doubts or need help. I checked today, and at least about 110 webservices are already running on the new domain and URL scheme: https://alphatest.toolforge.org/ https://anticompositetools.toolforge.org/ https://author-disambiguator.toolforge.org/ https://base-encode.toolforge.org/ https://bd808-ruby.toolforge.org/ https://bd808-test2.toolforge.org/ https://bd808-test.toolforge.org/ https://bikeshed.toolforge.org/ https://bookreader.toolforge.org/ https://cdnjs-beta.toolforge.org/ https://cdnjs.toolforge.org/ https://chie-bot.toolforge.org/ https://copypatrol.toolforge.org/ https://covid-obit.toolforge.org/ https://dna.toolforge.org/ https://docker-registry.toolforge.org/ https://event-streams.toolforge.org/ https://fastilybot-reports.toolforge.org/ https://fist.toolforge.org/ https://flickr2commons.toolforge.org/ https://flickrdash.toolforge.org/ https://fontcdn.toolforge.org/ https://fountain-test.toolforge.org/ https://fountain.toolforge.org/ https://ftools.toolforge.org/ https://giftbot.toolforge.org/ https://github-pr-closer.toolforge.org/ https://global-search-test.toolforge.org/ https://global-search.toolforge.org/ https://globalsearch.toolforge.org/ https://gmt.toolforge.org/ https://grantmetrics.toolforge.org/ https://hgztools.toolforge.org/ https://ia-upload.toolforge.org/ https://indic-wscontest.toolforge.org/ https://indic-wsstats.toolforge.org/ 
https://interaction-timeline.toolforge.org/ https://intersect-contribs.toolforge.org/ https://ipcheck.toolforge.org/ https://ip-range-calc.toolforge.org/ https://itwikinews-rss.toolforge.org/ https://james.toolforge.org/ https://k8s-status.toolforge.org/ https://langviews.toolforge.org/ https://ldap.toolforge.org/ https://lexeme-forms.toolforge.org/ https://machtsinn.toolforge.org/ https://majavah-bot.toolforge.org/ https://massmailer.toolforge.org/ https://meetingtimes.toolforge.org/ https://mix-n-match.toolforge.org/ https://moedata.toolforge.org/ https://morfeusz.toolforge.org/ https://musikanimal.toolforge.org/ https://mwph-api.toolforge.org/ https://mwversion.toolforge.org/ https://mysql-php-session-test.toolforge.org/ https://pagepile-visual-filter.toolforge.org/ https://pageviews-test.toolforge.org/ https://pageviews.toolforge.org/ https://pathoschild-contrib.toolforge.org/ https://phabsearchemail.toolforge.org/ https://phabulous.toolforge.org/ https://plagiabot.toolforge.org/ https://plnode.toolforge.org/ https://qrcode-generator.toolforge.org/ https://quickcategories.toolforge.org/ https://replacer.toolforge.org/ https://replag.toolforge.org/ https://sal.toolforge.org/ https://searchsbl.toolforge.org/ https://section-links.toolforge.org/ https://secwatch.toolforge.org/ https://signatures.toolforge.org/ https://siteviews.toolforge.org/ https://speedpatrolling.toolforge.org/ https://sql-optimizer.toolforge.org/ https://stashbot.toolforge.org/ https://superzerocool.toolforge.org/ https://svgtranslate-test.toolforge.org/ https://svgtranslate.toolforge.org/ https://taxoboxalyzer.toolforge.org/ https://templatedata-filler.toolforge.org/ https://testwikis.toolforge.org/ https://text2hash.toolforge.org/ https://tool-db-usage.toolforge.org/ https://toolviews.toolforge.org/ https://ukbot.toolforge.org/ https://urbanecm-test-1.toolforge.org/ https://url-converter.toolforge.org/ https://versions.toolforge.org/ https://wb2rdf.toolforge.org/ 
https://wd-image-positions.toolforge.org/ https://wdmm.toolforge.org/ https://wd-shex-infer.toolforge.org/ https://wikicontrib.toolforge.org/ https://wikidata-externalid-url.toolforge.org/ https://wikifile-transfer.toolforge.org/ https://wikiportretdev.toolforge.org/ https://wikisource-bot.toolforge.org/ https://wikistream.toolforge.org/ https://wiki-tennis.toolforge.org/ https://wiki-topic.toolforge.org/ https://wordcount.toolforge.org/ https://wsexport.toolforge.org/ https://xn--dk8hv9g.toolforge.org/ https://xslack.toolforge.org/ https://xtools.toolforge.org/ https://ytcleaner.toolforge.org/ https://zppixbot.toolforge.org/ -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] Toolforge: new domain toolforge.org
Hi there! The hard deadline for migrating to the toolforge.org domain was 3 days ago, on 2020-06-15. We are aware of some folks from the community still working on finishing up this migration, and we will give an additional 2 weeks before introducing the legacy-redirector that will force-redirect all the legacy URLs to the new domain and URL scheme. If you need additional context about this migration, please read: https://wikitech.wikimedia.org/wiki/News/Toolforge.org We are tracking webservices with missing OAuth grants for the new domain in this phabricator task: https://phabricator.wikimedia.org/T254857 If your tool is unchecked, it means it requires additional work to make sure OAuth will work with the new domain. Please contact your fellow tool developers if you think they aren't aware of this migration. And please, reach out to us if you have doubts or need help. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
Re: [Cloud] Screen sessions
On 2020-06-23 16:47, Isaac Johnson wrote: > I'm interested in running some long-ish scripts that loop through the dump > replicas on Toolforge. Eventually, this sort of thing might move to crontab, > but > for now it would be nice to run a screen session as we test / debug the > scripts. > The problem is that if I run the scripts from my tool account (i.e. after > "become "), I get the following error: Cannot open your terminal > '/dev/pts/28' - please check. > You shouldn't run such scripts on the bastions. The grid is the way to go in this case: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid Run your script with jsub and it will be scheduled on a grid worker node, where it will run until it finishes. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
[Cloud] [Cloud-announce] Toolforge email server now enforcing ratelimiting
Hi, we just enabled email ratelimiting in our MTA server [0] in Toolforge. Please, report any problem or issue you may find related to this. The current limit is 100 messages per hour per sender address. We may tune the value as we observe the behavior of the system and the users. regards. [0] https://en.wikipedia.org/wiki/Message_transfer_agent -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services announce mailing list cloud-annou...@lists.wikimedia.org (formerly labs-annou...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
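A tool that sends mail in batches may want to stay under the announced limit on its own side rather than find out from the MTA. The sketch below is a client-side illustration only and says nothing about how the MTA implements its limit; the class name HourlySendLimiter is hypothetical, and the defaults simply mirror the announced value of 100 messages per hour per sender address.

```python
import time
from collections import defaultdict, deque


class HourlySendLimiter:
    """Sliding-window limiter: at most `limit` sends per `window` seconds per sender."""

    def __init__(self, limit=100, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # One queue of send timestamps per sender address.
        self.sent = defaultdict(deque)

    def try_send(self, sender):
        """Record a send for `sender` and return True, or return False if over the limit."""
        now = self.clock()
        stamps = self.sent[sender]
        # Forget sends that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True
```

A caller would check try_send(sender) before handing each message to the mail server and delay or queue the message when it returns False. Since the limit may be tuned later, the value is best kept configurable.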
Re: [Cloud] Toolforge: new domain toolforge.org
Hi there! Tomorrow 2020-07-06 at about 10:00 UTC we will enable the legacy redirector and this migration will be completed. All requests to tools.wmflabs.org/<toolname> will be permanently redirected to <toolname>.toolforge.org. If you need additional context about this, please read: https://wikitech.wikimedia.org/wiki/News/Toolforge.org Please reach out if you need help or have doubts. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
Re: [Cloud] Toolforge: new domain toolforge.org
On 2020-07-06 17:59, Arturo Borrero Gonzalez wrote: > Hi there! > > Tomorrow 2020-07-06 at about 10:00 UTC we will enable the legacy redirector > and > this migration will be completed. > > All requests to tools.wmflabs.org/ will be permanently redirected to > .toolforge.org. > > If you need additional context about this, please read: > > https://wikitech.wikimedia.org/wiki/News/Toolforge.org > > Please reach out if you need help or have doubts. > This has been done! regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] unscheduled keystone maintenance
Hi there, we need to perform some unscheduled keystone maintenance right now. Authentication to some cloud services, in particular Horizon, might be interrupted during this maintenance period. We expect this maintenance to last no more than 1h. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
[Cloud] General CloudVPS network maintenance on 2020-10-29
Hi! There will be a general CloudVPS network maintenance on 2020-10-29, from 16:00 UTC to 17:00 UTC. During the operation window, all cloud services might be intermittently down or inaccessible. This operation affects all CloudVPS projects, including Toolforge, PAWS and Quarry. Services running in the cloud might fail to contact external entities, and connections to ToolsDB, NFS, wiki-replicas or LDAP might be affected as well. In the best case scenario, the changes (and downtime) will be barely noticed. The maintenance consists of introducing new hardware equipment into the CloudVPS edge network. You can find additional details in Phabricator [0]. regards. [0] https://phabricator.wikimedia.org/T265288 -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
Re: [Cloud] General CloudVPS network maintenance on 2020-10-29
On 2020-10-22 17:41, Arturo Borrero Gonzalez wrote: > Hi! > > There will be a general CloudVPS network maintenance on 20202-10-29, from > 16:00 > UTC to 17:00 UTC. > > During the operation window, all cloud services might be intermittently down, > inaccessible. > > This operation affects all CloudVPS projects, including Toolforge, PAWS and > Quarry. Services running in the cloud might fail to contact external entities, > and connections to ToolsDB, NFS, wiki-replicas or LDAP might be affected as > well. > > In the best case scenario, the changes (and downtime) will be barely noticed. > The maintenance consist on introducing new hardware equipment in to the > CloudVPS > edge network. You can find additional details in Phabricator [0]. > > regards. > > [0] https://phabricator.wikimedia.org/T265288 > Reminder, this is happening now. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] General CloudVPS network maintenance on 2020-10-29
On 2020-10-29 16:59, Arturo Borrero Gonzalez wrote: > On 2020-10-22 17:41, Arturo Borrero Gonzalez wrote: >> Hi! >> >> There will be a general CloudVPS network maintenance on 20202-10-29, from >> 16:00 >> UTC to 17:00 UTC. >> >> During the operation window, all cloud services might be intermittently down, >> inaccessible. >> >> This operation affects all CloudVPS projects, including Toolforge, PAWS and >> Quarry. Services running in the cloud might fail to contact external >> entities, >> and connections to ToolsDB, NFS, wiki-replicas or LDAP might be affected as >> well. >> >> In the best case scenario, the changes (and downtime) will be barely noticed. >> The maintenance consist on introducing new hardware equipment in to the >> CloudVPS >> edge network. You can find additional details in Phabricator [0]. >> >> regards. >> >> [0] https://phabricator.wikimedia.org/T265288 >> > > Reminder, this is happening now. > The operation is now completed. There was a brief interruption of the service, but should be recovered now. Let us know if you see anything weird matching the timing or somehow related to this operation window. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] General CloudVPS network maintenance on 2020-10-29
On 2020-10-30 00:04, Maarten Dammers wrote: > Hi Arturo, > > On 29-10-2020 18:30, Arturo Borrero Gonzalez wrote: >> Let us know if you see anything weird matching the timing or somehow related >> to >> this operation window. > > This was announced as network maintenance, but the tools-sgebastion-08 > rebooted > at Thu Oct 29 17:15. Is this related or did the server happen to crash in the > same window? > Hi there, due to the network maintenance, we had NFS issues on some VMs, particularly affecting our Grid service. So indeed we rebooted a couple of servers (including tools-sgebastion-08) to get to a stable situation and make sure everything was working as expected. thanks for double checking. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] General CloudVPS network maintenance on 2020-11-09 (today)
Hi! There will be a general CloudVPS network maintenance on 2020-11-09 @ 12:30 UTC. The operation window will last for 1h. During the operation, all cloud services will be inaccessible or intermittently down. This operation affects all CloudVPS projects, including Toolforge, PAWS and Quarry. Services running in the cloud might fail to contact external entities, and connections to ToolsDB, NFS, wiki-replicas or LDAP will be affected as well. The operation we are doing today is a followup to what we did two weeks ago [0], and involves changing the IP addressing of the network that connects the CloudVPS network to the internet. Sorry for the short notice, we couldn't avoid scheduling this for today. regards. [0] https://phabricator.wikimedia.org/T265288 -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
[Cloud] PAWS kubernetes upgrade
Hi there, we are about to upgrade the kubernetes version that runs PAWS, from 1.6 to 1.17. We don't expect any major interruptions of the service, perhaps only some hiccups when pods are restarted/rescheduled. More information is available in this phabricator ticket: https://phabricator.wikimedia.org/T268669 The operation may take between 30 minutes and 1 hour, and we are starting soon after I finish sending this email. Please ping us if you see anything wrong. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation
[Cloud] Toolforge kubernetes maintenance today 2020-12-10 @ 15:30 UTC
Hi there! Today 2020-12-10 @ 15:30 UTC we will perform an upgrade of the Toolforge kubernetes cluster [0]. We don't expect any major disruption of the service, but we detected in past upgrades that some components might be restarted, causing brief interruptions of network flows. Given the amount of worker nodes we have, more than 50, the operation will take us at least a couple of hours. Tools maintainers: you don't have to do anything during this operation, but if you detect anything weird please contact us either in the phabricator task [0], in the IRC channel #wikimedia-cloud or in the cloud@lists.wikimedia.org [1] mailing list. regards. [0] https://phabricator.wikimedia.org/T263284 [1] https://lists.wikimedia.org/mailman/listinfo/cloud -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] Toolforge kubernetes maintenance today 2020-12-10 @ 15:30 UTC
On 12/10/20 1:33 PM, Arturo Borrero Gonzalez wrote: Hi there! Today 2020-12-10 @ 15:30 UTC we will perform an upgrade of the Toolforge kubernetes cluster [0]. We don't expect any major disruption of the service, but we detected in past upgrades that some components might be restarted, causing brief interruptions of network flows. Given the amount of worker nodes we have, more than 50, the operation will take us at least a couple of hours. Tools maintainers: you don't have to do anything during this operation, but if you detect anything weird please contact us either in the phabricator task [0], in the IRC channel #wikimedia-cloud or in the cloud@lists.wikimedia.org [1] mailing list. regards. [0] https://phabricator.wikimedia.org/T263284 [1] https://lists.wikimedia.org/mailman/listinfo/cloud This is starting right now! -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] Change to how Cloud VPS and Toolforge contact Wikis
Hello, we are planning to change how Cloud VPS instances and Toolforge tools contact WMF-hosted wikis, in particular the source IP address for the network connection. The new IP address that wikis will see is 185.15.56.1. The change is scheduled to go live on 2021-02-08. More detailed information in wikitech: https://wikitech.wikimedia.org/wiki/News/CloudVPS_NAT_wikis If you are a Cloud VPS user or Toolforge developer, check your tools after that date to make sure they are properly running. If you detect a block, a rate-limit or similar, please let us know. If you are a WMF SRE or engineer involved with the wikis, be informed that this address could generate a significant traffic volume, perhaps about 30%-40% total wiki edits. We are trying to smooth the change as much as possible, so please send your feedback if you think there is something we didn't account for yet. Thanks, best regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Ops] Change to how Cloud VPS and Toolforge contact Wikis
On 1/28/21 9:50 PM, Martin Urbanec wrote: Hi Arturo, a quick question: MediaWiki has a strict limit on bad logins. If all of WMCS will be NATed, that would mean that /any/ bot having too many bad login attempts could block all other bots from logging in. Is that prevented through technical measures, somehow? Hi, do you know where this limit configuration can be found? thanks for the heads up. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Ops] Change to how Cloud VPS and Toolforge contact Wikis
On 1/29/21 10:29 AM, Amir Sarabadani wrote: This is sorta (under-)documented in https://www.mediawiki.org/wiki/Manual:$wgRateLimits I made a patch for it but I'm not sure if I did it correctly. Excellent, thanks! Could you please share a link to gerrit so I can have that patch on my radar? regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
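The concern raised in this thread (one misbehaving bot locking out every tool behind the shared NAT address) can be illustrated with a toy per-IP limiter. This is only a sketch of the general idea, not MediaWiki's actual implementation: the `FailedLoginLimiter` class, the threshold of 5 failures and the 300-second window are all made-up assumptions; only the address 185.15.56.1 comes from the announcement.

```python
from collections import defaultdict


class FailedLoginLimiter:
    """Toy limiter (hypothetical): blocks a source IP after too many bad logins."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # source IP -> timestamps (in seconds) of recent failed logins
        self._failures = defaultdict(list)

    def record_failure(self, source_ip, now):
        self._failures[source_ip].append(now)

    def is_blocked(self, source_ip, now):
        # Keep only the failures that are still inside the time window.
        recent = [t for t in self._failures[source_ip]
                  if now - t < self.window_seconds]
        self._failures[source_ip] = recent
        return len(recent) >= self.max_failures


NAT_ADDRESS = "185.15.56.1"  # address from the announcement

# Threshold and window are invented numbers, for illustration only.
limiter = FailedLoginLimiter(max_failures=5, window_seconds=300)

# One misconfigured bot fails to log in five times in a row.
for second in range(5):
    limiter.record_failure(NAT_ADDRESS, now=second)

# Every other tool behind the same NAT address now shares the block...
well_behaved_bot_blocked = limiter.is_blocked(NAT_ADDRESS, now=10)
# ...while a client coming from a different address is unaffected.
other_address_blocked = limiter.is_blocked("192.0.2.10", now=10)
```

Because the limiter is keyed only on the source address, it cannot tell the failing bot apart from its neighbours, which is why a limit of this kind would need an exemption or a higher threshold for the NAT address.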
Re: [Cloud] Change to how Cloud VPS and Toolforge contact Wikis
On 1/25/21 11:55 AM, Arturo Borrero Gonzalez wrote: Hello, we are planning to change how Cloud VPS instances and Toolforge tools contact WMF-hosted wikis, in particular the source IP address for the network connection. The new IP address that wikis will see is 185.15.56.1. The change is scheduled to go live on 2021-02-08. More detailed information in wikitech: https://wikitech.wikimedia.org/wiki/News/CloudVPS_NAT_wikis Hi there, based on the feedback we have collected so far, we decided to extend the timeline. This change won't go live on 2021-02-08 but at a later date instead. We will use this extended timeline to review a few unexpected config changes that we need to introduce prior to this operation. The exact new date is still to be decided, and we will share it once it is known. Thanks to everyone for providing valuable feedback. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] Network change for VMs contacting NFS dumps
Hello, today 2021-02-23 in about 30 minutes (16:00 UTC) we will change how virtual machine instances running in Cloud VPS contact NFS dump servers [0]. There is no action required on your side. We anticipate little to no impact as a result of the network changes. But in case you notice that something is not working properly with dumps NFS in Cloud VPS (or Toolforge), please contact us [1] as soon as possible. The relevant phabricator ticket [2] is T272397. regards. [0] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Dumps [1] https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction#Communication_and_support [2] https://phabricator.wikimedia.org/T272397 -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] [Cloud-announce] cloud-vps maintenance at 14:00 UTC
On 4/27/21 5:34 PM, Roy Smith wrote: I'm getting timeouts and 502's on both https://spi-tools.toolforge.org/ and https://spi-tools-dev.toolforge.org/. Also: ssh: connect to host dev.tools.wmflabs.org port 22: Network is unreachable On a related note, I suggest you switch to using 'dev.toolforge.org' and 'login.toolforge.org' for your SSH connections. Stuff in the old tools.wmflabs.org domain may stop working at some point in the future as we deprecate that domain. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] CloudVPS / Toolforge edge network maintenance 2021-05-06 @ 15:00 UTC
Hello there, We will be doing an upgrade to the CloudVPS edge network on Thursday 2021-05-06 @ 15:00 UTC that will likely impact user experience, including Toolforge. We scheduled a 1h operation window. During that time, intermittent network interruption, packet loss and other network problems are to be expected. The edge network maintenance will affect how virtual machines (and Toolforge tools) contact NFS, wiki-replicas, wikis API endpoints, and, in general, any network traffic that flows leaving or entering the cloud (also known as north-south traffic). More information on the operation can be found in phabricator [0] and in wikitech [1]. Regards. [0] https://phabricator.wikimedia.org/T270704 [1] https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/2020_Network_refresh -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] Expired certificates for wmflabs.org and wmcloud.org
On 5/6/21 11:20 AM, Sebastian Berlin wrote: I'm getting errors regarding expired certificates for wmflabs.org and wmcloud.org, e.g. https://wikispeech.wmflabs.org and https://codesearch.wmcloud.org. Is this related to the maintenance later today or has something gone wrong? Here's an example from curl: Good catch! The certificates expired because acme-chief failed to renew them. Apparently it is a known bug. I just force-restarted acme-chief and everything worked. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
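For tool maintainers who want an early warning of this kind of expiry on their own hostnames, a minimal sketch follows. It is not how acme-chief or the WMCS monitoring works; the helper names and the 14-day threshold are assumptions made for illustration. The only library call used is `ssl.cert_time_to_seconds` from the Python standard library, which parses the `notAfter` string format returned by `SSLSocket.getpeercert()`.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now_epoch):
    """Return the days left before a certificate expires (negative if expired).

    `not_after` uses the format of getpeercert()["notAfter"],
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return (expiry_epoch - now_epoch) / SECONDS_PER_DAY


def needs_alert(not_after, now_epoch, warn_days=14):
    """Hypothetical policy: alert when fewer than `warn_days` days remain."""
    return days_until_expiry(not_after, now_epoch) < warn_days
```

In practice `now_epoch` would be `time.time()` and `not_after` would be read from a live connection; both are passed in explicitly here so the logic can be checked without network access.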
Re: [Cloud] CloudVPS / Toolforge edge network maintenance 2021-05-06 @ 15:00 UTC
On 5/3/21 11:27 AM, Arturo Borrero Gonzalez wrote: Hello there, We will be doing an upgrade to the CloudVPS edge network Thursday 2021-05-06 @ 15:00 UTC that will likely impact user experience, including Toolforge. We scheduled an 1h operation window. During that time, intermittent network interruption, packet loss and other network problems are to be expected. The edge network maintenance will affect how virtual machines (and Toolforge tools) contact NFS, wiki-replicas, wikis API endpoints, and, in general, any network traffic that flows leaving or entering the cloud (also known as north-south traffic). More information on the operation can be found in phabricator [0] and in wikitech [1]. Regards. [0]https://phabricator.wikimedia.org/T270704 [1] https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/2020_Network_refresh Reminder, this is happening now! See you on the other side :-) -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] CloudVPS / Toolforge edge network maintenance 2021-05-06 @ 15:00 UTC
On 5/6/21 5:00 PM, Arturo Borrero Gonzalez wrote: Reminder, this is happening now! See you on the other side :-) Hello from the other side. This is now done. Sorry for the bumpy ride in Toolforge bastions. regards -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Re: [Cloud] Expired certificates for wmflabs.org and wmcloud.org
On 5/7/21 7:40 AM, Sascha Brawer wrote: Curious, does the Wikimedia cloud have some kind of monitoring system that could have noticed and sent an alert? Yeah, we have monitoring. We could always do better with monitoring in general, of course. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
[Cloud] [Cloud-announce] wiki replicas maintenance on 2021-07-22
Hi there, on Thursday July 22nd at 15:00 UTC (08:00 PDT / 11:00 EDT / 17:00 CEST) there is a planned network maintenance that will affect the availability of the wiki replica database service. The expected operation window is about 5 minutes long and it will affect all wiki replicas users, including Toolforge tools, PAWS, and any other Cloud VPS project using them. More information can be found on phabricator: https://phabricator.wikimedia.org/T286614 regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/ ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] [Cloud-announce] 2021-11-02: Cloud VPS network outage
Hi, Today 2021-11-02 we had a severe network outage on Cloud VPS and Toolforge. Several network connections were affected from 11:40 UTC to 13:20 UTC (1h40m duration). As of this writing the problem has been corrected. Detailed information can be seen in Phabricator: https://phabricator.wikimedia.org/T294853 Sorry for the inconvenience. regards. -- Arturo Borrero Gonzalez SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/ ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] Re: preferred proxy
On 1/26/22 17:30, Taavi Väänänen wrote: On 1/26/22 18:26, Tim Moody wrote: Which is the preferred domain for dns proxies, wmcloud.org or wmflabs.org? Hi, for new services wmcloud.org is preferred. Hi, thanks Taavi for the clarification. It is true, wmcloud.org is the current domain and wmflabs.org is considered 'legacy' and in the [slow] process of being removed. Hey @Tim, can you point to documentation or some information that needs updates and could be a source of confusion in the future? thanks, regards. -- Arturo Borrero Gonzalez Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] Re: [Cloud-announce] [IMPORTANT] Announcing Toolforge Debian Stretch Grid Engine deprecation
On 2/15/22 21:46, Maarten Dammers wrote: Hi, Why are we upgrading to Buster instead of Bullseye? According to https://wikitech.wikimedia.org/wiki/Operating_system_upgrade_policy Buster will be end of life around August this year. So we're either stuck with an older version for a while or we have to do this whole exercise again much sooner than we would like. Can you explain? Hi there, Legit question. I'm happy to elaborate: * this was all discussed back in September 2021 in phabricator, see https://phabricator.wikimedia.org/T277653#7378774 and https://phabricator.wikimedia.org/T277653#7381146. Our conclusion was not to skip Buster. * we are hoping that there won't be a Buster->Bullseye migration for the grid. Hopefully by the time we need to remove Buster the Kubernetes backend will be a 100% suitable solution for every tool. * this migration work started before Debian Bullseye was released, with our intention being to complete it before the release. For a couple of reasons the project was delayed. * in the grid case, the engineering effort to do an N+1 upgrade is lower than doing an N+2 upgrade. If we had tried an N+2 upgrade directly, things would have been much slower and more difficult for us. Your concern about doing the migration dance twice is 100% valid, and the only way to future-proof your tool is to remove the dependency on GridEngine and migrate it to the Kubernetes backend. regards. -- Arturo Borrero Gonzalez Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] Re: [Cloud-announce] [IMPORTANT] Announcing Toolforge Debian Stretch Grid Engine deprecation
On 2/16/22 17:34, Russell Blau wrote: Also, it is not possible to load Pywikibot in the tf-python39 runtime because a required module (requests, from https://python-requests.org) is not available. What is the process for requesting (no pun intended) that this (or any other resource) be added to the image? See some documentation here: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Kubernetes_python_jobs I just created it, and may need some polishing, but it should work! We will review pywikibot specific workflows and documents soon. regards. -- Arturo Borrero Gonzalez Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] Re: Kubernetes-based jobs engine -- how to use Python virtual environments?
On 4/2/22 14:53, Martin Urbanec wrote: Hello, I just received a dozen emails about grid engine migration. I tried to migrate my personal tool (tool.martin-urbanec) first. This tool currently generates a Jupyter-notebook based report daily. I do that by calling jupyter nbconvert --to html --execute community_configuration_usage.ipynb from a virtual environment where Jupyter is installed, together with a couple of other Python modules. I managed to create a new virtual environment from the new Buster bastion, and it works when executed directly from the bastion, but I can't get it to execute via the k8s-based engine: Your problem may be related to bootstrapping the venv. See if this information can help you: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Kubernetes_python_jobs This is very similar to what JMC89 replied in the other email. -- Arturo Borrero Gonzalez Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] [Toolforge] some updates on toolforge-jobs command line interface
Hi there, wanted to share a few small updates for the toolforge-jobs command line interface. The changes are being deployed right now. 1) listing jobs now shows fewer columns, use --long to show all columns. 2) the `containers` action has been renamed to `images`. A compatibility period will exist, and you will see a warning if you use `containers`. 3) when listing images, the table header no longer mentions "Docker". These changes should be mostly cosmetic, and no functional or behavioral change is expected. Please report any problems you may find. regards. -- Arturo Borrero Gonzalez Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] Network operations today 2022-04-06
Hi there, Today 2022-04-06 we're performing some network maintenance operations on Cloud VPS that could affect all cloud egress/ingress traffic, including Toolforge. The cuts, if noticeable, should last a few minutes at most. Some operations were also conducted yesterday (without this email notice), and some unexpected hiccups occurred. That's why the email today. regards. -- Arturo Borrero Gonzalez Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] [Cloud-announce] Network maintenance
Hi there, We are currently working on replacing older hardware servers with newer ones, in particular those dedicated to cloud networking [0]. We have discovered a few shortcomings related mostly to network interface naming in the newer servers, the latest openstack version behaving differently from how it used to, and also some base operating system (debian) bugs [1]. Some of these are hardware-dependent and difficult to reproduce/anticipate in our staging environment. The result is that we are having a more challenging and noisy migration than we would like. We already had a few (brief) network outages trying to introduce the new servers into service. We'll try to keep things as stable as possible in the next few days until the migration is completed, but we can't rule out some more (brief) network outages until we are safely on the other side of the transition. I'll send another note when this network maintenance is over. regards. [0] https://phabricator.wikimedia.org/T316284 [1] https://bugs.debian.org/989162 -- Arturo Borrero Gonzalez Senior Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/ ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] [Cloud-announce] Re: Network maintenance
On 10/6/22 12:04, Arturo Borrero Gonzalez wrote: Hi there, We are currently working on replacing older hardware servers with newer ones, in particular those dedicated to cloud networking [0]. We have discovered a few shortcomings related mostly to network interface naming in the newer servers, and the latest openstack version behaving differently to what it used to be, and also some base operating system (debian) bugs [1]. Some of these are hardware-dependant and difficult to reproduce/anticipate in our staging environment. The result is that we are having a more challenging and noisy migration than we would like. We already had a few (brief) network outages trying to introduce the new servers into service. We'll try to keep things as stable as possible in the next few days until the migration is completed, but we can't discard having some more (brief) network outages until we are safely on the other side of the transition. I'll send another note when we finish this network maintenance is over. Hi there, this has been completed. Should you see any network problem starting now, consider it unexpected and I invite you to report it. regards. -- Arturo Borrero Gonzalez Senior Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/ ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] Some Cloud VPS virtual machines briefly unavailable today (and rebooted)
Hi there, Today 2022-11-22 at about 12:25 UTC, as part of a routine operation I reimaged/reformatted a cloudvirt hypervisor without relocating all the virtual machines first. The data survived the reimage, but the 32 (!) affected virtual machines were briefly unavailable and then hard-rebooted. All virtual machines are now ACTIVE (up and running) from the openstack point of view, but please, let me know if you need assistance recovering them in any way. As of this writing we don't have any automation to ensure we only reimage empty hypervisors, but we're working on it, to prevent this kind of human error in the future. regards. (and sorry!) (!) Affected virtual machines are: - ID: 78782628-4f9f-4263-84fc-06e767b3bfe1 Name: mx-wiki - ID: 1fa9f0d9-42e8-4273-bdb1-a7d49998c13f Name: synapse01 - ID: 2382fda0-e683-4d0c-95b6-bbbf323904d9 Name: canary1048-04 - ID: 4b570277-e51f-459d-bea2-394c5ad7bc92 Name: tools-sgeexec-10-16 - ID: 66529c1b-f3a3-4ff8-b30d-785f4f274965 Name: feature-store-test - ID: e153f69a-46a0-458a-ab50-de3d86aa861b Name: toolsbeta-test-k8s-worker-7 - ID: c3a2d1a9-f811-4da9-afba-3a113c8ff729 Name: wbregistry-02 - ID: 2b56c575-08a5-4def-87cb-bee5bd43e4f9 Name: prod - ID: 141ac13c-f0fa-46d3-9d2a-cede8bc854c6 Name: devtools-puppetdb1001 - ID: fdb15c24-0b41-42d6-9c4a-82afd1d9dcb9 Name: tools-sgeweblight-10-31 - ID: 56e55a31-8d32-455e-b650-b7194e71d2fd Name: runner-1023 - ID: cb4a87e4-264e-4c8f-8197-3efff54346de Name: runner-1022 - ID: 5b6b5733-565d-456e-a4fc-85ce669d3fd2 Name: deployment-mdb02 - ID: 75dce76d-36ad-4f9e-85e9-8a11ff6744db Name: wikibase-product-testing-2022 - ID: 868d3dca-3e5c-4089-89a9-2c7e756c3e31 Name: toolsbeta-cumin-1 - ID: 42ac6d8a-453a-4620-b4b7-9c97994c98fb Name: integration-agent-docker-1030 - ID: 084da652-503d-49a7-9ffa-98a0cd5335fd Name: toolsbeta-sgeexec-10-5 - ID: f098fe82-18b6-49a9-962d-9b8f1f989b14 Name: pcc-worker1001 - ID: 8eb272dc-8006-4e93-a966-5035809324d9 Name: deployment-mx03 - ID: e67d0e4c-e07c-4d9a-8ddb-cb0bc8efa388 Name: deployment-docker-api-gateway01 - ID: b958511a-10cb-4e62-bdbb-6da5013dd62f Name: soweego - ID: 62045cf9-59ed-44b9-a268-1c9f171b5aae Name: tools-package-builder-04 - ID: 0127e905-f52e-4ed4-b60d-260102a8e625 Name: pontoon-lb-02 - ID: 827bf744-262a-458b-951d-f2e9a377e075 Name: toolsbeta-test-k8s-ingress-3 - ID: 3e6c31d7-b4db-4a5f-a610-a74d0013f631 Name: pki-test01 - ID: 8893ba32-fb5c-4567-a242-b6c676978b7d Name: deployment-urldownloader03 - ID: f72e5b18-6376-4ccd-9e59-64447759e53f Name: deployment-deploy03 - ID: 006dea0a-a1eb-4de3-bf45-1a071ad87152 Name: kafka-test-cloud-2 - ID: e05220d7-8ca1-4d9f-a933-01a843286ea8 Name: toolsbeta-docker-imagebuilder-01 - ID: 416f445a-cad4-45c2-b32e-f17100f93eac Name: cloud-puppetmaster-05 - ID: 4e492051-25a3-4442-b8b9-1959f42917fe Name: tools-k8s-worker-76 - ID: df18863a-2da7-4951-aa69-936b3d889592 Name: deployment-docker-cpjobqueue01 -- Arturo Borrero Gonzalez Senior Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
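The missing safeguard mentioned in this email (only reimage hypervisors that host no virtual machines) could look roughly like the sketch below. It is purely illustrative and not the automation the team was working on: the inventory dictionary, the hypervisor names and every function name are hypothetical, and a real check would query OpenStack rather than a hard-coded mapping.

```python
class HypervisorNotEmptyError(RuntimeError):
    """Raised when a reimage is requested for a hypervisor that still hosts VMs."""


def vms_on_hypervisor(inventory, hypervisor):
    """Return the sorted names of the VMs that the inventory places on `hypervisor`."""
    return sorted(inventory.get(hypervisor, []))


def ensure_safe_to_reimage(inventory, hypervisor):
    """Refuse to continue unless the hypervisor is empty (hypothetical guard)."""
    remaining = vms_on_hypervisor(inventory, hypervisor)
    if remaining:
        raise HypervisorNotEmptyError(
            f"{hypervisor} still hosts {len(remaining)} VM(s): {', '.join(remaining)}"
        )


# Hypothetical inventory: hypervisor name -> VMs currently scheduled on it.
# The VM names are borrowed from the list above; the hypervisor names are made up.
sample_inventory = {
    "cloudvirt-example-a": ["mx-wiki", "synapse01"],
    "cloudvirt-example-b": [],
}
```

Calling `ensure_safe_to_reimage(sample_inventory, "cloudvirt-example-a")` raises, while the call for `"cloudvirt-example-b"` returns normally, so a reimage script could run the guard as its first step.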
[Cloud] Puppet error emails on 2022-11-28
Hi there! On 2022-11-28 and 2022-11-29 some misleading emails were sent: you may have received one (or more) emails about puppet failures on your Cloud VPS virtual machine. Moreover, such emails were a bit contradictory, with messages like "No failed resources", and "No exceptions happened". There was a problem in the way the puppet errors were calculated that has now been fixed [0]. This does not affect Toolforge. sorry for the noise, regards. [0] https://gerrit.wikimedia.org/r/c/operations/puppet/+/861805/ -- Arturo Borrero Gonzalez Senior Site Reliability Engineer Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] Toolforge jobs: brief maintenance today 2023-01-10 @ 11:30 UTC
Hi there, the Toolforge jobs service [0] (the one you would use via the `toolforge-jobs` command line interface) will have a brief maintenance today 2023-01-10 @ 11:30 UTC (in about 15 minutes). We need to restart the API service and it will be down for a couple of minutes (perhaps even less). During that time, using the toolforge-jobs command line interface will most likely fail. regards. [0] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework -- Arturo Borrero Gonzalez Senior SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] New toolforge-jobs features
Hi there, The Toolforge jobs framework just got upgraded with a few new features: * support for custom logs * support for job failure retry policy * new behavior with job image listing * some initial validation of YAML files The documentation should be mostly up-to-date in wikitech: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework You can stop reading here unless you want more details :-) The custom log files feature will allow you to do things like: * using a custom directory to store log files * merging stdout/stderr logs together into a single file * ignoring one of the two log streams The job retry policy lets you instruct the computing engine to restart jobs that failed, up to 5 times. Job images are now listed in a different format, and deprecated images are hidden by default, to encourage usage of newer ones. Regarding the YAML validation, the toolforge-jobs utility will now emit a warning if some key is unknown. We plan to make this more robust in the future, also providing a schema file. We don't usually announce upgrades, but this one in particular contained much awaited features. This is the result of hard work by several folks, in particular Taavi (community member) and Raymond (WMF contractor). Happy `toolforging`. Regards. -- Arturo Borrero Gonzalez Senior SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
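Two of the behaviours described in this email, restarting a failed job up to 5 times and warning about unknown YAML keys, can be sketched in a few lines. This is a conceptual illustration only and not the toolforge-jobs implementation: the function names, the `KNOWN_KEYS` set and the sample job definition are assumptions; only the upper bound of 5 retries comes from the announcement.

```python
MAX_RETRIES = 5  # upper bound mentioned in the announcement

# Hypothetical subset of accepted keys, for illustration only.
KNOWN_KEYS = {"name", "command", "image"}


def run_with_retries(job, retries):
    """Run `job` (a callable returning True on success), restarting it on failure.

    Returns (succeeded, attempts). One initial run plus at most `retries` restarts.
    """
    if not 0 <= retries <= MAX_RETRIES:
        raise ValueError(f"retries must be between 0 and {MAX_RETRIES}")
    attempts = 0
    while True:
        attempts += 1
        if job():
            return True, attempts
        if attempts > retries:
            return False, attempts


def unknown_keys(job_definition):
    """Return the keys of a parsed job definition that are not in KNOWN_KEYS."""
    return sorted(set(job_definition) - KNOWN_KEYS)


# A job that fails twice and then succeeds.
outcomes = iter([False, False, True])
flaky_result = run_with_retries(lambda: next(outcomes), retries=3)

# A job definition with a misspelled key, as it might look after YAML parsing.
warnings = unknown_keys({"name": "daily-report", "comand": "./report.sh", "image": "example"})
```

Here `flaky_result` is `(True, 3)` and `warnings` is `["comand"]`: the sketch reports the unknown key but does not reject the definition, mirroring the warn-only behaviour the email describes.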
[Cloud] Toolforge: brief network maintenance today 2023-03-06
Hi there! Today 2023-03-06, in a few minutes, we will restart the Toolforge internal network. A brief interruption of network communications is expected during the maintenance. This is because we're re-deploying calico to the kubernetes cluster [0]. No action required on your side. regards. [0] https://phabricator.wikimedia.org/T328539 -- Arturo Borrero Gonzalez Senior SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] [Cloud-announce] Re: Toolforge Kubernetes upgrade on 2023-04-03 (new date: 2023-04-10)
On 3/28/23 00:13, Taavi Väänänen wrote: Hi, We will be upgrading the Toolforge Kubernetes cluster next Monday (2023-04-03) starting at around 10:00 UTC. The expected impact is that tools running on the Kubernetes cluster will get restarted a couple of times over the course of the few hours it takes for us to upgrade the entire cluster. The ability to manage tools will remain operational. Since the version we're upgrading to (1.22) removes a bunch of deprecated Kubernetes APIs, tools that use kubectl and raw Kubernetes resources directly may want to check that they're on the latest available versions. The vast majority of tools that are only using the Jobs framework and/or the webservice command are not affected by these changes. This has been rescheduled to Monday 2023-04-10 to leave room for the other operations we have. regards. -- Arturo Borrero Gonzalez Senior SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/ ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
[Cloud] [Cloud-announce] Re: Toolforge Kubernetes upgrade on 2023-04-03 (new date: 2023-04-10)
On 3/30/23 12:42, Arturo Borrero Gonzalez wrote: On 3/28/23 00:13, Taavi Väänänen wrote: Hi, We will be upgrading the Toolforge Kubernetes cluster next Monday (2023-04-03) starting at around 10:00 UTC. The expected impact is that tools running on the Kubernetes cluster will get restarted a couple of times over the course of the few hours it takes for us to upgrade the entire cluster. The ability to manage tools will remain operational. Since the version we're upgrading to (1.22) removes a bunch of deprecated Kubernetes APIs, tools that use kubectl and raw Kubernetes resources directly may want to check that they're on the latest available versions. The vast majority of tools that are only using the Jobs framework and/or the webservice command are not affected by these changes. This has been rescheduled to Monday 2023-04-10 to leave room for the other operations we have. Hi there! This is happening now! https://phabricator.wikimedia.org/T286856 regards. -- Arturo Borrero Gonzalez Senior SRE / Wikimedia Cloud Services Wikimedia Foundation ___ Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/ ___ Cloud mailing list -- cloud@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/