[Also posting to Bugzilla] According to the ops team, there are a number of separate and unrelated ops issues that have come up in the last few days:
1) Not all users are experiencing slowness, but a subset are. There's no definite smoking gun, but the most likely cause is an ongoing problem with one of our routers in Tampa. The router will have to be taken down for maintenance to fix this issue, and in order to perform this maintenance with minimal disruption, we need to have key ops engineers on standby to deal with any issues that may arise. My understanding is that the best available maintenance window is Tuesday next week.

2) There was a software deployment on May 18 that caused an application server overload; it was reverted the same day.

3) The mobile servers are currently intermittently overloaded and throwing internal server errors. Additional servers to provide more capacity were racked today.

4) In case you're looking at it, ganglia.wikimedia.org is not displaying correct server status information (as of yesterday); it's in the process of being fixed.

We're still in the process of setting up a new primary data center location in Ashburn, VA, which will give us higher site reliability in general and will also make safe failover possible during maintenance or emergency situations.

--
Erik Möller
Deputy Director, Wikimedia Foundation

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org