Hi folks,

This is Emre from GROU.PS -- we have been operating an OpenStack Swift cluster since 2011, and it's been great.
We have a standard installation with a single proxy server (proxy1) and three storage servers (storage1, storage2, storage3), each with 5x1TB disks. Following a chain of mistakes initiated by our hosting provider, which replaced the wrong disk on one of our OpenStack Swift storage servers, we ended up in the following situation:

root@proxy1:/etc/swift# swift-ring-builder container.builder
container.builder, build version 41
1048576 partitions, 3 replicas, 3 zones, 14 devices, 60.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:   id  zone  ip address    port  name    weight  partitions  balance  meta
            0     1  192.168.1.3   6001  c0d1p1   80.00      262144    37.50
            1     1  192.168.1.3   6001  c0d2p1   80.00      262144    37.50
            2     1  192.168.1.3   6001  c0d3p1   80.00      262144    37.50
            3     2  192.168.1.4   6001  c0d1p1  100.00      238312    -0.00
            4     2  192.168.1.4   6001  c0d2p1  100.00      238312    -0.00
            5     2  192.168.1.4   6001  c0d3p1  100.00      238312    -0.00
            6     3  192.168.1.5   6001  c0d1p1  100.00      209715   -12.00
            7     3  192.168.1.5   6001  c0d2p1  100.00      209715   -12.00
            8     3  192.168.1.5   6001  c0d3p1  100.00      209715   -12.00
           10     2  192.168.1.4   6001  c0d5p1  100.00      238312    -0.00
           11     3  192.168.1.5   6001  c0d5p1  100.00      209716   -12.00
           14     3  192.168.1.5   6001  c0d6p1  100.00      209715   -12.00
           15     1  192.168.1.3   6001  c0d5p1   80.00      262144    37.50
           16     2  192.168.1.4   6001  c0d6p1  100.00       95328   -60.00

root@proxy1:/etc/swift# ssh storage1 df -h
Filesystem          Size  Used  Avail  Use%  Mounted on
/dev/cciss/c0d0p5   1.8T   38G   1.7T    3%  /
none                3.9G  220K   3.9G    1%  /dev
none                4.0G     0   4.0G    0%  /dev/shm
none                4.0G   60K   4.0G    1%  /var/run
none                4.0G     0   4.0G    0%  /var/lock
none                4.0G     0   4.0G    0%  /lib/init/rw
/dev/cciss/c0d1p1   1.9T  1.9T   239M  100%  /srv/node/c0d1p1
/dev/cciss/c0d2p1   1.9T  1.9T   210M  100%  /srv/node/c0d2p1
/dev/cciss/c0d3p1   1.9T  1.9T   104K  100%  /srv/node/c0d3p1
/dev/cciss/c0d5p1   1.9T  1.2T   643G   66%  /srv/node/c0d5p1
/dev/cciss/c0d0p2    92M   51M    37M   59%  /boot
/dev/cciss/c0d0p3   1.9G   35M   1.8G    2%  /tmp

root@proxy1:/etc/swift# ssh storage2 df -h
Filesystem          Size  Used  Avail  Use%  Mounted on
/dev/cciss/c0d0p5   1.8T   33G   1.7T    2%  /
none                3.9G  220K   3.9G    1%  /dev
none                4.0G     0   4.0G    0%  /dev/shm
none                4.0G  108K   4.0G    1%  /var/run
none                4.0G     0   4.0G    0%  /var/lock
none                4.0G     0   4.0G    0%  /lib/init/rw
/dev/cciss/c0d0p3   1.9G   35M   1.8G    2%  /tmp
/dev/cciss/c0d0p2    92M   51M    37M   59%  /boot
/dev/cciss/c0d1p1   1.9T  1.5T   375G   80%  /srv/node/c0d1p1
/dev/cciss/c0d2p1   1.9T  1.5T   385G   80%  /srv/node/c0d2p1
/dev/cciss/c0d3p1   1.9T  1.5T   382G   80%  /srv/node/c0d3p1
/dev/cciss/c0d4p1   1.9T  1.5T   377G   80%  /srv/node/c0d5p1
/dev/cciss/c0d5p1   1.9T  519G   1.4T   28%  /srv/node/c0d6p1

root@proxy1:/etc/swift# ssh storage3 df -h
Filesystem          Size  Used  Avail  Use%  Mounted on
/dev/cciss/c0d0p5   1.8T   90G   1.7T    6%  /
none                3.9G  224K   3.9G    1%  /dev
none                4.0G     0   4.0G    0%  /dev/shm
none                4.0G  112K   4.0G    1%  /var/run
none                4.0G     0   4.0G    0%  /var/lock
none                4.0G     0   4.0G    0%  /lib/init/rw
/dev/cciss/c0d1p1   1.9T  1.1T   741G   61%  /srv/node/c0d1p1
/dev/cciss/c0d2p1   1.9T  1.1T   741G   61%  /srv/node/c0d2p1
/dev/cciss/c0d3p1   1.9T  1.1T   758G   60%  /srv/node/c0d3p1
/dev/cciss/c0d5p1   1.9T  1.1T   765G   59%  /srv/node/c0d5p1
/dev/cciss/c0d6p1   1.9T  1.1T   772G   59%  /srv/node/c0d6p1
/dev/cciss/c0d0p2    92M   51M    37M   59%  /boot
/dev/cciss/c0d0p3   1.9G   35M   1.8G    2%  /tmp

As you can see:

* Balances are messed up, and they never return to a normal state no matter how long we wait -- although behavior for the end user is still stable.
* We tried erasing the contents of a disk on storage1 (/dev/cciss/c0d5p1) that hit 100% before all the others (they were still at 95%); the wiped disk filled up again quickly, while the others caught up to 100%. We expected every disk to balance to the same level, since storage2 and storage3 (5 disks each) are set to weight 100, whereas storage1 (with only 4 disks) is set to weight 80.
* There was a failing disk in storage2, so we replaced it (/dev/cciss/c0d5p1), but it is not filling up as quickly; storage2 is at almost 80%.
* Storage3 is healthy.
* Storage1 is currently taken offline, because it has been failing constantly and its disk space doesn't balance.
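For what it's worth, the odd-looking balance column does at least follow arithmetically from the weights. A quick back-of-the-envelope check (plain Python, all numbers copied from the ring output above):

```python
# Sanity check: do the "balance" figures in the ring output follow from
# the device weights? (Plain arithmetic; numbers copied from above.)
partitions = 1048576
replicas = 3
total_weight = 4 * 80.0 + 5 * 100.0 + 5 * 100.0   # storage1 + storage2 + storage3
part_replicas = partitions * replicas              # partition-replicas to place

def balance(assigned, weight):
    """Percent deviation of a device from its weight-proportional share."""
    expected = part_replicas * weight / total_weight
    return 100.0 * (assigned / expected - 1.0)

print(round(balance(262144, 80.0), 2))   # storage1 devices: 37.5, as reported
print(round(balance(209715, 100.0), 2))  # storage3 devices: -12.0
print(round(balance(95328, 100.0), 2))   # replaced disk (id 16): -60.0
```

So the +37.50 on the weight-80 disks means they hold 37.5% more partitions than their weight entitles them to -- the ring just never converges toward the targets.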
What is the best course of action to take in this scenario? I believe we can either:

1) Completely dump storage1: remove zone 1 from the ring on the proxy, get a new server with a similar setup, and add it as a new zone accordingly.
2) Stop storage1 and erase the contents of its full disks. Then, on the proxy, remove the full disks from the cluster and add them back as new devices. (Again, a delay may apply.)
3) Or something completely different?

My fear is that, with both the first and second alternatives, if there is a delay between removing the zones or disks and adding the new ones, the other zones/disks will fill up. So I need to choose the alternative with the minimal amount of delay.

Last but not least, please note that this Swift installation is outdated -- it has never been updated since installation. (I am to blame!)

Thanks in advance for your suggestions and directions.

Cheers,
--
Emre
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack