Replication relies on rsync. Check that rsync is working correctly on all swift nodes. If you can, please send me your account-server.conf, container-server.conf, and proxy-server.conf. I had plenty of problems with replicators too, so I'll try to help you.
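For a quick first pass, something like this (just a sketch, nothing official; the node addresses are the ones from the ring output further down in this thread) asks each storage node's rsync daemon to list its modules, which at least confirms the daemon is up and reachable:

# Rough connectivity probe for the rsync daemons the object-replicator pushes to.
# "rsync <host>::" asks the daemon to list its modules; the names it prints
# should line up with what is configured in /etc/rsyncd.conf on that node.
# (Node addresses below are taken from the ring output in this thread.)
import subprocess

STORAGE_NODES = ['10.20.15.51', '10.20.15.52', '10.20.15.53', '10.20.15.54']

for host in STORAGE_NODES:
    try:
        modules = subprocess.check_output(['rsync', '%s::' % host])
        print('%s OK:\n%s' % (host, modules.strip()))
    except (subprocess.CalledProcessError, OSError) as err:
        print('%s FAILED: %s' % (host, err))

If a node refuses the connection or exports no object module, start there.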
Regards,
Piotr

P.S. Try out http://markdown-here.com/ when attaching .conf files. Just a suggestion. :)

2013/9/26 Mike Preston <mike.pres...@synety.com>

> I know it is poor form to reply to yourself, but I would appreciate it if
> anyone has any insight on this problem.
>
> Mike Preston
> Infrastructure Team | SYNETY
> www.synety.com
> direct: 0116 424 4016
> mobile: 07950 892038
> main: 0116 424 4000
>
> From: Mike Preston [mailto:mike.pres...@synety.com]
> Sent: 24 September 2013 09:52
> To: openstack@lists.openstack.org
> Subject: Re: [Openstack] Replication error
>
> root@storage-proxy-01:~/swift# swift-ring-builder object.builder validate
> root@storage-proxy-01:~/swift# echo $?
> 0
>
> I ran md5sum on the ring files on both the proxy (where we generate them)
> and the nodes and confirmed that they are identical.
>
> root@storage-proxy-01:~/swift# swift-ring-builder object.builder
> object.builder, build version 72
> 65536 partitions, 3 replicas, 4 zones, 32 devices, 999.99 balance
> The minimum number of hours before a partition can be reassigned is 3
> Devices:   id  zone  ip address   port  name   weight  partitions  balance  meta
>             0     1  10.20.15.51  6000  sdb1  3000.00        7123     1.44
>             1     1  10.20.15.51  6000  sdc1  3000.00        7123     1.44
>             2     1  10.20.15.51  6000  sdd1  3000.00        7122     1.43
>             3     1  10.20.15.51  6000  sde1  3000.00        7123     1.44
>             4     1  10.20.15.51  6000  sdf1  3000.00        7122     1.43
>             5     1  10.20.15.51  6000  sdg1  3000.00        7123     1.44
>             6     3  10.20.15.51  6000  sdh1     0.00        1273   999.99
>             7     3  10.20.15.51  6000  sdi1     0.00        1476   999.99
>             8     2  10.20.15.52  6000  sdb1  3000.00        7122     1.43
>             9     2  10.20.15.52  6000  sdc1  3000.00        7122     1.43
>            10     2  10.20.15.52  6000  sdd1  3000.00        7122     1.43
>            11     2  10.20.15.52  6000  sde1  3000.00        7122     1.43
>            12     2  10.20.15.52  6000  sdf1  3000.00        7122     1.43
>            13     2  10.20.15.52  6000  sdg1  3000.00        7122     1.43
>            14     3  10.20.15.52  6000  sdh1     0.00        1378   999.99
>            15     3  10.20.15.52  6000  sdi1     0.00         997   999.99
>            16     3  10.20.15.53  6000  sas0  3000.00        6130   -12.70
>            17     3  10.20.15.53  6000  sas1  3000.00        6130   -12.70
>            18     3  10.20.15.53  6000  sas2  3000.00        6129   -12.71
>            19     3  10.20.15.53  6000  sas3  3000.00        6130   -12.70
>            20     3  10.20.15.53  6000  sas4  3000.00        6130   -12.70
>            21     3  10.20.15.53  6000  sas5  3000.00        6130   -12.70
>            22     3  10.20.15.53  6000  sas6  3000.00        6129   -12.71
>            23     3  10.20.15.53  6000  sas7  3000.00        6129   -12.71
>            24     4  10.20.15.54  6000  sas0  3000.00        7122     1.43
>            25     4  10.20.15.54  6000  sas1  3000.00        7122     1.43
>            26     4  10.20.15.54  6000  sas2  3000.00        7123     1.44
>            27     4  10.20.15.54  6000  sas3  3000.00        7123     1.44
>            28     4  10.20.15.54  6000  sas4  3000.00        7122     1.43
>            29     4  10.20.15.54  6000  sas5  3000.00        7122     1.43
>            30     4  10.20.15.54  6000  sas6  3000.00        7123     1.44
>            31     4  10.20.15.54  6000  sas7  3000.00        7122     1.43
>
> (We are currently migrating data between boxes due to cluster hardware
> replacement, which is why zone 3 is weighted as it is on the first two
> nodes.)
>
> Filelist attached (for the objects/ directory on the devices)... but I see
> nothing out of place.
>
> I'll run a full fsck on the drives tonight to try to rule that out.
>
> Thanks for your help.
>
> Mike Preston
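The validate run and the matching md5sums look fine, but it may still be worth confirming that the ring file actually loaded on the failing node is internally consistent. get_part_nodes() evaluates self.devs[r[part]], so an IndexError can come either from a partition index past the end of the table or from a device id past the end of the device list. A rough sketch of the second check, run on storage-node-01 (it pokes at Ring internals and assumes the ring lives at /etc/swift/object.ring.gz):

# Sanity check of the ring loaded on the failing node.  get_part_nodes()
# does self.devs[r[part]], so it breaks if any entry in _replica2part2dev_id
# points past the end of self.devs.  This walks the whole table and reports
# any such entry.  (Uses Ring internals; a debug aid, not a supported API.)
from swift.common.ring import Ring

ring = Ring('/etc/swift/object.ring.gz')   # adjust if your rings live elsewhere

num_devs = len(ring.devs)
parts = len(ring._replica2part2dev_id[0])
bad = set()
for replica_row in ring._replica2part2dev_id:
    for dev_id in replica_row:
        if dev_id >= num_devs or ring.devs[dev_id] is None:
            bad.add(dev_id)

print('%d partitions, %d replicas, %d devices in ring'
      % (parts, len(ring._replica2part2dev_id), num_devs))
print('dangling device ids: %s' % (sorted(bad) or 'none'))

If that reports dangling device ids, regenerate and redistribute the ring; if it comes back clean, the problem is more likely the partition index itself, which points back at the objects/ directories.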
> Infrastructure Team | SYNETY
> www.synety.com
> direct: 0116 424 4016
> mobile: 07950 892038
> main: 0116 424 4000
>
> From: Clay Gerrard [mailto:clay.gerr...@gmail.com]
> Sent: 23 September 2013 20:34
> To: Mike Preston
> Cc: openstack@lists.openstack.org
> Subject: Re: [Openstack] Replication error
>
> Run `swift-ring-builder /etc/swift/object.builder validate` - it should
> have no errors and exit 0. Can you provide a paste of the output from
> `swift-ring-builder /etc/swift/object.builder` as well - it should list
> some general info about the ring (number of replicas, and the list of
> devices). Rebalance the ring and make sure it's been distributed to all
> nodes.
>
> The particular line you're seeing pop up in the traceback seems to be
> looking up all of the nodes for a partition it found in the objects/ dir.
> I'm not seeing any local sanitization [1] around those top-level directory
> names, so maybe it's just some garbage that got created there outside of
> swift, or some file system corruption?
>
> Can you provide the output from `ls /srv/node/objects` (or wherever you
> have your devices configured)?
>
> -Clay
>
> 1. https://bugs.launchpad.net/swift/+bug/1229372
>
> On Mon, Sep 23, 2013 at 2:34 AM, Mike Preston <mike.pres...@synety.com> wrote:
>
> Hi,
>
> We are seeing a replication error on swift. The error is only seen on a
> single node; the other nodes appear to be working fine.
>
> Installed version is Debian wheezy with swift 1.4.8-2+deb7u1.
>
> Sep 23 10:33:03 storage-node-01 object-replicator Starting object replication pass.
> Sep 23 10:33:03 storage-node-01 object-replicator Exception in top-level replication loop:
>   Traceback (most recent call last):
>     File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 564, in replicate
>       jobs = self.collect_jobs()
>     File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 536, in collect_jobs
>       self.object_ring.get_part_nodes(int(partition))
>     File "/usr/lib/python2.7/dist-packages/swift/common/ring/ring.py", line 103, in get_part_nodes
>       return [self.devs[r[part]] for r in self._replica2part2dev_id]
>   IndexError: array index out of range
> Sep 23 10:33:03 storage-node-01 object-replicator Nothing replicated for 0.728466033936 seconds.
> Sep 23 10:33:03 storage-node-01 object-replicator Object replication complete. (0.01 minutes)
>
> Can anyone shed any light on this, or on next steps in debugging or fixing it?
>
> Mike Preston
> Infrastructure Team | SYNETY
> www.synety.com
> direct: 0116 424 4016
> mobile: 07950 892038
> main: 0116 424 4000
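One more observation in support of Clay's theory: "array index out of range" is the message Python's array type raises, and in self.devs[r[part]] only r is an array, so it looks like the partition number itself (the directory name) is out of range rather than a device id. A rough way to hunt for the offending entry on the failing node (a sketch; it assumes the default /srv/node devices root and the 65536-partition ring shown above, so adjust both to match your object-server.conf):

# List top-level entries under each device's objects/ dir that are not
# plausible partition directories: either non-numeric names (which would make
# int(partition) blow up in collect_jobs()) or numeric names >= the ring's
# partition count (which is what get_part_nodes() chokes on here).
import os

DEVICES = '/srv/node'      # "devices" setting from object-server.conf
PART_COUNT = 65536         # partition count from the ring output above

for device in sorted(os.listdir(DEVICES)):
    objects_dir = os.path.join(DEVICES, device, 'objects')
    if not os.path.isdir(objects_dir):
        continue
    for entry in sorted(os.listdir(objects_dir)):
        if not entry.isdigit() or int(entry) >= PART_COUNT:
            print('suspect: %s' % os.path.join(objects_dir, entry))

Anything it prints (a non-numeric name, or a number of 65536 or more) is something worth investigating by hand before the fsck run.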
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack