Hi Mark,

nothing can be logged because all disks are stalled. The only output I see is on the IPMI console:

    blocked for more than 120 seconds
    kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I found that there could be a problem with the write caching of the disks, so I have reduced the available dirty cache:

    vm.dirty_background_ratio = 5
    vm.dirty_ratio = 10

I hope this will solve the issue.

You're right, only a hard reboot can solve the problem: SSH logins and other commands can't be executed because the whole system is frozen. That is why I can't get any deeper information about what happened.

Thanks and cheers
Heiko

On 07.11.2015 00:19, Mark Kirkwood wrote:
> Do you reboot the machine? It might be interesting to just restart
> the swift storage services and see if that brings everything right
> again.
>
> Also check out the swift logs for what it is doing when things
> start to hang, and also what dmesg is saying at the time.
>
> A bit more info about your setup would be good - I'm guessing you
> have 12 swift object devices (on sata) and ? account and container
> ones (on ssd)?
>
> Approx how many containers and objects live on each server (or if
> easier tell us how many servers you have + replication level and
> how many accounts, containers and objects in total)?
>
> Regards
>
> Mark
>
> On 04/11/15 22:41, Heiko Krämer wrote:
>> Hi guys,
>>
>> we have noticed some problems with the disks on our Swift storage
>> nodes. After some time they block all I/O requests to the disks.
>> The server then suddenly stops working and needs a reboot.
>>
>> Server setup:
>> * Kernel 3.19.x
>> * Ubuntu 14.04
>> * Swift (Kilo)
>> * 12 x 2TB SATA (JBOD)
>> * 2 x 480GB SSD (Raid1)
>> * 32GB RAM
>> * 8 Cores CPU
>>
>> The first attempt was an upgrade of the raid controller firmware
>> and drivers. The second was a series of write and read tests on
>> each disk. I can't reproduce this issue, but I heard at the last
>> summit that Swift can be the cause of this issue.
>>
>> Has anyone solved this problem?
>>
>> - Heiko

-- 
Anynines.com

B.Sc. Computer Science
CIO
Heiko Krämer

Twitter: @anynines

----
Managing directors: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
Commercial register: AG Saarbrücken HRB 17413, VAT ID: DE262633168
Registered office: Saarbrücken
Avarteq GmbH

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
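
For reference, here is a minimal sketch of what the two dirty-cache ratios mentioned above mean in bytes on a 32GB machine. It is an approximation only: the kernel applies the ratios to the currently dirtyable memory, not to total RAM, so these figures are rough upper bounds. The function name `dirty_thresholds` is made up for illustration and is not part of any tool.

```python
GIB = 1024 ** 3


def dirty_thresholds(total_ram_bytes, background_ratio, dirty_ratio):
    """Return (background_bytes, hard_limit_bytes) for the given ratios.

    Assumption: the ratios are applied to total RAM. The kernel really
    uses the dirtyable memory, which is smaller, so treat the results
    as upper bounds.
    """
    for ratio in (background_ratio, dirty_ratio):
        if not 0 <= ratio <= 100:
            raise ValueError("ratios are percentages between 0 and 100")
    background = total_ram_bytes * background_ratio // 100
    hard_limit = total_ram_bytes * dirty_ratio // 100
    return background, hard_limit


# Values from the message: 32GB RAM, vm.dirty_background_ratio = 5,
# vm.dirty_ratio = 10.
background, hard_limit = dirty_thresholds(32 * GIB, 5, 10)
print(f"background writeback starts near {background / GIB:.1f} GiB")
print(f"writers are throttled near {hard_limit / GIB:.1f} GiB")
```

With these settings, background writeback would start at roughly 1.6 GiB of dirty pages and writing processes would be throttled at roughly 3.2 GiB, which bounds how much data is waiting to be flushed to the disks at any one time.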