...
>> I would really like someone that has these process pause problems to
>> test a patch I have posted to see if it rectifies the situation. Our
>> significant QE team at Red Hat doesn't see these problems and I can't
>> generate them in engineering. It is possible your device drivers are
>> taking spinlocks for extended periods or some other kernel problem is
>> occurring.
>>
>> If you feel up to the task of building your own corosync, try out this
>> patch:
>>
>> http://marc.info/?l=openais&m=130989380207300&w=2

I have not seen any corosync pauses since I applied it (right after it was
posted). Apart from two weeks of vacation, I have been testing the cluster
under really high CPU load the whole time (frankly speaking, I have lowered
it a lot thanks to optimizations) and have not caught a single pause (yet).

One more thing I did was update the igb driver and return its buffers to
the original 256 (bearing in mind that I originally hit the pause problem
after I increased those buffers to 4096). I do not know whether that has
any influence.

> I'd love to test this, but it'll take a few weeks.
> The machines are already productive and we don't have comparable test
> machines.
> I'm currently (actually ;) having a few days off, and when I'm back at the
> office,
> I'll update the Corosync version to v1.4.1 (because of the retransmit list
> problem) -- does the patch cleanly apply to v1.4.1?

Yes.

Best,
Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker