On 12-07-03 04:26 PM, David Vossel wrote:

> This is not a definite. Perhaps you are experiencing this given the
> pacemaker version you are running
Yes, that is absolutely possible, and it certainly has been under consideration throughout this process. I did also recognize, however, that I am running the latest stable (1.1.6) release, and while I might be able to experiment with a development branch in the lab, I could not use it in production. So while it would be an interesting experiment, my primary goal had to be getting 1.1.6 to run stably.

> and the torture test you are running with all those parallel commands,

It is worth keeping in mind that all of those parallel commands are just as parallel with the 4 node cluster as they are with the 8 node (4 nodes actively modifying the CIB + 4 completely idle nodes) and 16 node clusters -- both of which failed. Just because I reduced the number of nodes doesn't mean that I reduced the parallelism any. The commands being run on each node are not serialized and are launched in parallel on the 4 node cluster just as they were on the 16 node cluster. So strictly speaking, it doesn't seem that parallelism in the CIB modifications is as much of a factor as simply the number of nodes in the cluster, even when some of the nodes (i.e. in the 8 node test I did) are entirely passive and not modifying the CIB at all.

> but I wouldn't go as far as to say pacemaker cannot scale to more than a
> handful of nodes. I'd totally welcome being shown the error of my ways.
> I'm sure you know this, I just wanted to be explicit about this so there is
> no confusion caused by people who may use your example as a concrete metric.

But of course. In my experiments, it was clear that the cib process could max out a single core on my 12 core Xeons with just 4 nodes in the cluster at times. Assuming CPU is the limiting factor here, it's easy to see how a faster CPU core, or a multithreaded cib, would allow for better scaling some time down the road. My point was simply that at the current time, and again assuming CPU really is the limiting factor (I don't know for sure what the limiting factor is), somewhere between 4 and 8 nodes is the limit with more or less default tunings.

> From the deployments I've seen on the mailing list and bug reports, the most
> common clusters appear to be around the 2-6 node mark.

Which seems consistent.

> The messaging involved with keeping the all the local resource operations in
> the CIB synced across that many nodes is pretty insane.

Indeed, and I most certainly had considered that. What really threw a curve in that train of thought for me, though, was that even idle, non-CIB-modifying nodes (i.e. turning a working 4 node cluster into a non-working 8 node cluster by adding 4 nodes that do nothing with the CIB) can tip a working configuration over into non-working. I could certainly see how the contention of 8 nodes all trying to jam stuff into the CIB might be taxing with all of the locking that needs to go on, etc., but for those 4 added idle nodes to add enough complexity to make a working 4 node cluster stop working is puzzling. Puzzling enough (granted, to somebody who knows zilch about the messaging that goes on with CIB operations) to make it smell more like a bug than simple contention.

> If you are set on using pacemaker,

Well, I am not necessarily married to it. It just seemed like the tool with the critical mass behind it. As sketchy as it might seem to ask (and I only ask since you seem to be hinting that there might be a better tool for the job): is there a tool more suited to the job?
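For reference, the parallel CIB load I keep referring to was along these lines. This is only a rough sketch, not my actual test scripts -- the attribute names and counts are made up -- and it assumes the stock Pacemaker CLI tools (crm_attribute, cibadmin) are available on each node:

    #!/usr/bin/env python3
    # Rough illustration only -- not the actual test harness; the
    # attribute names and counts are invented for this example.
    # Each cluster node would run its own copy of this, so nothing
    # is serialized across nodes.  Assumes the Pacemaker CLI tools
    # (crm_attribute, cibadmin) are installed and the stack is up.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    N_PARALLEL = 20   # concurrent workers per node (arbitrary)
    N_ROUNDS   = 50   # updates per worker (arbitrary)

    def hammer_cib(worker_id):
        """Repeatedly write a throwaway cluster attribute, forcing
        the change to be replicated into every node's CIB."""
        for i in range(N_ROUNDS):
            subprocess.run(
                ["crm_attribute", "--name", "torture-%d" % worker_id,
                 "--update", str(i)],
                check=True)
            # Mix in some reads as well, roughly what monitoring does.
            subprocess.run(["cibadmin", "--query"],
                           stdout=subprocess.DEVNULL, check=True)

    with ThreadPoolExecutor(max_workers=N_PARALLEL) as pool:
        # Consume the results so any failures actually surface.
        list(pool.map(hammer_cib, range(N_PARALLEL)))

Turning N_PARALLEL / N_ROUNDS up or down changes how hard the cib process gets hit from each node, but the number of nodes every update has to be synced to stays the same regardless -- which is the distinction I was getting at above.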
> the best approach for scaling for your situation would probably be to try and
> figure out how to break nodes into smaller clusters that are easier to manage.

Indeed, that is what I ended up doing. My 16 node cluster is now 4 separate 4 node clusters. The problem with that, though, is that when a node in a cluster fails, it has only 3 other nodes to spread its resources onto, and if 2 should fail, the remaining 2 nodes are trying to service twice their normal load. (By contrast, losing 2 nodes out of a 16 node cluster would leave 14 survivors each picking up only about a seventh of a node's worth of extra load.) The benefit of larger clusters is clear: giving pacemaker more nodes to distribute resources across evenly means the load on the remaining nodes is impacted minimally when one or more nodes of the cluster do fail.

> I have not heard of a single deployment as large as you are thinking of.

Heh. Not atypical of me to push the envelope, I'm afraid. :-/

Cheers, and many thanks for your input. It is valuable to this discussion.

b.