Hello,

I have the following 3 single-socket nodes:

node1: 4GB RAM, 2-core: rank 0, rank 1
node2: 4GB RAM, 4-core: rank 2, rank 3, rank 4, rank 5
node3: 8GB RAM, 4-core: rank 6, rank 7, rank 8, rank 9

I have a model that takes an input and produces an output, and I want to run this model for N possible combinations of inputs. N is very big and I am limited by memory capacity. I am using collectives on the world communicator, and I want to know how to distribute N over the 10 ranks given the memory specs of each node. For now, I have simply been dividing N by the number of ranks and using scatter/gather on equal-sized chunks. How can I improve on this without hardcoding the node specs in my own C++ code?

Thanks,
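
To make the question concrete, here is a minimal sketch of the arithmetic behind a memory-weighted split, as opposed to the even split described above. It is only an illustration under stated assumptions: `weighted_counts` and `displacements` are hypothetical helper names, each rank's weight is assumed to be its node's RAM in MiB divided evenly among the ranks on that node, and N is assumed to be non-negative. How the weights are obtained at run time is left open; one possibility might be a node-local communicator from `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED`, combined with an OS query for physical memory, gathered to the root.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: split n work items across ranks in proportion to
// integer weights (assumed here: MiB of RAM available to each rank).
std::vector<std::int64_t> weighted_counts(std::int64_t n,
                                          const std::vector<std::int64_t>& weights) {
    std::vector<std::int64_t> counts(weights.size(), 0);
    const std::int64_t total =
        std::accumulate(weights.begin(), weights.end(), std::int64_t{0});
    if (weights.empty() || total <= 0) {
        return counts;  // nothing sensible to distribute over
    }

    // floor(n * w / total), computed in two parts so that n * w never has
    // to be formed directly (n may be very large).
    const std::int64_t q = n / total;
    const std::int64_t rem = n % total;
    std::int64_t assigned = 0;
    for (std::size_t r = 0; r < weights.size(); ++r) {
        counts[r] = q * weights[r] + (rem * weights[r]) / total;
        assigned += counts[r];
    }

    // Rounding down leaves fewer leftover items than there are ranks;
    // hand them out one at a time starting from rank 0.
    for (std::size_t r = 0; assigned < n; r = (r + 1) % counts.size()) {
        ++counts[r];
        ++assigned;
    }
    return counts;
}

// Hypothetical helper: starting offset of each rank's chunk, i.e. the
// running sum of the counts of all lower ranks.
std::vector<std::int64_t> displacements(const std::vector<std::int64_t>& counts) {
    std::vector<std::int64_t> displs(counts.size(), 0);
    for (std::size_t r = 1; r < counts.size(); ++r) {
        displs[r] = displs[r - 1] + counts[r - 1];
    }
    return displs;
}
```

With the three nodes above, the per-rank weights would be 2048, 2048, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048. For N = 1600, ranks 0-1 and 6-9 would each receive 200 items and ranks 2-5 would each receive 100. The resulting counts and offsets are the kind of arrays `MPI_Scatterv` and `MPI_Gatherv` take, provided the values fit in `int`.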