Hi all, I was wondering if any of you can share your insights regarding federations. What unexpected caveats have you encountered?
We have about 15 "small" clusters here (for political and technical reasons), and most users have access to more than one of them. Federation seems like a good alternative to users running between clusters in search of available resources (we'll probably end up with 2-4 federations).

I would also like to have a single submission node, but users would then still need to select a cluster (we have an lmod module that selects a cluster by setting PATH and SLURM_CONF). The solution I've come up with is to create a dummy cluster with a lot of drained resources. But this seems like a not-so-good solution: it might confuse users with jobs that are always pending, and it will not work with array jobs.

Also, is there a way to make jobs go to the current cluster by default instead of to the federation (i.e. -M <cluster> by default)? I guess this can be done with a plugin (can it? or does the plugin run after the sibling submissions?), but I was wondering if there's already a solution.

Last question :) Are there any issues with plugins? We have different plugins on different clusters. If they change some of the job parameters, should I worry about the plugins from the origin cluster or from the sibling cluster? Will a job have plugins from several clusters applied to it?

Thanks in advance for any advice,
Yair