On 20 June 2012 10:40, Jake Carroll <[email protected]> wrote:
> Hi all.
>
> A probably not-so-uncommon question today, hopefully with a simple
> answer forthcoming.
>
> I've currently got a situation where one of the storage arrays I'm using to
> share "big" NFS to my compute nodes is under a significant amount of 10GbE
> I/O strain. The array can't handle the concurrency I'm currently throwing at
> it.
>
> To that end, I started contemplating forcing queues to somehow
> "transfer" the data working sets or resources requested of the storage
> to local /scratch "inter-node". Each node has some decently speedy 15K
> SAS spindles inside it. I thought it would be nice to see if we could
> reduce latency and contention on the 10GbE-connected array a little by
> doing this.
>
> We found this:
>
> https://www.nbcr.net/pub/wiki/index.php?title=Reduce_I/O_bottleneck_by_using_compute_node_local_scratch_disks
>
> But I am sure there is a lot more to it.
>
> I know of a configuration item I've seen called the "transfer" queue,
> but I've got a feeling it's got nothing to do with this and is instead
> a mechanism to programmatically forward jobs to other SGE queues and
> the like.
>
> Looking for some guidance on how we might programmatically make jobs
> transfer their working sets to node-local /scratch at "wire up" time,
> to improve efficiency (perhaps?).
>
> Thanks, all!
>
> --JC
>
I've read through all this and I'm still not sure what you are trying
to achieve. Do you want to:

- Have a data set used by a lot of jobs available locally on all nodes?
- Transfer copies of a dataset used by a parallel job to each node in
  the job without involving the central NFS filestore?
- Set up some sort of P2P transfer so a job could obtain the dataset it
  needs from a node (or nodes) where it already exists?
- Prevent people who could use local scratch (i.e. single-node jobs)
  from using NFS?

William

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
