Re: [gridengine users] automatically use of scratch area on compute node

Reuti Tue, 29 Jan 2013 10:41:38 -0800

On 29.01.2013 at 12:50, Stefano Bridi wrote:

> On Tue, Jan 29, 2013 at 9:26 AM, William Hay <[email protected]> wrote:
>> 
>> On 25 January 2013 17:21, Stefano Bridi <[email protected]> wrote:
>>> 
>>> Hi all, is there a way to use the scratch area  (local disk) on the
>>> compute node in a transparent way from the submitted script point of
>>> view?
>>> What I want to do is to copy to and from the compute node scratch area
>>> the job data using the prolog/epilog but I need also to start the
>>> submitted script in the scratch area instead of the cwd.
>>> Is there a way?
>> 
>> Since you are mucking around with prolog and epilog I assume you have
>> administrative control of the cluster.
>> One solution would be to use a starter method to cd to $TMPDIR before
>> execing the real job.  starter_method
>> is a bit of a swiss army chainsaw though (a flexible but dangerous tool).
>> 
>> William
> 
> Yes, I'm the admin: the problem I want to solve in this way is to
> lower the load of the central file server by using local scratch area
> on the "master" node as a scratch area.
> What I mean is that if the job is running in SMP or serial, it is the
> local disk (/scratch) and, if the job is using multiple nodes (mpi),
> it will be the local disk "/scratch" of the first node exported via
> NFS and mounted on the fly via autofs "/net/n0000/scratch/" on the
> other nodes.
> By doing this, the traffic on the central file server ("/home") is
> done only at the start and at the end of the job with the possibility
> to apply some filter to throw away useless redundant huge files
> generated by the software.
> Please don't laugh... Right now I'm doing this by copying the files to
> the scratch area of the first node in the "start pe phase" and copying
> them back in the "stop pe phase": it was my first try, and I discovered
> too late the existence of the prolog/epilog way, which I now think
> should be "the way" of doing this.


Whether you do it in the PE scripts or in prolog/epilog is a matter of personal 
taste IMO. If it's only necessary in case of a parallel run, the PE scripts 
might even be the more appropriate place.


> Anyway, actually the users need to do a
> 
> cd /net/`hostname -s`/scratch/${USER}.${JOB_ID}

Do you create these directories on your own instead of using the built-in 
$TMPDIR?

So this is also done on the machine where the job script runs, even though the 
directory would be accessible there as plain /scratch?

I'm still not sure about the workflow in detail, but I have two ideas which 
might be of use to you:

a) Submit the job with a hold to modify the -wd:

reuti@pc15370:~> qsub -h -l h=pc15370 test.sh
Your job 5532 ("test.sh") has been submitted
reuti@pc15370:~> qalter -wd /tmp/5532.1.all.q 5532
modified working directory of job 5532
reuti@pc15370:~> qrls 5532
modified hold of job 5532

You need to submit with a hold, as you don't know the job number beforehand. So 
no `cd` by hand is necessary, but you need a wrapper around `qsub` to do these 
steps for you.
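
A minimal sketch of such a wrapper, written as shell functions, could look 
like this. The function names and the SCRATCH_BASE / SCRATCH_QUEUE variables 
are made up for illustration, and it assumes a non-array job whose directory 
has the form /tmp/<jobid>.1.all.q as in the example above. The job number is 
parsed from the confirmation line that `qsub` prints:

```shell
#!/bin/sh
# Hypothetical helper functions; names and defaults are illustrative only.

# Extract the job number from qsub's confirmation line, e.g.
#   Your job 5532 ("test.sh") has been submitted
job_id_from_qsub_output() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Build the working directory in the same form as the example above:
#   <base>/<job_id>.1.<queue>
# $1 = job id, $2 = base dir (assumed /tmp), $3 = queue (assumed all.q)
scratch_wd() {
    printf '%s/%s.1.%s\n' "${2:-/tmp}" "$1" "${3:-all.q}"
}

# Submit on hold, point -wd at the scratch directory, then release the hold.
# All arguments are passed through to qsub unchanged.
qsub_scratch() {
    out=$(qsub -h "$@") || return 1
    printf '%s\n' "$out"
    id=$(job_id_from_qsub_output "$out")
    [ -n "$id" ] || { echo "could not parse job id" >&2; return 1; }
    qalter -wd "$(scratch_wd "$id" "${SCRATCH_BASE:-}" "${SCRATCH_QUEUE:-}")" "$id" &&
    qrls "$id"
}
```

After sourcing the file, `qsub_scratch -l h=pc15370 test.sh` would run the 
three steps in one go. If parsing the job number or the `qalter` call fails, 
the job is left on hold, so it never starts in the wrong directory.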

b) Use path aliasing in SGE in the file:

/usr/sge/default/common/sge_aliases

you can put a line for each exechost:

/dummy/                  *           pc15370    /tmp/

reuti@pc15370:~> qsub -h -l h=pc15370 -wd /foobar test.sh
Your job 5533 ("test.sh") has been submitted
reuti@pc15370:~> qalter -wd /dummy/5533.1.all.q 5533
modified working directory of job 5533
reuti@pc15370:~> qrls 5533
modified hold of job 5533

You can submit with a plain /scratch/ there, and it will be replaced by /tmp/ 
before execution (man sge_aliases). Maybe it can be used to map /scratch/ to 
/net/n0000/scratch/ or the like for each exechost.
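
As a rough sketch of that per-exechost mapping, the lines could be generated 
with a small shell function. The function name and the second host n0001 are 
made up for illustration, and whether mapping /scratch/ to /net/<host>/scratch/ 
fits your setup is an assumption. The columns follow the example line above 
(source path, submit host, exec host, replacement):

```shell
#!/bin/sh
# Hypothetical helper: print one sge_aliases line per execution host.
# Columns: src-path  sub-host  exec-host  replacement
alias_lines() {
    for host in "$@"; do
        # Assumed mapping: /scratch/ becomes /net/<host>/scratch/ on <host>
        printf '/scratch/\t*\t%s\t/net/%s/scratch/\n' "$host" "$host"
    done
}
```

Calling `alias_lines n0000 n0001` prints two lines which can be reviewed and 
then appended to /usr/sge/default/common/sge_aliases by hand.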

NB: It looks like a bug that the flag to enable path aliasing isn't set by 
`qalter`, hence already at submission time it's necessary to use -cwd or -wd 
/foobar to set it with an arbitrary path.

-- Reuti


> in the job script they submit in order to keep the mechanism working.
> Now I have a new "user" which is in fact an automated system that I
> would prefer not to tweak, so I am thinking of adapting GE to that
> automated system instead.
> What I'm trying to achieve is to have a system configured in this way
> but "hardcoded" and transparent to the end user.
> I suppose that the prolog/epilog is the right place to do the
> first/last step (copying around data) and the starter_method is the
> right way for doing the other step, now I need to figure out  how to
> do it and what side effects could emerge: any idea on the second
> question?
> 
> Thanks
> Stefano
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users


