On 29.01.2013 at 12:50, Stefano Bridi wrote:
> On Tue, Jan 29, 2013 at 9:26 AM, William Hay <[email protected]> wrote:
>>
>> On 25 January 2013 17:21, Stefano Bridi <[email protected]> wrote:
>>>
>>> Hi all, is there a way to use the scratch area (local disk) on the
>>> compute node in a transparent way from the submitted script point of
>>> view?
>>> What I want to do is to copy to and from the compute node scratch area
>>> the job data using the prolog/epilog but I need also to start the
>>> submitted script in the scratch area instead of the cwd.
>>> Is there a way?
>>
>> Since you are mucking around with prolog and epilog I assume you have
>> administrative control of the cluster.
>> One solution would be to use a starter method to cd to $TMPDIR before
>> execing the real job. starter_method
>> is a bit of a Swiss army chainsaw though (a flexible but dangerous tool).
>>
>> William
>
> Yes, I'm the admin: the problem I want to solve in this way is to
> lower the load on the central file server by using the local disk of
> the "master" node as a scratch area.
> What I mean is that if the job is running in SMP or serial, it is the
> local disk (/scratch) and, if the job is using multiple nodes (mpi),
> it will be the local disk "/scratch" of the first node exported via
> NFS and mounted on the fly via autofs "/net/n0000/scratch/" on the
> other nodes.
> By doing this, the traffic on the central file server ("/home") is
> done only at the start and at the end of the job with the possibility
> to apply some filter to throw away useless redundant huge files
> generated by the software.
> Please don't laugh... Right now I'm doing this by copying the files to
> the scratch area of the first node in the "start pe phase" and copying
> them back in the "stop pe phase". It was my first try, and I discovered
> too late the existence of the prolog/epilog mechanism, which I now think
> should be "the way" of doing this.
Whether you do it in the PE script or prolog/epilog is personal taste IMO. If
it's only necessary in case of a parallel run, the PE scripts might even be the
more appropriate place.
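Whichever hook you pick, the copy step itself could look roughly like the sketch below. The function names stage_in and stage_out and the '*.tmp' pattern are made up for illustration, and it assumes file names without embedded newlines; it is not a tested production prolog.

```sh
#!/bin/sh
# Hypothetical helpers for a prolog/epilog (or start/stop PE script) pair.

# stage_in SRC DST: copy the contents of SRC (including dot files) into DST,
# creating DST if it does not exist yet.
stage_in() {
    mkdir -p "$2" && cp -R "$1"/. "$2"/
}

# stage_out SRC DST PATTERN: copy regular files from SRC back to DST,
# skipping every file whose name matches the shell PATTERN (e.g. '*.tmp').
stage_out() {
    mkdir -p "$2" || return 1
    ( cd "$1" && find . -type f ! -name "$3" -print ) |
    while IFS= read -r f; do
        mkdir -p "$2/$(dirname "$f")" && cp "$1/$f" "$2/$f"
    done
}
```

In a prolog this might be called as stage_in "$SGE_O_WORKDIR" "/scratch/${USER}.${JOB_ID}", with the matching stage_out call in the epilog. That assumes the prolog sees the same environment as the job, so please check it on your installation.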
> Anyway, currently the users need to do a
>
> cd /net/`hostname -s`/scratch/${USER}.${JOB_ID}
Do you create these directories on your own instead of using the built-in
$TMPDIR?
So this is also done on the machine where the job script runs, even though the
directory would be accessible there as /scratch?
I'm still not sure about the workflow in detail, but I have two ideas, and maybe
you can make use of them:
a) Submit the job with a hold to modify the -wd:
reuti@pc15370:~> qsub -h -l h=pc15370 test.sh
Your job 5532 ("test.sh") has been submitted
reuti@pc15370:~> qalter -wd /tmp/5532.1.all.q 5532
modified working directory of job 5532
reuti@pc15370:~> qrls 5532
modified hold of job 5532
You need to submit with a hold, as you don't know the job number beforehand. So
no `cd` by hand is necessary, but you need a wrapper around `qsub` to do these
steps for you.
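A minimal sketch of such a wrapper follows. The function names parse_job_id and submit_in_scratch are hypothetical, and it assumes a non-array job (task id 1) that ends up in all.q, exactly as in the session above; other queues or array jobs would need a different directory name.

```sh
#!/bin/sh
# Hypothetical wrapper: submit on hold, point -wd at the scratch dir, release.

# parse_job_id TEXT: extract the job number from qsub's confirmation line,
# e.g. 'Your job 5532 ("test.sh") has been submitted'.
parse_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# submit_in_scratch PREFIX QSUB_ARGS...: PREFIX is the scratch base with a
# trailing slash (e.g. /tmp/). Prints the job number on success.
submit_in_scratch() {
    prefix=$1
    shift
    out=$(qsub -h "$@") || return 1
    job_id=$(parse_job_id "$out")
    if [ -z "$job_id" ]; then
        echo "could not parse job number from: $out" >&2
        return 1
    fi
    # Assumption: task id 1 and queue all.q, as in the example session.
    qalter -wd "${prefix}${job_id}.1.all.q" "$job_id" >/dev/null &&
        qrls "$job_id" >/dev/null &&
        echo "$job_id"
}
```

It would be called like: submit_in_scratch /tmp/ -l h=pc15370 test.sh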
b) Use path aliasing in SGE in the file:
/usr/sge/default/common/sge_aliases
you can put a line for each exechost:
/dummy/ * pc15370 /tmp/
reuti@pc15370:~> qsub -h -l h=pc15370 -wd /foobar test.sh
Your job 5533 ("test.sh") has been submitted
reuti@pc15370:~> qalter -wd /dummy/5533.1.all.q 5533
modified working directory of job 5533
reuti@pc15370:~> qrls 5533
modified hold of job 5533
You can submit with a plain /scratch/ there, and it will be replaced with /tmp/
before execution (man sge_aliases). Maybe it can be used to map /scratch/ to
/net/n0000/scratch/ or the like for each exechost.
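If you go that way, the per-host lines could be generated instead of typed by hand. This is only a sketch: the function name alias_lines is made up, and I'm assuming your hosts all follow the /net/<host>/scratch/ layout you described.

```sh
#!/bin/sh
# Hypothetical generator for sge_aliases entries, one line per exechost.
# Line format, as in the example above: src-path submit-host exec-host dest-path

# alias_lines HOST...: print one alias line per host, mapping /scratch/ to
# that host's autofs path /net/HOST/scratch/.
alias_lines() {
    for host in "$@"; do
        printf '/scratch/ * %s /net/%s/scratch/\n' "$host" "$host"
    done
}
```

The output of alias_lines n0000 n0001 (n0001 being just an example name) could then be appended to the sge_aliases file after you have reviewed it.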
NB: It looks like a bug that the flag to enable path aliasing isn't set by
`qalter`, hence already at submission time it's necessary to use -cwd or -wd
/foobar to set it with an arbitrary path.
-- Reuti
> in the job script they submit in order to keep the mechanism working.
> Now I have a new "user" which in fact is an automated system that I
> would prefer not to tweak, so I'm thinking of adapting GE to that
> automated system instead.
> What I'm trying to achieve is to have a system configured in this way
> but "hardcoded" and transparent to the end user.
> I suppose that the prolog/epilog is the right place to do the
> first/last step (copying around data) and the starter_method is the
> right way for doing the other step, now I need to figure out how to
> do it and what side effects could emerge: any idea on the second
> question?
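For the starter_method part, a minimal sketch could look like the script below. The function name start_in_scratch is hypothetical. It assumes the job and its arguments arrive as the script's arguments (see man queue_conf) and that the job script can be executed as passed; if not, it would have to be run through $SGE_STARTER_SHELL_PATH instead.

```sh
#!/bin/sh
# Hypothetical starter_method script, referenced from the queue configuration.
# Grid Engine hands over the job and its arguments as "$@".

start_in_scratch() {
    # Assumption: TMPDIR is the per-job scratch directory set by Grid Engine.
    cd "${TMPDIR:?TMPDIR is not set}" || exit 1
    # Replace this process with the real job so signals reach it directly.
    exec "$@"
}

# Only act when called with a job to start.
if [ "$#" -gt 0 ]; then
    start_in_scratch "$@"
fi
```

One side effect to keep in mind: the starter_method is a queue setting, so it affects every job started in that queue, not only the ones from the automated system.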
>
> Thanks
> Stefano
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users