ere?
>>
> The number of slots is normally defined by the slots entry of a queue
> rather than via load sensors or via complex_values of a queue or host.
"slots" was used for a long time for the load_formula to use least used or fill
host strategy - without the necessity to ch
n/bash -e foo.sh
i.e., we don't want SGE to start a shell for us, but start our own, which will
then read the parameters itself.
-- Reuti
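A minimal sketch of what such a submission could look like (binary mode, so SGE itself doesn't wrap the command in a shell; foo.sh and the -e flag are just the example from this thread):
$ qsub -b y /bin/bash -e foo.sh
Here /bin/bash is the job "binary" and everything after it is passed to it as arguments.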
>
> --
> Adam
>
> On Tue, Jul 21, 2015 at 10:11 AM, Alex Rothberg wrote:
> How do I pass arguments to the interpreter rather than to the program?
green computing when you search for
"green computing gridengine". I don't recall whether it was in that
discussion thread, but care must be taken about the allowed power
cycles of a machine. Thermal stress might lead to a shorter lifetime of
the hardware.
-- Reuti
kkeeping of SGE which uses the additional
group ID to track the consumption of all processes.
I don't know whether there was any approach to change the internals of SGE
to use the accumulated information of the cgroups as well.
-- Reuti
>
> Many thanks,
> Ondrej
>
> -
neous clusters
consisting of various OSes; and Solaris or FreeBSD nodes could also be a part of
it.
-- Reuti
> While cgroups are probably better
> than GIDs for killing jobs.
> One thing we do with the additional GID is control access to GPUs by chgrping
> them from the prolog. Whil
core limit in the queue definition of 0.
-- Reuti
> I guess, that with –noshell, no shell is being started and hence limits
> defined in /etc/security/limits.conf are not effective any longer, right?
> As a simple example job, I used “/usr/bin/yes” + sent signal SIGSEGV to it.
>
> On 25.01.2016 at 17:06, Ondrej Valousek wrote:
>
> Thanks,
> H_core defaults to 0 already :-(
This is quite strange, as a normal user can only lower the ulimit. Can you
please submit a job with:
ulimit -aH
echo
ulimit -aS
-- Reuti
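A hedged sketch of such a test job (the script name limits.sh is made up):
$ cat limits.sh
#!/bin/sh
ulimit -aH
echo
ulimit -aS
$ qsub -j y -o limits.out limits.sh
The joined output file then shows the hard and soft limits as seen inside the job.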
> Ondrej
>
> -Original Messa
bmit the job? So when the local jobs finish, the grid can then schedule
> them.
What error do you get in detail right now?
-- Reuti
> Best Regards and thanks in advance
> --
>
> DANIEL FINK
> Software Engineer
>
AFAIR there was a discussion before, as for now it's hard coded in the source.
Sure you can use:
$ qrsh "umask 002; umask"
all the time. Or do you refer to batch jobs where a starter_method could help?
-- Reuti
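For the batch case, a hedged sketch of such a starter_method (the path is an assumption; the umask value is the one from this thread):
$ cat /opt/sge/scripts/starter.sh
#!/bin/sh
# set the desired umask, then run the job command handed over by SGE
umask 002
exec "$@"
and in the queue configuration (`qconf -mq all.q`): starter_method /opt/sge/scripts/starter.sh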
> On 12.02.2016 at 12:46, Ondrej Valousek wrote:
, my primary GID will be G2.
> But if I submit a job into GE now, jobs are being submitted with G1 as a
> primary group.
>
> Is there any way to make GE to ‘remember’ the primary GID for jobs?
Yes, it can be enabled by:
$ qconf -sconf
...
execd_params USE_QSUB_GID=TRUE
(`man sge_conf`)
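One hedged way to verify the setting afterwards (G1/G2 are the group names from the question; any command printing the group would do):
$ newgrp G2                        # switch the primary group on the submit host
$ qsub -b y -j y -o gid.out id -gn
# once the job has finished, gid.out should show G2 instead of G1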
, scheduler, queue or exechost configuration
are live and won't disrupt any running jobs. You can even reboot the qmaster
and won't lose any running or pending jobs.
-- Reuti
s suspended.
>
> Is there a way to make sure that the short job suspends the normal job first
> and not the medium one, i.e. jobs in the least prioritised queue are
> suspended first?
How did you set up the subordination right now - do you use
> On 23.03.2016 at 15:06, Gerik Huland wrote:
>
> Hi Reuti,
>
> I will answer for Jakob here as I am the guy who has set up the cluster
> config.
>
> Btw. SGE version is 8.1.8.
>
>> How did you set up the subordination right now - do you use slotwise
lt
in "sge_request"). It's advisable to replace the default runtime with the real
expected one. Despite the FIFO scheduling, backfilling could still occur as
this would use resources which would otherwise sit idle and won't influence the
FIFO order of the other jobs.
-- Reuti
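As a hedged illustration (all values are made up): a generous cluster-wide default could sit in $SGE_ROOT/default/common/sge_request, and each job overrides it with a realistic estimate:
# sge_request (applies to every submission)
-l h_rt=24:00:00
$ qsub -l h_rt=2:00:00 job.sh      # per-job override with the real expected runtime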
other test, you can then lower the runtime of the 3 core job, so that it
fits in the remaining time of the 4 core job. In this case backfilling should
occur again, as the start of the 5 core job won't be delayed by the 3 core job
(as long as the runtimes are estimated as accurately as possible).
d this more or less disables it.
> This workaround was already mentioned by Reuti 2 posts before.
Besides using the estimated runtime: attaching an exclusive complex to the
global execution host and requesting it during submission should also prevent
any backfilling, as there can only be one job holding it at a time.
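A hedged sketch of such a setup (the complex name "exclusive" is an assumption; the decisive part is the relation EXCL):
$ qconf -mc                        # add a line like:
exclusive  excl  BOOL  EXCL  YES  YES  0  1000
$ qconf -me global                 # attach it to the global execution host:
complex_values exclusive=true
$ qsub -l exclusive=true job.sh    # a job requesting it then runs alone in the cluster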
le included from ../libs/sgeobj/sge_binding.h:43:0,
> from ../libs/sgeobj/sge_binding.c:39:
> ../libs/uti/sge_binding_hlp.h:45:21: fatal error: hwloc.h: No such file or
> directory
> # include <hwloc.h>
You will need to install the developer package of hwloc on your system.
Do you need Java support in DRMAA? You can otherwise invoke `aimk` with
"-nojava" and disregard this feature.
-- Reuti
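For illustration only, the package is usually named something like this (names vary per distribution):
$ yum install hwloc-devel          # RHEL/CentOS/Fedora
$ apt-get install libhwloc-dev     # Debian/Ubuntu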
> On 22.06.2016 at 15:13, Himanshu Joshi wrote:
>
> Thanks a lot Reuti,
>
> Now ./aimk command stuck with
>
> Could not find a path to Java
What does the messages file in the spool directory of the nodes say? Unless
it's local it's in $SGE_ROOT/default/spool/nodeXY/messages
-- Reuti
> Thanks,
> Thomas
On 20.12.2016 at 22:37, Thomas Beaudry wrote:
> Hi Reuti,
>
> The jobs stay in the queue forever - and don't get processed. There are no
> messages in the spool directory for these jobs.
The "r" state is already after the "t" state. With NFS problems the
> On 21.12.2016 at 04:03, Thomas Beaudry wrote:
>
> Hi Reuti,
>
> It is: loglevel log_warning
Please set it to log_info, then you will get more output in the messages file
(`man sge_conf`). Maybe you get some hints then.
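The change itself would look roughly like this:
$ qconf -mconf
...
loglevel   log_info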
> In case it helps, here
sge_execd and the spawned
sge_shepherd's for all the started processes thereon and their children.
-- Reuti
> perform+ 69850 0.0 0.0 73656 56664 ?DN 11:45 0:01 mnc2nii
> -short -nii
> /NAS/home/perform-admin/ICBM-new-atlas/ICBM/process/transformbylandmark/t1/tal_t1_00256
On 21.12.2016 at 21:11, Thomas Beaudry wrote:
> Hi Reuti,
>
> I think it's good:
>
> 69847 ?S 0:00 \_ sge_shepherd-15272 -bg
> 69848 ?SNs0:00 \_ /bin/sh
> /opt/sge/default/spool/perf-hpc04/job_scripts/15272
> 69850 ?DN
tion then with the local file? A copy process might access
the file just sequentially while the application could do random seeks.
-- Reuti
> binding 1:NONE
> scheduling info:(Collecting of scheduler job information is
> turned off)
>
On 21.12.2016 at 21:47, Thomas Beaudry wrote:
> Hi Reuti,
>
> My initial guess was that it was the disk access to the NAS since if I run
> the job several times, it will only fail a few times. I'm not quite sure how
> to trouble shoot it since I can't find logs.
his setting in SGE's configuration:
$ qconf -sconf
…
execd_params ENABLE_ADDGRP_KILL=TRUE
-- Reuti
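Regarding the `qps`-like view asked about below: a rough, hedged workaround is to look for the job's additional group id (here 20001, a made-up value; the real one is recorded in the job's directory in the execd spool) in /proc:
$ grep -l "^Groups:.*\b20001\b" /proc/[0-9]*/status
# lists the status files, and thereby the PIDs, of all processes of that job on this node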
> Also, does SoGE have something like ‘qps’ to see processes associated with
> the job id?
>
> Thanks,
>
> Ondrej
> -
>
> On 03.01.2017 at 10:08, Ondrej Valousek wrote:
>
> Hi list,
>
> How do I find out on which node my already finished job landed?
>
> Qstat -s z does not show that information.
qacct -j <job_id>
(at least for serial jobs; the hostname field shows the node it ran on)
-- Reuti
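For example (the job id is made up):
$ qacct -j 12345 | grep hostname   # the hostname line shows the node the job ran on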
> Thanks,
> Ondrej
>
> -
the exechost about the state of the jobs. If the exechost has no knowledge
of the jobs (possibly due to an emptied spool directory) it will never reply.
You can remove such jobs with:
$ qdel -f <job_id>
-- Reuti
>
> Thanks,
> Ondrej
>
>
>
> -
>
e process jumps out of the process tree) even for processes started by
the job's private `sshd`.
> Am I overlooking any other options?
I see additional problems: at least Open MPI will use a range of ports on its
own after it was started (by whatever means). I would assume that the other M
to use an asynchronous
`qrsh` to the nodes and poll for the existence of the ring afterwards
before continuing.
-- Reuti
ill be deleted therein.
I hope I didn't forget anything.
Once in a while /var/spool/sge/context on the nodes needs to get a
spring-cleaning.
-- Reuti
used SnakeMake, but why is -v not supported? In the plain download I
can't spot -V in the demo files.
In case a long list needs to be exported (with or without an assigned value), it
could be put into each job-specific directory or into the users' home directories
in a .sge_request file -
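A hedged example of such a file (the variable names are placeholders); qsub reads it from $SGE_ROOT/$CELL/common, from $HOME and from the current working directory:
# .sge_request
-v MY_TOOL_HOME=/opt/mytool
-v LICENSE_SERVER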
only one argument given, the [ command doesn't treat -z / -n
as a unary conditional operator any longer but as a plain string, which
tests as true in both cases.
-- Reuti
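A quick illustration of that shell behaviour:
$ [ -z ] && echo yes    # "-z" is taken as a non-empty string, hence the test is true
yes
$ [ -n ] && echo yes
yes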
alike is mounted on the nodes anyway, and so there is no need to
install anything on new nodes. Just add the exechost temporarily as an admin
host and start the sgeexecd on the nodes. This will automatically create the
necessary spool directory for the new node.
-- Reuti
't know anything about the other, you most certainly need
to define an appropriate PE:
https://www.open-mpi.org/faq/?category=sge
BTW: I can't tell you whether openmpi-1.10.3-3.el7 was compiled with GridEngine
support by --with-sge, or whether it's necessary to compile it on your own.
In any case you have to define the PE as outlined in the Q&A. Use `qconf -ap
orte` and paste the listed settings.
-- Reuti
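For reference, a typical PE along the lines of the Open MPI FAQ could look roughly like this (the slot count is arbitrary; a sketch, not the one and only setup):
$ qconf -ap orte
pe_name            orte
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
Afterwards the PE still has to be added to the pe_list of a queue (`qconf -mq all.q`).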
> Thanks & Regards
> Yasir Israr
>
> -Original Message-
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: 29 April 2017 05:10 PM
> To:
On 30.04.2017 at 02:33, Yasir Israr wrote:
> Hi Reuti
> It is asking for slots and many other options - should I take the defaults or do I
> need to customize some of them? Can you share any output file of orte?
Please check the explanation in the link I posted before:
https://www.open-mp
smp orte
>
> But when I'm running the job using the command below it says mpirun not found,
> while when I simply run # mpirun -np 4 a.out it works without any issue:
> qrsh -pe orte 4 -b y mpirun -np 4 a.out
Do you have several nodes and installed Open MPI on all machin
Hi,
> On 01.05.2017 at 10:07, ya...@orionsolutions.co.in wrote:
>
> Yes
>
> Thanks & Regards
> Yasir Israr
>
> -Original Message-
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: 01 May 2017 01:29 PM
> To: ya...@orionsolutions.co
html
Unfortunately I have no idea how to prevent this besides changing the source.
-- Reuti
>
> This also seems to cause issues for DRMAA (at least via the Python
> bindings) as job validation at submission time appears to be enabled by
> default.
>
> Has anyone any suggesti
SGE to handle them correctly. It always says the resource is not
> available.
Do you mean by an RQS?
-- Reuti
>
> Can someone walk me through the steps required to set this up correctly? The
> docs I have found are rather cryptic.
>
> Mfg,
> Juan Jimenez
> Syste
I wondered how you attached a complex to an ACL.
On 16.05.2017 at 22:19, juanesteban.jime...@mdc-berlin.de wrote:
> i don't know what that means.
>
> Get Outlook for Android
>
>
>
>
> On Tue, May 16, 2017 at 10:1
ONE,[@gpunodes=gpu=TRUE]
A similar syntax can be used for xuser_lists and other entries too.
-- Reuti
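As a hedged illustration inside a queue definition (`qconf -mq all.q`; the host group @gpunodes is taken from above, the ACL name nogpu_users is an assumption):
complex_values    NONE,[@gpunodes=gpu=TRUE]
xuser_lists       NONE,[@gpunodes=nogpu_users]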
>
> thanks!
>
> Juan
>
> Get Outlook for Android<https://aka.ms/ghei36>
>
this use case?
Can you please post the output of `qhost` and `qconf -sq all.q` (assuming this
is the queue you set up).
-- Reuti
INFINITY
> h_rss INFINITY
> s_vmem    INFINITY
This looks ok. So, you have one job in the pending state. What is the output of:
$ qalter -w v <job_id>
for this job.
-- Reuti
>
> On Fri, May 19, 2017 at 3:37 PM, Reuti wrote:
> Hi,
>
> > Am 19
such a syntax, but it would mean to run the script in question
under the specified user account. More important would be: who is allowed to
write at the location where the locking directories are created? Without a
specified user it would be the owner of the particular job.
-- Reuti
ase two setup challenges:
- Limit the access to certain nodes/queues.
- Track the usage of the GPUs on these nodes, so that each job gets a unique
one.
As William mentions below: are these nodes exclusively reserved for dedicated
users, or should other users be able to use them, but not the GPU?
l a GPU aware application.
-- Reuti
> On 19.05.2017 at 16:36, juanesteban.jime...@mdc-berlin.de wrote:
>
>> As William mentions below: are these nodes exclusively reserved for
>> dedicated users, or should other users be able to use them, but not the GPU?
>
> One node, re
and try to start
the script by hand as an ordinary user on the command line?
-- Reuti
> Mfg,
> Juan Jimenez
> System Administrator, HPC
> MDC Berlin / IT-Dept.
> Tel.: +49 30 9406 2800
>
>
> ____
> From: Reuti [re...@staff.uni-ma
Univa not to do it that way because it increases qmaster
> workload
A second queue increases the workload? I don't think that this is noticeable.
We have several queues for several $TMPDIR settings and it wasn't a problem up
to now. And I also don't see a high load on thi
Hi,
> On 19.05.2017 at 15:46, Mukesh Chawla wrote:
>
> Hi,
>
> Thanks a lot William and Reuti for the answers. Apparently I scheduled a
> .bat (Windows bat file) job to check if it could be run, and due to that
> all.q got dropped from the queue list.
>
>
86_64-suse-linux/bin/ld :
> can't find -ldb
This means that libdb (and possibly its developer package) is missing. You can
search for it in YaST and "Search in" [x] Provides.
-- Reuti
> collect2: error: ld returned 1 exit status
> ../libs/spool/Makefile:108: recipe for
wise I could live with one
queue.
https://arc.liv.ac.uk/trac/SGE/ticket/1290
-- Reuti
> Each queue and this is not just a problem with Univa Grid Engine increases
> the
> computational load for the scheduler. In small clusters maybe not a problem,
> but running it in huge compute
h_command is not used for a login, but only with an
additional command on the invocation line. Please try rlogin_{command,daemon},
which will produce the output for a plain login.
-- Reuti
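A hedged example of what an ssh-based setup of these entries in `qconf -mconf` may look like (paths may differ per system):
rlogin_command    /usr/bin/ssh
rlogin_daemon     /usr/sbin/sshd -i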
>
> Mfg,
> Juan Jimenez
> System Administrator, HPC
> MDC Berlin
7 from med-qmaster.mdc-berlin.net
>
> What could be causing this?
Any firewall in place? The SSH login by SGE will use a random port, not
necessarily 22.
-- Reuti
> Again, the fact that qrsh does not make any effort to tell me where the error
> is coming from is making debugging this ha
one
started by SGE runs on a different port.
-- Reuti
> Mfg,
> Juan Jimenez
> System Administrator, BIH HPC Cluster
> MDC Berlin / IT-Dept.
> Tel.: +49 30 9406 2800
>
>
>
>
> On 29.05.17, 16:39, "Reuti" wrote:
>
>
>> Am 29.05.2017
> On 29.05.2017 at 18:00, juanesteban.jime...@mdc-berlin.de wrote:
>
> On 29.05.17, 17:56, "Reuti" wrote:
>
>
>> On 29.05.2017 at 17:26, juanesteban.jime...@mdc-berlin.de wrote:
>>
>> I am getting this very specific error:
>>
>> deb
ster need to be restarted when
> I made the change to the conf?
No. These entries are interpreted live. Just wait one or two minutes after the
change before you use it, until all exechosts honor the new setting.
-- Reuti
> Mfg,
> Juan Jimenez
> System Administrator, BIH HPC Cluster
Are the system's limits, which could be lower, in effect for these login
sessions? Do the system's limits match these settings?
-- Reuti
> On 09.06.2017 at 14:02, "juanesteban.jime...@mdc-berlin.de" wrote:
>
> Hi folks,
>
> I modified the SGE co
If your clients are using `qstat` that often, it might be good to throttle the
number of invocations of `qstat`. If they need this to start other jobs, one
could look into using the job_id/job_name to start the next, use `inotify`
(Linux) or `qevent` (SGE), and reduce the poll load.
-- Reuti
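As one hedged alternative to polling with `qstat`: let the first job write a marker file at its end and have the follow-up wait for it with inotify-tools (directory and file are made up):
$ inotifywait -e create,moved_to /shared/jobs/done   # returns on the first file appearing there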
ey will finish before the last
necessary slot for the parallel job becomes free.
-- Reuti
http://gridengine.org/pipermail/users/2012-July/004090.html
des could also be dedicated for the serial jobs only and
the rest of the cluster for the parallel ones.
-- Reuti
> --
> Gary Jackson
>
> On 9/13/17, 4:44 PM, "Reuti" wrote:
>
>
>Hi,
>
>
also sent to the Bash; hence it should be in the .profile there too, to
ignore the two signals, I think.
-- Reuti
>
>
> --
> Mark Bergman voice: 215-746-4061
>
> mark.berg...@uphs.upenn.edu fax:
don't have all.q!
>
> Where am I wrong in my configuration?
Did you by chance use "-now n"? This will route the job to a queue of type
BATCH. Specifying INTERACTIVE in a queue definition is more like "immediate"
than "interactive".
Hi,
> On 27.11.2017 at 17:35, Jerome wrote:
>
> Dear Reuti.
>
> Yes, I use -now no.
> So, there is no way to limit the time of an interactive job like I do here?
It's possible to use a JSV that checks the CLIENT entry. In case it's not
"qsub", a runtime limit for the job could be added there.
builtin
>
> Any help on how to troubleshoot this would be greatly appreciated.
Is Bash really Bash in your setup? Sometimes it's a link to Dash. Although this
shouldn't hurt, it's worth checking before looking further.
-- Reuti
> Regards,
>
> ---
> Syl
d.
>
> ^Cerror: error while waiting for builtin IJS connection: "got select timeout"
Are all nodes connected in the same way to the machine you issued the `qlogin`
on? Is the firewall setting identical on all computing nodes?
-- Reuti
> ... this goes on forever until I type ctrl-C a
ideas ?
Does such a setup work on any machine? You might need several port forwardings to
the KVM guest from the main OS, I think - depending on the network setup of the
KVM of course.
-- Reuti
d get them back online.
I agree with Daniel that you first need to make the nodes accessible again.
Then one can look into the SGE setup.
-- Reuti
anagement Controller from the enclosure.
Does the headnode have two network interfaces? One for the outside world and one
for inside the cluster?
-- Reuti
>
> What are some troubleshooting steps I could try?
>
> thanks
6.2u5 qmaster?
You have a cluster already running with 6.2u5?
The only concern might be the new kernel version. But with 6.2u5 this can just
be edited in the $SGE_ROOT/util/arch script to include the new kernel version.
What problems did you observe?
lways full nodes,
you won't have this problem with a local scratch directory for $TMPDIR though.
===
BTW: did I mention it: no need to be root anywhere.
-- Reuti
"write $USER $(tty) <<<'Almost finished.'"
(I wonder how to do this as an ordinary user with systemd-run?)
2) Idle by user interaction and/or idle when a background command finished?
Setting a variable in Bash:
TMOUT=
end the shell will exit if there is no input or
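For an interactive Bash session a minimal example would be (the 1800 s are an arbitrary choice):
$ export TMOUT=1800    # the interactive shell exits after 1800 s without input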