We had this problem lots, and I can't quite remember how I solved it - I
think it might've been either a JSV or a qsub wrapper that shoves all
GPU jobs into the superordinate queue.
Now that I'm thinking about this again - does the subordinate queue
setting accept 'queue@@hostgroup' syntax lik
Hello,
from a kernel/mechanism point of view, it is perfectly possible to
restrict device access using cgroups. I use that on my current cluster,
works really well (both for things like CPU cores and GPUs - you only
see what you request, even using something like 'nvidia-smi').
Sadly, my curre
I was about to ask that :)
$SGE_ROOT ought to be accessible from (the) submit host(s), at least. So
in general, you should be able to access it from there?
(Note that you can also tell qacct where the accounting file lives - it
assumes a default location, but the file does not have to be in that l
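A minimal sketch of pointing qacct at a copy of the accounting file, assuming the usual default location of $SGE_ROOT/$SGE_CELL/common/accounting. The helper name acct_file and the example path are made up for illustration; -f is the qacct option that names the accounting file to read.

```shell
# acct_file is a hypothetical helper, not part of Grid Engine.
# It prints the accounting file to use: the path given as $1, or else the
# assumed default location under the cell's common directory.
acct_file() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${SGE_ROOT:?SGE_ROOT is not set}/${SGE_CELL:-default}/common/accounting"
    fi
}

# Example use on a host that has qacct installed (the path is a placeholder):
#   qacct -f "$(acct_file /some/copy/of/accounting)" -o "$USER"
```

This way the (possibly large) file can be copied somewhere convenient and queried there, without needing the live $SGE_ROOT mounted.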
...or one can just use logrotate (rather than run an extra cron job).
It's surprisingly good at that sort of thing ;)
Tina
On 29/01/2019 16:21, Reuti wrote:
> Hi,
>
>> On 29.01.2019 at 17:09, John Young wrote:
>>
>> The gridengine accounting file on our cluster has gotten
>> rather large. I
ed and has a better future?
> > I'm not interested in fancy new things like mesos that have a different
> > programming model or are too new.
> >
> > Dan
> > ___
> > users mailing list
> > users@gridengine.org
this behavior.
>
> Thanks for sharing,
>
> Paul.
> ___
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
--
Tina Friedrich, Snr HPC Systems Administrator, Advanced Research Computing
Research Computing a
> Thanks,
>
> Douglas Duckworth, MSc, LFCS
> HPC System Administrator
> Scientific Computing Unit
> Weill Cornell Medicine
> E: d...@med.cornell.edu
> O: 212-746-6305
> F: 212-746-8690
l > accounting" ?
>
> Thanks for any help.
>
>
>
>
>
> -Noel Benitez, Salk iT Dept.
--
Tina Friedrich, Snr HPC Systems Administrator, Advanced Research Computing
Research Computing and Support Services, Academic IT
IT Services, University of Oxford
http://www.arc.ox.ac.
I dealt with a similar problem by way of using the pam-regex[1] module to
simply transform all entered usernames to lower case... as long as your user
names (on the Linux side) are all *supposed* to be lower case, that should do
the trick :)
(Had the bonus of also solving all sorts of other pr
Only time I ever had problems with duplicate IDs it was simply because
they rolled over - been a while ago though (might've been SGE6.2,
actually - I think that might've hit max job ID at 99 ). You'd have
to run through a very large amount of jobs to hit that monthly, though.
Tina
On 14/0
me reason.
My trick won't work for the random port on the submit host.
William
--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
ny thoughts you might have are appreciated.
Thanks
Biggles
Would spare me rather a lot of helpdesk calls :)
Tina
--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
--
This e-mail and any attachments may contain confidential, copyright and or
privileged materi
that node.
Any suggestions?
Thanks,
Mark
.wipro.com
MoD ("metrics on demand") look pretty interesting.
Would love to chat about how people have made XDMoD and other variants
work with Grid Engine(s) -- can we get a little thread going on best
practices and recommendations for 3rd party reporting/metrics tools?
Suspect there wo
the user community to see if people have run into this before.
I'll write a new parser or converter if I have to. Don't want to
reinvent the wheel if I don't have to ...
Regards,
Chris
trying to do anything fancy like people connecting to other
nodes though; I simply needed a X forwarding capable qlogin :)
Tina
On 13/10/14 22:45, Prentice Bisbal wrote:
Is that ssh conf dynamically generated to limit access only to nodes
that SGE has assigned to that user?
Prentice
On 10/13/2
ay to configure SSH to service qrsh and
qlogin but don't expose SSH directly to the users?
Regards,
Derrick
queue instances it could run
on in the 'cannot run in queue' stanzas, so unfortunately I'm not much
wiser. (qstat -F for a queue instance that would fit does give me
'qc:slots=8').
Tina
On 07/07/14 17:24, Tina Friedrich wrote:
Okay, I checked. All jobs in the queue h
7:10, Tina Friedrich wrote:
Hi William,
On 07/07/14 15:22, William Hay wrote:
On Fri, 4 Jul 2014 10:37:56 +
Tina Friedrich wrote:
Hello list,
I have a couple of jobs sitting in the queue (been there for ages)
that never seem to start (they're in qw).
qalter -w p #JOBNO says "verifica
Hi William,
On 07/07/14 15:22, William Hay wrote:
On Fri, 4 Jul 2014 10:37:56 +
Tina Friedrich wrote:
Hello list,
I have a couple of jobs sitting in the queue (been there for ages)
that never seem to start (they're in qw).
qalter -w p #JOBNO says "verification: foun
sn't get scheduled, still.
Tina
On 04/07/14 13:16, Roberto Nunnari wrote:
On 04.07.2014 12:37, Tina Friedrich wrote:
Hello list,
I have a couple of jobs sitting in the queue (been there for ages) that
never seem to start (they're in qw).
qalter -w p #JOBNO says "verification
running? (It's not a license or anything either, I know that).
Tina
Hi Reuti,
On 27/06/14 12:05, Reuti wrote:
Hi,
On 27.06.2014 at 12:37, Tina Friedrich wrote:
maybe someone here has an idea where to look for this...
We have some software - I think it's a bash script that calls a python script.
Up until very recently, it ran just fine. And then it started
after the python script starts (and should print something),
in the runs where it works a 'write' is called and if it fails it doesn't.
Does anyone have any idea?
Tina
blue-bolt.com> |
d or enabled.
Thanks,
Joseph
ost
Cluster Nodes
Master Host
|
Any insight would be helpful
Thanks
Varun
s horrendous!
It does, doesn't it? I've gone the consumable way, and will simply have
to not use MPI. Such hardship :)
Tina
lowing up.
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
jobpernode) with 'slots
1'.
-Hugh
-Original Message-
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On
Behalf Of Skylar Thompson
Sent: Wednesday, April 02, 2014 11:04 AM
To: Tina Friedrich
Cc: users@gridengine.org
Subject: Re: [gridengine users] array j
hich option I favour, still.
Tina
On 02/04/14 16:04, Skylar Thompson wrote:
An exclusive host consumable is the right way to approach the problem. If
the task elements might be part of a parallel environment, then you'll want
to set the scaling to JOB as well.
On Wed, Apr 02, 2014 a
that's the
best way to handle this?
Tina
Hi Reuti,
On 01/04/14 15:40, Reuti wrote:
Hi Tina,
On 31.03.2014 at 17:08, Tina Friedrich wrote:
On 31/03/14 15:05, Reuti wrote:
On 31.03.2014 at 14:14, Tina Friedrich wrote:
On 31/03/14 12:47, Reuti wrote:
Hi,
On 31.03.2014 at 12:22, Tina Friedrich wrote:
just double checking
Hi Reuti,
On 31/03/14 15:05, Reuti wrote:
On 31.03.2014 at 14:14, Tina Friedrich wrote:
On 31/03/14 12:47, Reuti wrote:
Hi,
On 31.03.2014 at 12:22, Tina Friedrich wrote:
just double checking - there still is no way to use anything
but a user's primary group for ACLs etc?
(Directl
Hi Reuti,
On 31/03/14 12:47, Reuti wrote:
Hi,
On 31.03.2014 at 12:22, Tina Friedrich wrote:
just double checking - there still is no way to use anything but a user's
primary group for ACLs etc?
(Directly use, I mean. Without resorting to duplicating information in SGE
setup, or us
Hello All,
just double checking - there still is no way to use anything but a
user's primary group for ACLs etc?
(Directly use, I mean. Without resorting to duplicating information in
SGE setup, or using a JSV, or wrapping qsub, or ...)
Tina
I was about to say, that sounds like something got missed.
Glad it all worked!
Tina
On 28/03/14 01:11, Kevin Buckley wrote:
On 28 March 2014 13:48, Kevin Buckley
wrote:
On 28 March 2014 00:37, Tina Friedrich wrote:
(Sorry I sound a bit vague, it's one of these things I do every once
On 27/03/14 05:18, Kevin Buckley wrote:
On 26 March 2014 23:34, Tina Friedrich wrote:
It does sound as if you need to move the SGE_ROOT file system from one
host to the next as well,
Yes, we do.
I'd say stopping everything & simply syncing it should work.
Yes, that's wha
ic
spooling.
Don't even think you need to bother with the IP - changing 'act_qmaster'
ought to do the trick.
Tina
On 26/03/14 09:57, Tina Friedrich wrote:
Can't you just install the new one, make it one of the shadow masters,
and call a migrate? I've never done th
ria University of Wellington
New Zealand
tting more and
more of a problem (likely simply more noticeable with higher cluster load).
Tina
On 03/03/14 12:46, Reuti wrote:
On 03.03.2014 at 12:59, Tina Friedrich wrote:
I was about to ask a similar question; we have the same sort of setup - high,
medium and low priority queues - and
kill -CONT -- $1
and parameter $1 is $job_pid from the pseudo variables for these interfaces.
-- Reuti
HELL" >> $OUTFILE
. /etc/profile.d/modules.sh
module load fast_ep/1175
echo "$(which fast_ep)" >> $OUTFILE
echo "PATH: $PATH" >> $OUTFILE
(I use this to test the nodes have the correct /etc/profile.d/modules.sh - so
at least for us, this works :) $H
much useful information yet.
Any recommendations on a source for info on installing / configuring ARCo?
Or another alternative?
Isaac Jessop
uld that be fixed in SGE8.1.3?
Tina
learly for me.
Thanks,
E
luster install goes.
Cheers.
Oops. I didn't even consider PXE/kickstart 'OS dependent'. I would
consider a combination of PXE, kickstart (or whatever installation
scripting system you are using) and Puppet/Chef/CFEngine/... to satisfy my
'OS independence' requirement
same kickstart file. So to us, it all boils down to
configuration management - hence CFEngine.
I'd second Dave's requirement 4 - I wouldn't really go for anything
that's coupled to the OS.
Tina
Regards
Lionel
ren't virtual machines great :) ) so I didn't really see that
as a problem - other than that, yes both versions ran in parallel quite
happily, jobs could finish, ...
Tina
installation, but as I couldn't manage to detect
the current jobs I reverted back to SGE.
Txema
On 10/09/13 11:40, Tina Friedrich wrote:
Hi Txema,
I recently upgraded our Grid Engine from SGE6.2u4 (I think it was) to
OGE8.1.3. No rocks though, so I don't know any details on that.
s in advance,
Txema
mpany
E-Mail: ffer...@univa.com | Phone: +49.9471.200.195 | Mobile:
+49.170.819.7390
Where Grid Engine lives
___
> > SGE-discuss mailing list
> > sge-disc...@liv.ac.uk <mailto:sge-disc...@liv.ac.uk>
> > https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
>
>
works. I just used to have it set to 'builtin' for qrsh etc (anything
not qlogin) - ssh isn't set up for host authentication (don't want
that), and things like MPI of course require non-authenticated qrsh (if
I'm not mistaken).
Tina
")
Didn't have that problem on my old 6.2 installation :)
Check the logs (messages files) for any clues. I don't think it has
those symptoms, but there appears to be a race in the threading of the
builtin startup that appears on recent Ubuntu, for instance, but doesn't
on RHE
recent Ubuntu, for instance, but doesn't
on RHEL5 or 6 in our experience. You can still use ssh per
remote_startup(5).
(as it were) with the standard workstation setup
(and hence, things that work on workstations not working on the cluster or vice
versa) is - to us - much more of a concern. So, cluster nodes get upgraded
along with the rest of the estate.
ng on
both. A lot of it compiled (and/or written) in house, and in a central
location. So the risk of said libraries being out of sync (as it were)
with the standard workstation setup (and hence, things that work on
workstations not working on the cluster or vice versa) is - to us - much
mor
is nicely agnostic
to all of this, it Just Works(TM) - well, at least it worked with RHEL5
and RHEL6.) Plus I've installed hwloc in a non-standard location (and
currently have to tell the execd process where it is). Is there an
option for aimk to build statically linked binaries? (I
eem
to have a preference to collide on one host.
I'll sit down & review the full configuration, I think. Just to make
sure I haven't got an obvious bug somewhere.
Tina
when making a scheduling decision.
I have tried a load sensor - basically counting the number of jobs in
the queue on a machine - but that didn't seem to make a difference;
which might be due to the weighting, I suppose.
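For reference, a load sensor of that kind might look roughly like the sketch below. The complex name running_jobs and the count_jobs placeholder are assumptions for illustration (the complex would have to be defined with qconf -mc, and the script registered as load_sensor in the host or global configuration). Only the begin / host:name:value / end report format and the handling of "quit" follow the load sensor protocol.

```shell
#!/bin/sh
# Sketch of a load sensor reporting a hypothetical complex "running_jobs".

# Placeholder: replace with whatever actually counts the jobs on this host.
count_jobs() {
    echo 0
}

# Emit one load report for host $1 in the begin / host:name:value / end format.
report() {
    echo "begin"
    echo "$1:running_jobs:$(count_jobs)"
    echo "end"
}

# Main loop: execd writes a line to stdin whenever it wants a report,
# and the string "quit" when the sensor should exit.
sensor_loop() {
    sensor_host=$(hostname)
    while read -r line; do
        if [ "$line" = "quit" ]; then
            break
        fi
        report "$sensor_host"
    done
}
```

A real sensor script would end by calling sensor_loop. Whether the reported value then influences placement still depends on the load formula and weighting mentioned above.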
Anyone got any bright ideas?
Tina
Hi Reuti,
discussion veering off a bit :)
On 08/04/13 13:33, Reuti wrote:
On 08.04.2013 at 14:28, Tina Friedrich wrote:
Hi Reuti,
On 08/04/13 13:12, Reuti wrote:
Hi Tina,
On 08.04.2013 at 11:16, Tina Friedrich wrote:
is it possible to restrict access to a queue by anything but ACL or
Hi Reuti,
On 08/04/13 13:12, Reuti wrote:
Hi Tina,
On 08.04.2013 at 11:16, Tina Friedrich wrote:
is it possible to restrict access to a queue by anything but ACL or project? A
complex/resource would be a favourite.
A forced boolean complex attached to a queue?
Ah, no. Thought of that
eue (might be the easiest).
I could introduce a project for this I suppose; however, if there was a way to
solve it with a resource I'd prefer that. Any suggestions?
Tina
in highest priority queue (might be the easiest).
I could introduce a project for this I suppose; however, if there was a
way to solve it with a resource I'd prefer that. Any suggestions?
Tina
rmation could possibly be
recovered? typically, this has happened a day before a huge deadline -
so time is not on our side.
-paul
stand
before
applying (and had forgotten about). Can anyone cast more light on
it?
ler traffic to the "users@gridengine.org" list,
> as it is easier to handle just 1 list.
>
> Rayson
>
>
>>
>> thanks,
>> -Alan
>>
>>
>> --
>> “Don't eat anything you've ever seen advertised on TV”
>> - Michael Pollan, author