from:"William Hay"

Re: [SGE-discuss] exit status lost when using parallel environment

2015-06-22 Thread William Hay

On Wed, 17 Jun 2015 08:50:10 +
Alexis Huxley  wrote:

 
> So now to my question:
> 
> Is it a bug that the exit status is lost when running in a PE or have
> I misunderstood something?
> 
I don't know, but since returning the exit status of the job's master
process could be useful, you could open it as an 'enhancement' bug in
the bug tracker.

William 


___
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss

Re: [SGE-discuss] 'load_formula slots' and get_load_value()

2015-07-14 Thread William Hay

On Tue, 14 Jul 2015 09:10:58 +
Alexis Huxley  wrote:

 
> then the output is:
> 
>   Tue 2015-07-14 10:55:32 CEST: sge_qmaster[42156]: get_load_value: attrname=slots
>   Tue 2015-07-14 10:55:32 CEST: sge_qmaster[42156]: get_load_value: get_attribute_by_name(..., "slots", ...) failed
>   Tue 2015-07-14 10:55:32 CEST: sge_qmaster[42156]: get_load_value: attrname=slots
>   Tue 2015-07-14 10:55:32 CEST: sge_qmaster[42156]: get_load_value: get_attribute_by_name(..., "slots", ...) failed
> 
> So now I'm puzzled. Is there something special about 'slots' that
> means it's not defined in the normal way? This is a pristine
> installation, have I forgotten to configure slots somewhere?
> 
The number of slots is normally defined by the slots entry of a queue
rather than via load sensors or via complex_values of a queue or host.

For it to work in a load formula it would need to be reported via a
load sensor or possibly in a host complex_values setting.  

How is slots defined on the hosts of your cluster?

William




Re: [SGE-discuss] 'load_formula slots' and get_load_value()

2015-07-14 Thread William Hay

On Tue, 14 Jul 2015 11:07:01 +
Alexis Huxley  wrote:

> > How is slots defined on the hosts of your cluster?
> 
> Well, I had thought that the 'qconf -sq all.q' output above meant
> that it was defined where it was supposed to be defined, but you

Well, that is where it normally is defined, but the load_formula is
calculated per host, not per queue (instance), so any value you want to
use in a load_formula has to be defined at a host level.
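
A minimal sketch of defining the value at host level, assuming a host
complex_values entry is enough for the load_formula to see it (the earlier
reply only says "possibly", so check this on a test host first).  The
function name is made up for illustration; host name and slot count are
supplied by the caller.

```shell
# Sketch: set "slots" in an execution host's complex_values so that the
# value exists at host level rather than only in the queue definition.
set_host_slots() {
    host=$1
    slots=$2
    # qconf -mattr <object> <attribute> <value> <instance> changes a single
    # attribute of an existing object without opening an editor.
    qconf -mattr exechost complex_values "slots=$slots" "$host"
}
```

For example, set_host_slots node01 16 (both values are placeholders),
followed by qconf -se node01 to inspect the resulting host configuration.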


Re: [SGE-discuss] SGE 8.1.8 CGROUP question

2015-12-07 Thread William Hay

On Mon, Dec 07, 2015 at 03:20:08PM +, Ondrej Valousek wrote:
> SGE can use CGROUPS for this bookkeeping you mentioned, too. It is just a 
> different filesystem, so it is quite simple to figure out which process 
> belongs to which cgroup.
> To me, when using cgroups, additional GID should not be needed any longer. At 
> least per:
> 
> http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
> 
> Ondrej
I think the cgroups implementation in SoGE differs from the OGS one.
It doesn't do everything the OGS one does.

Supplementary GIDs are more portable than cgroups and fit naturally with
the traditional Unix access control model, while cgroups are probably
better than GIDs for killing jobs.

One thing we do with the additional GID is control access to GPUs by
chgrping them from the prolog.  While I believe you can do similar tricks
with cgroups, it's a little more involved.

Is the additional GID actually causing you any problems?

William


Re: [SGE-discuss] external jobs taking all licenses

2016-02-09 Thread William Hay

On Tue, Feb 09, 2016 at 04:01:03PM +0100, Daniel Fink (PDF) wrote:
>Hello list,
> 
> 
> 
>I have a grid setup where some jobs use a license (complex) that is very
>restricted. Most of the time there are only 5 licenses available.
> 
>I use the oelsen script to keep the complexes in the SGE synched with the
>licenses available:
> 
>http://wiki.gridengine.info/wiki/index.php/Olesen-FLEXlm-Integration
> 
> 
> 
>Now there were some instances where users use all licenses in jobs that
>were not submitted to the grid.
> 
>When this happens, the available complex count in the SGE gets set to
>zero. Now the jobs which need those licenses can no longer be submitted to
>the grid at all.
> 
>And I get complaints and questions from users why they cannot submit them.
> 
> 
> 
>Is there a way to ignore the temporary license/complex shortage and still
>submit the job? So when the local jobs finish, the grid can then schedule
>them.
It sounds like you have set -w p or -w v for job submission.  You could
change this in the cluster sge_request file and do any other validation
you still want in a JSV.

William



Re: [SGE-discuss] ENABLE_ADDGRP_KILL parameter

2016-02-10 Thread William Hay

On Wed, Feb 10, 2016 at 09:57:54AM +, Ondrej Valousek wrote:
>Hi List,
> 
> 
> 
>We are have set the ENABLE_ADDGRP_KILL=false parameter of the exec daemon
>(also using CGROUPS, btw).
> 
>It seems that execd is still killing processes left behind on the
>execution host, in spite of the parameter above.
> 
> 
> 
>We are using version 8.1.8
> 
> 
> 
>What could be wrong?
> 
>Ondrej
> 
Nothing is wrong.  ENABLE_ADDGRP_KILL changes the mechanism by which Grid
Engine determines which processes are part of the job.  Setting it to
false doesn't disable killing all processes of a job; it just changes the
mechanism used to determine which processes to kill.  Since you have
cgroups enabled, the shepherd will be using those to select processes to
kill.

If you want to kill only the lead process of a job for some reason, then
the best option is probably a custom terminate_method, which will normally
override the grid engine defaults.

William



Re: [SGE-discuss] ENABLE_ADDGRP_KILL parameter

2016-02-10 Thread William Hay

On Wed, Feb 10, 2016 at 10:32:23AM +, Ondrej Valousek wrote:
> Thanks William,
> 
> Disabling Cgroups did the job, indeed. I did not know stepherd is using 
> cgroups for tracking purposes as well.
> BTW - if it does, why it still adds the supplementary group IDs to the jobs 
> if Cgroups are enabled? I do not understand it.

The groups are useful for other purposes.  We change the ownership of various 
device files and directories to grant specific jobs access to them.

> 
> Also, could you (or anyone else) advise how the terminate_method could look 
> like?
> I basically want to achieve the same behaviour as of GE 6.2.u4, but needs to 
> keep Cgroups enabled - just to keep better control over the resources.
> 
A little tricky without knowing exactly what you want.  But if you want to
kill only the lead process, then configure grid engine to pass the
terminate_method the job's pid as its first argument and have it send a
SIGKILL to that pid.  Write it in whatever programming/scripting language
you like.  I think this may interact badly with cgroups enabled, as grid
engine may still try to clean up the cgroup.

Trying to control resources with cgroups while having grid engine leave
processes lying around is a bit odd.
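
A minimal sketch of such a terminate_method, written as a shell script.
The script path and the function name are made up for illustration; the
$job_pid variable is the one the queue configuration expands when it runs
the method.  How this interacts with cgroup cleanup is untested here.

```shell
#!/bin/sh
# Hypothetical terminate_method script that signals only the lead process.
# Assumed queue configuration (the path is a placeholder):
#   terminate_method /usr/local/sbin/kill_lead.sh $job_pid
kill_lead() {
    pid=$1
    case $pid in
        ''|*[!0-9]*)
            echo "kill_lead: expected a numeric pid" >&2
            return 2
            ;;
    esac
    # SIGKILL goes to this one pid only; child processes are left alone.
    kill -KILL "$pid"
}

# When run as the terminate_method, the pid arrives as the first argument.
if [ "$#" -gt 0 ]; then
    kill_lead "$1"
fi
```

The numeric check is there so that an empty or unexpanded argument never
reaches kill.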

William



Re: [SGE-discuss] [SGE-bugs] [SGE] #1569: qsub delay

2016-03-22 Thread William Hay

On Tue, Mar 22, 2016 at 12:36:07PM +, SGE wrote:
> #1569: qsub delay
> --+--
>  Reporter:  Narsimha  |  Owner:  Narsimha
>  Type:  defect| Status:  new
>  Priority:  normal|  Milestone:
> Component:  sge   |Version:  8.1.8
>  Severity:  minor |   Keywords:
> --+--
>  Hi all,
> 
>  I am facing an issue with qsub command, when a script is issued with qsub
>  as shown below:-
> 
>  time qsub test.sh
>  Your job 12973 ("test.sh") has been submitted
> 
>  real0m59.665s
>  user0m0.126s
>  sys 0m39.754s
> 
>  It took 59 sec to generate the job id. Initially there is no delay. Since
>  a week, we are facing this issue.
> 
>  Kindly suggest how to resolve the issue.
>
One thing that can cause slowdowns like this, IIRC, is if the filesystem
the qmaster is using as a spool is slow.  This happens more easily if the
spool is mounted on the qmaster via NFS.  For this reason I usually have
the spool local to the qmaster and NFS exported to the rest of the cluster.

William



Re: [SGE-discuss] [SGE] #1569: qsub delay

2016-03-23 Thread William Hay

On Wed, Mar 23, 2016 at 03:39:59AM +, SGE wrote:
> #1569: qsub delay
> --+-
>  Reporter:  Narsimha  |   Owner:  Narsimha
>  Type:  defect|  Status:  closed
>  Priority:  normal|   Milestone:
> Component:  sge   | Version:  8.1.8
>  Severity:  minor |  Resolution:  worksforme
>  Keywords:|
> --+-
> 
> Comment (by Narsimha):
> 
>  Thank you for the reply.
> 
>  I am having my spool directory located in the local disks on master and
>  all compute nodes also.
> 
>  And initially when sge is installed it used to work well without any delay
>  but since a few days i am facing this issue.

If the qmaster has a local spool I'd check if there are any other processes
putting a lot of load on the qmaster host. 

Also try using qping against the qmaster.
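
A minimal sketch of the qping check.  The function name is made up, and the
default port of 6444 is an assumption (it is the conventional sge_qmaster
port); pass your own port if the cluster uses a different one.

```shell
# Sketch: ask the qmaster's communication layer for its status.
check_qmaster() {
    host=$1
    port=${2:-6444}
    # Arguments are: host, port, component name, component id.
    qping -info "$host" "$port" qmaster 1
}
```

Running check_qmaster against the qmaster host from a submit host while a
slow qsub is in progress shows whether the daemon itself is slow to answer.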

Sending to sge-discuss as this is still more likely an operational issue
than a bug per se.

William



Re: [SGE-discuss] Resource Reservation

2016-04-05 Thread William Hay

On Tue, Apr 05, 2016 at 09:49:09AM +0530, Narsimha Reddy wrote:
>Dear Sir,
>Thank you for reply.
> 
>But my requirement is to stop the backfilling and enable a pure FIFO model
>in grid engine. In the above case it is should block the job 194 and allow
>193 to get 1 more core to run the job.
> 
>As mentioned in the mail i have verified the job with 4, 5 & 3 cores but
>after submission of the job the 4 and 3 core jobs are submitted at a time
>and 5 core is in qw till it gets 5 cores. This is normal grid engine
>behavior, can we change this working.
> 
>Can you let me know how to get it done.

Grid engine doesn't directly support disabling backfill.  You can
probably fake it by submitting jobs held and then using a cron job or
similar to ensure only one queued job (the oldest) has the hold released.
Be careful of job number wraparound.
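
A rough sketch of the release step for such a cron job.  It assumes some
wrapper (not shown, and site specific) feeds it one line per held pending
job in the form "<job_id> <submit_time_epoch>"; both function names are
made up.  Choosing by submission time rather than by job id is one way to
sidestep the wraparound problem.

```shell
# Reads lines of "<job_id> <submit_time_epoch>" on stdin and prints the id
# of the job submitted earliest. Sorting on the timestamp, not the job id,
# keeps the choice correct across job number wraparound.
oldest_job() {
    sort -k2,2n | head -n 1 | awk '{print $1}'
}

# Release the user hold on the oldest job in the list read from stdin.
release_oldest() {
    job=$(oldest_job)
    if [ -n "$job" ]; then
        qrls "$job"    # qrls removes a user hold by default
    fi
}
```

A real cron job would also need to skip the release while a previously
released job is still waiting, otherwise more than one job ends up
eligible at a time.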

Using documented methods AFAICT it is only possible to submit a job with
a user hold, which the job owner can remove.  If you trust your users
this should be fine.  It might be possible to get the server side JSV
to add a system or operator hold, but this is not a documented feature
and I've never tried it myself.

William


Re: [SGE-discuss] Resource Reservation

2016-04-06 Thread William Hay

On Wed, Apr 06, 2016 at 09:49:42AM +0530, Narsimha Reddy wrote:
>Dear Sir,
>Thank you for the reply.
>How can a user request a reservation and how to set the wait time for jobs
You request a reservation by adding -R y to the qsub command line.  You
could add this to ${SGE_ROOT}/${SGE_CELL}/common/sge_request to make it
the default.  You need max_reservation > 0 in the scheduler config.

By setting wait time for priority I meant using high values for
weight_waiting_time and weight_urgency in the SGE scheduler configuration,
which means that a job that has been waiting a long time will be highest
priority and therefore get the reservation.
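
A minimal sketch of making -R y the default.  The function name is made up;
it appends the option to whichever sge_request file is passed in and does
nothing if the line is already there.

```shell
# Sketch: make "-R y" a cluster-wide default by appending it to the
# sge_request file given as $1, unless that exact line is already present.
enable_default_reservation() {
    request_file=$1
    if ! grep -qxF -e '-R y' "$request_file" 2>/dev/null; then
        printf '%s\n' '-R y' >> "$request_file"
    fi
}
```

It would be called as
enable_default_reservation "${SGE_ROOT}/${SGE_CELL}/common/sge_request".
The scheduler settings mentioned above are edited separately with
qconf -msconf; suitable values depend on the site.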

William


Re: [SGE-discuss] "Decoding gridengine" workshop

2016-08-24 Thread William Hay

On Wed, Aug 24, 2016 at 10:20:06AM +0100, Mark Dixon wrote:
> Hi there,
> 
> Is there any interest for a meeting in the UK looking at the internals of
> gridengine? Potential topics might be:
> 
> * Building from source
> * How the code is organised
> * How to debug or develop gridengine
> 
> The principles discussed ought to be applicable to any flavour of gridengine
> that you happen to have the source for.

While the idea of being open to "any flavour of gridengine" has some
appeal, I wonder if it might be better to focus on SoGE.  If you aren't
using SoGE then presumably you either have a support contract from
Univa or Scalable or you've been using the same version for a long time
and are presumably willing to live with its bugs.

William 


Re: [SGE-discuss] Can no longer view cluster config on non-admin hosts?

2016-08-25 Thread William Hay

On Thu, Aug 25, 2016 at 11:34:48AM +0100, Mark Dixon wrote:
> Hi there,
> 
> Playing around with CentOS 7 + SoGE 8.1.9, just noticed that attempts to
> view the cluster config from a non-admin host fails:
> 
>   $ qconf -sconf
>   denied: host "" is not an admin host
> 
> True for all the '-s*' switches I tried.
> 
> Is this intentional or desirable?
> 
> Personally, I quite like letting submit hosts view configuration - it helps
> canny users to help themselves.
> 
> Cheers,
> 
> Mark

I think this has already been reported as a bug and Dave says he'll have
to redo the change that broke it.

https://arc.liv.ac.uk/trac/SGE/ticket/1579

For some reason the trac bug seems to have been opened with the second
message to sge-bugs, not the first, which makes the description a trifle
cryptic.

William



Re: [SGE-discuss] SGE + kerberos

2016-09-26 Thread William Hay

On Fri, Sep 23, 2016 at 07:23:00PM +, Thomas Beaudry wrote:
>Hi,
> 
>I am running into problems when cluster users submit a job to execution
>hosts that don't have the users kerberos ticket to access NFS shares.  I
>tried copying the users's ticket to the excution host but that didn't fix
>the problem.  The only thing that is working is if I have the user login
>to the execution host first before launching the job.
> 
>What is the best solution to this problem?
> 
>Thanks!
> 
>Thomas Beaudry
You may be able to use the various features built into grid engine for
integration with AFS (set_token_cmd, pag_cmd, token_extend_time).
Alternatively there is the AUKS project, which is specifically designed
for integrating batch schedulers (like SGE) with Kerberos.

Last heard of at http://sourceforge.net/projects/auks/ but someone seems
to be tweaking it a little more recently here:
https://github.com/hautreux/auks although it looks like they are more
Slurm focussed.

Also general questions are probably better directed at us...@gridengine.org,
keeping this list for things you know are specific to the SoGE fork.

William



Re: [SGE-discuss] Error at the time of Distribution staging

2016-10-13 Thread William Hay

On Thu, Oct 13, 2016 at 11:07:28AM +0530, Himanshu Joshi wrote:
> 
>The error again is
> 
>Error: Unable to access jarfile ./util/gui-installer/installer.jar

IIRC you were having issues with building with Java earlier.  I suspect
the above may be a result of that.  Possibly you just need to make sure
all the prerequisites are installed before building and then you'll get
the Java parts built.

> 
>I tried through command line as well
>and ran the following command
>./inst_sge -m -x -csp
> 
>again the error was
> 
>"sed: can't read dist/util/install_modules/inst_common.sh: No such file or
>directory"

Not sure where that error message appeared relative to the messages below.
I'd suggest fixing the localhost issue and seeing if the above persists.

>with the following display
> 
>Welcome to the Grid Engine installation
>---
> 
>Grid Engine qmaster host installation
>-
> 
>Before you continue with the installation please read these hints:
> 
>   - Your terminal window should have a size of at least
> 80x24 characters
> 
>   - The INTR character is often bound to the key Ctrl-C.
> The term >Ctrl-C< is used during the installation if you
> have the possibility to abort the installation
> 
>The qmaster installation procedure will take approximately 5-10 minutes.
> 
>Hit <RETURN> to continue >>
>after hitting return
>the message appears like
> 
>"Unsupported local hostname
>--
> 
>The current hostname is resolved as follows:
> 
>Hostname: localhost
>Aliases: localhost.localdomain localhost4 localhost4.localdomain4
>localhost.localdomain localhost6 localhost6.localdomain6
>Host Address(es): 127.0.0.1 127.0.0.1
> 
>It is not supported for a Grid Engine installation that the local hostname
>contains the hostname "localhost" and/or the IP address "127.0.x.x" of the
>loopback interface.
>The "localhost" hostname should be reserved for the loopback interface
>("127.0.0.1") and the real hostname should be assigned to one of the
>physical or logical network interfaces of this machine.

Update the system hostname to something other than localhost.  Assuming this
is a Linux box, modifying /etc/hostname should change it from the next
boot.  You can use the hostname command (or hostnamectl on a systemd-based
system) to change it from now until next boot.  Other Unix-like systems
should be fairly similar.

You also need to change what IP address this hostname refers to.  Usually
you just edit the /etc/hosts file to include a line listing the IP address
of a non-loopback interface, the hostname and the FQDN of the machine.
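
A small sketch for checking the result before rerunning the installer.
Both function names are made up; the check looks up the local hostname with
getent and flags the loopback case the installer complains about.

```shell
# Returns success if the given address is a loopback address.
is_loopback() {
    case $1 in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Reports the first address the local hostname resolves to.
check_hostname() {
    name=$(hostname)
    addr=$(getent hosts "$name" | awk '{print $1; exit}')
    if [ -z "$addr" ]; then
        echo "$name does not resolve"
        return 1
    elif is_loopback "$addr"; then
        echo "$name resolves to loopback address $addr"
        return 1
    fi
    echo "$name resolves to $addr"
}
```

An /etc/hosts line of the usual shape would be
"192.0.2.10 mbialjpj.example.org mbialjpj", where the address and the
domain are placeholders for the machine's real ones.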

William


Re: [SGE-discuss] Error at the time of Distribution staging

2016-10-13 Thread William Hay

On Thu, Oct 13, 2016 at 03:41:27PM +0530, Himanshu Joshi wrote:
>Thanks William,
> 
>As per your suggestion I had changed the hostname to MBIALJPJ
>hostnamectl status command says
> 
>   Static hostname: mbialjpj
>   Pretty hostname: MBIALJPJ
> Icon name: computer-desktop
>   Chassis: desktop
>Machine ID: 431da268159243088e0e02874e8d36bf
>   Boot ID: f3bb3c227eea4390a1d306b23ba5e25b
>  Operating System: Red Hat Enterprise Linux
>   CPE OS Name: cpe:/o:redhat:enterprise_linux:7.2:GA:workstation
>Kernel: Linux 3.10.0-327.el7.x86_64
>  Architecture: x86-64
> 
>Still
> 
>It does not accepts cell name as default and asks for changing the name
>I had changed this also to- mbialjpj
You mean you specified default as the Cell Name and it rejected it?  That's
a little odd.


> 
>But still the GUI says
I'd try with the text mode installer; it is a little easier to cut and
paste the output of any problems into an e-mail.

> 
>FAILED: Task failed.
> 
>OUTPUT:
> 
>...
>
>
> 
>Error: Cannot create keystore
>/var/sgeCA/port6444/mbialjpj/private/keystore
>Error: keystore directory does not exist:
>/var/sgeCA/port6444/mbialjpj/private
>./util/install_modules/inst_qmaster.sh: line 1159:
>/var/sgeCA/port6444/mbialjpj/private/keystore.password: No such file or
>directory
>chown: invalid user: `default'
> 
>Kindly suggest the needful
Not sure.  Did you specify default as the user to install as somewhere?
Anyway copying to sge-disc...@liverpool.ac.uk.


William



Re: [SGE-discuss] Error at the time of Distribution staging

2016-10-14 Thread William Hay

On Thu, Oct 13, 2016 at 08:23:13PM +0530, Himanshu Joshi wrote:
>Lets SGE-discuss answer the question,
>As you have rightly pointed out I would like to mention that at the time
>of first installation,  I specified "default" as the cell name. But
>default was never used as qmaster host in  any of my installation trial.
> 
>Subsequently I had deleted the default folder from my $SGE_ROOT directory.
>And again put cell name as default
>The same error continues
> 
>Data Base Updated
>Error: Cannot create keystore /var/sgeCA/port6444/default/private/keystore
Is the directory above on a NFS filesystem perhaps?

Did you specify a user other than root to install as?  It looks like you
are installing as a user called 'default' but that user doesn't exist.  Not
sure how that happened, as if you entered it by hand the existence of the
user should have been checked.

One possibility is that the installer picked up the username from a
directory imported from a network filesystem of some sort where the user is
valid.  If network filesystems are involved you'll want to ensure that
usernames and group names, uids and gids either match across the cluster or
are reliably translated.

Are there any directories that appear to be owned by user 'default'?
Does getent passwd default return anything?
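
A tiny sketch of the account check.  The function name is made up, and it
uses id rather than getent only because id is available on more systems;
both consult the same user database.

```shell
# Returns success if the account name is known to the system's user database.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}
```

For instance, user_exists default || echo "no such user".  For the ownership
question, find with the -nouser test run over the SGE directories lists
entries whose owning uid has no account on this machine, which is what a
uid mismatch over a network filesystem tends to look like.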

William


Re: [SGE-discuss] Error at the time of Distribution staging

2016-10-18 Thread William Hay

On Tue, Oct 18, 2016 at 09:48:21AM +0530, Himanshu Joshi wrote:
>Thanks Love,
>Apologies for mixing up the two different pipelines.
>Let me start from scratch and follow the procedure recommended by you.
>I am hereby requesting you to share your recommended procedure/pipeline
>for the installation of SGE on my workstation first and later I would like
>to implement it on a server client setup.
> 
>Looking forward for receiving the procedure recommended by SGE.

I think the recommended process is to install the RPMs/debs Dave provides 
then run inst_sge with the appropriate options for the sort of node you 
are installing.  

If you are using something similar to RHEL5/RHEL6 or a Debianish version of 
Linux then the appropriate packages can be found here:

https://arc.liv.ac.uk/downloads/SGE/releases/8.1.7/

If you are using RHEL7 or similar then they are available here:
http://copr.fedoraproject.org/coprs/loveshack/SGE/

At UCL we install the qmaster node and have it export /opt/sge/default/common
and its spool.

New nodes have the appropriate rpms installed, mount the exported filesystems
and run a copy of the execd startup script at boot.  Don't even need to 
run inst_sge on the workers or submit nodes.

William


Re: [SGE-discuss] Error at the time of Distribution staging

2016-10-20 Thread William Hay

On Thu, Oct 20, 2016 at 11:44:42AM +0530, Himanshu Joshi wrote:
>Thanks William and Love,
>Now I had downloaded gridengine-8.1.9-1.el6.src
>and performed rpm -Uvh gridengine-8.1.9-1.el6.src in mu /opt/sge folder as
>a super user
> 
>warning: gridengine-8.1.9-1.el6.src.rpm: Header V3 RSA/SHA1 Signature, key
>ID 92258035: NOKEY
>Updating / installing...
>   1:gridengine-8.1.9-1.el6   #
>[100%]

I assume you then rebuilt and installed the rpms?  IIRC rpm -Uvh on a .src.rpm
will just unpack the sources.  Where possible I would start with the binary RPMs
Dave provides.  Still you seem to be getting further than before.

> 
>During the process of installation through ./inst_sge -m -x command I got
>the following error

> 
>qmaster startup script
>--
> 
>We can install the startup script that will
>start qmaster at machine boot (y/n) [y] >>
>after hitting Return the following error came
>cp /opt/sge/default/common/sgemaster
>/etc/init.d/sgemaster.mbialjpj_cluster
>/usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj_cluster
> 
>Command failed: /usr/lib/lsb/install_initd
>/etc/init.d/sgemaster.mbialjpj_cluster
> 
>Probably a permission problem. Please check file access permissions.
>Check root read/write permission. Check if SGE daemons are running.

Pick a different cluster name.  This one causes a violation of 
https://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA.junk/scrptnames.html
but might be perfectly fine with some non-Linux OS.
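
A conservative pre-check for a cluster name, as a sketch.  The rule used
here (lowercase ASCII letters and digits only) is an assumption that is
deliberately stricter than what the installer accepts; it is not a
transcription of the LSB text.  The function name is made up.

```shell
# Conservative check (an assumption, stricter than the installer): accept
# only names made of lowercase ASCII letters and digits.
safe_cluster_name() {
    case $1 in
        ''|*[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

With this check, mbialjpj_cluster is rejected because of the underscore.
The characters are listed explicitly so the result does not depend on the
locale.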

I've added a ticket to Dave's bugtracker to warn about this potential problem 
earlier:
https://arc.liv.ac.uk/trac/SGE/ticket/1586


William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-10-20 Thread William Hay

On Thu, Oct 20, 2016 at 07:47:38PM +0530, Himanshu Joshi wrote:
>-- Forwarded message --
>From: William Hay 
>Date: Thu, Oct 20, 2016 at 6:41 PM
>Subject: Re: [SGE-discuss] Error at the time of Distribution staging
>To: Himanshu Joshi 
>Cc: sge-disc...@liverpool.ac.uk
> 
>On Thu, Oct 20, 2016 at 11:44:42AM +0530, Himanshu Joshi wrote:
>>Thanks William and Love,
>>Now I had downloaded gridengine-8.1.9-1.el6.src
>>and performed rpm -Uvh gridengine-8.1.9-1.el6.src in mu /opt/sge
>folder as
>>a super user
>>
>>warning: gridengine-8.1.9-1.el6.src.rpm: Header V3 RSA/SHA1
>Signature, key
>>ID 92258035: NOKEY
>>Updating / installing...
>>   1:gridengine-8.1.9-1.el6 
> #
>>[100%]
> 
>I assume you then rebuilt and installed the rpms?  IIRC rpm -Uvh on a
>.src.rpm
>will just unpack the sources.  Where possible I would start with the
>binary RPMs
>Dave provides.  Still you seem to be getting further than before.
> 
>I suppose -U option shall Upgrade if previous version is ther otherwise
>it  installs the rpm
> 
>there are warning messages as well during installation of the above
>package as
> 
>  warning: user mockbuild does not exist - using root
>  warning: group mockbuild does not exist - using root
>  warning: user mockbuild does not exist - using root
>  warning: group mockbuild does not exist - using root
> 
> 
>  Is that what you were referring to with the term "rebuild and install"

No that just looks like the user and group Dave used when prepping the src.rpm.
Since they don't exist on your system rpm whinges a bit.  Doesn't matter a whole
lot when installing a .src.rpm.

> 
> 
>  If that is the case, kindly opine how to build this package

I would avoid doing that if at all possible.  If you can, use the binary
RPMs appropriate to the OS you have.  See my previous message (plus Dave's
correction of the version number) for where to find them.  You'll need the
gridengine and gridengine-qmaster RPMs to install the qmaster IIRC.

If your RPM based OS is not supported by any of the existing binary RPMs
then there are fairly generic instructions for building binary RPMs from a
.src.rpm here:

https://wiki.centos.org/HowTos/RebuildSRPM

And remember to pick a cluster name without underscores or other funny
characters when running inst_sge.

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-10-31 Thread William Hay

On Fri, Oct 21, 2016 at 09:53:21AM +0530, Himanshu Joshi wrote:
>I am sorry for bothering you so much William,
No problem.
> 
>I think rpm file is not getting installed on my system, forget about
>building it. i tried with other rpm file
>(adobe-release-x86_64-1.0-1.noarch.rpm) required for my system, which I
>was sucessfully able to install with rpm -i command
>The link you have shared is also not working for the

Do you mean you cannot access the link or that you encounter issues
following the instructions therein?

>gridengine-8.1.9-1.el6.src.rpm file installation the errror is same
>warning: gridengine-8.1.9-1.el6.src.rpm: Header V3 RSA/SHA1 Signature, key
>ID 92258035: NOKEY
>warning: user mockbuild does not exist - using root
>warning: group mockbuild does not exist - using root
>warning: user mockbuild does not exist - using root
>warning: group mockbuild does not exist - using root
>warning: user mockbuild does not exist - using root
>warning: group mockbuild does not exist - using root
>warning: user mockbuild does not exist - using root
>warning: group mockbuild does not exist - using root

Those are warnings, not an error.  Should be fairly harmless.
> 
>please advice the needful

What OS/version of Linux are you using?  The simplest way to install
SoGE is to use the binary packages Dave or your distribution provides
rather than building from source.

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-04 Thread William Hay

On Fri, Nov 04, 2016 at 07:47:41PM +0530, Himanshu Joshi wrote:
>On Mon, Oct 31, 2016 at 3:21 PM, William Hay  wrote:
> 
>  On Fri, Oct 21, 2016 at 09:53:21AM +0530, Himanshu Joshi wrote:
>  >I am sorry for bothering you so much William,
>  No problem.
>  >
>  >I think rpm file is not getting installed on my system, forget
>  about
>  >building it. i tried with other rpm file
>  >(adobe-release-x86_64-1.0-1.noarch.rpm) required for my system,
>  which I
>  >was sucessfully able to install with rpm -i command
>  >The link you have shared is also not working for the
> 
>  Do you mean you cannot access the link or that you encounter issues
>  following the instructions therein?
> 
>The link is fine but the file "gridengine-8.1.9-1.el6.src.rpm " was
>successfully built with the same warning given below
> 
> 
>  >file installation the errror is same
>  >warning: gridengine-8.1.9-1.el6.src.rpm: Header V3 RSA/SHA1
>  Signature, key
>  >ID 92258035: NOKEY
>  >warning: user mockbuild does not exist - using root
>  >warning: group mockbuild does not exist - using root
>  >warning: user mockbuild does not exist - using root
>  >warning: group mockbuild does not exist - using root
>  >warning: user mockbuild does not exist - using root
>  >warning: group mockbuild does not exist - using root
>  >warning: user mockbuild does not exist - using root
>  >warning: group mockbuild does not exist - using root
> 
>  Those are warnings not an error.  Should be fairly harmless
> 
>After building
> 
>when I went to location /home/JPJ/sge-8.1.9/source/dist
>an performed
>"./inst_sge -m -x"

The src rpm builds binary RPMs.  You need to install those.  At which point
you should have a copy of gridengine installed in /opt/sge.  If you have
a preexisting build of gridengine there, move it out of the way first
before installing the binary rpms.

Once you've installed it the copy of inst_sge in the /opt/sge tree should
configure the installed grid engine.

> 
>The error is
> ./util/install_modules/inst_common.sh: line 74:
>./utilbin/lx-amd64/uidgid: No such file or directory
>./util/install_modules/inst_common.sh: line 75: ./utilbin/lx-amd64/uidgid:
>No such file or directory
>Can't find binaries for architecture: lx-amd64!
>Please check your binaries. Installation failed!
>Exiting installation.
> 
>Please suggest the needful

The information I requested below would be really helpful.

>  What OS/version of Linux are you using.  The simplest way to install
>  SoGE is to use the binary packages Dave or your distribution provides
>  rather than building from source.



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-07 Thread William Hay

On Sat, Nov 05, 2016 at 10:55:38AM +0530, Himanshu Joshi wrote:
>Redhat enterprise Linux 7.2 with X86-64 architecture
>Please find the requested information with other relevant info
>hostnamectl status
>   Static hostname: mbialjpj
>   Pretty hostname: MBIALJPJ
> Icon name: computer-desktop
>   Chassis: desktop
>Machine ID: 431da268159243088e0e02874e8d36bf
>   Boot ID: 24057a4a63554a72b9c7b4b7d9e72b74
>  Operating System: Red Hat Enterprise Linux
>   CPE OS Name: cpe:/o:redhat:enterprise_linux:7.2:GA:workstation
>Kernel: Linux 3.10.0-327.el7.x86_64
>  Architecture: x86-64
> 
>I was able to initiate the installation but now stuck up in the same error
>reported on October 20

> 
>qmaster startup script
>--
> 
>We can install the startup script that will
>start qmaster at machine boot (y/n) [y] >>
> 
>cp /opt/sge/default/common/sgemaster /etc/init.d/sgemaster.mbialjpj55
>/usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj55
> 
>Command failed: /usr/lib/lsb/install_initd
>/etc/init.d/sgemaster.mbialjpj55
Does /usr/lib/lsb/install_initd exist?  
On my RHEL7 box this is a relative symlink pointing to /sbin/chkconfig.  
Does it exist on your machine and to what does it point?
What are the permissions on the file to which it points?
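One way to answer all three questions in one go is sketched below. The function name check_link is made up for illustration; ls, readlink and the [ test are standard tools (readlink -f assumes GNU coreutils, as on RHEL).

```shell
# check_link is a hypothetical helper, not part of Grid Engine.
check_link() {
    link="$1"
    if [ ! -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link: does not exist"
        return 1
    fi
    ls -l "$link"                     # the link itself and where it points
    target=$(readlink -f "$link")     # resolve a relative link to a full path
    echo "resolves to: $target"
    ls -lL "$link"                    # permissions of the file it points to
}

# Example: check_link /usr/lib/lsb/install_initd
```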
> 
>Probably a permission problem. Please check file access permissions.
>Check root read/write permission. Check if SGE daemons are running.
> 
>Looking forward to receive binary packages from Dave because I do not know
>how to look for the one which my distribution provides
>
Dave's packages for RHEL7 are available by downloading the file at:
https://copr.fedorainfracloud.org/coprs/loveshack/SGE/repo/epel-7/loveshack-SGE-epel-7.repo
and placing it in /etc/yum.repos.d
Then 
yum install gridengine gridengine-qmaster gridengine-qmon gridengine-execd

These install into /opt/sge so if you do switch to using these (which will
simplify future upgrades) then remove any grid engine install you have there
first.

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-07 Thread William Hay

On Mon, Nov 07, 2016 at 05:41:54PM +0530, Himanshu Joshi wrote:
>On Mon, Nov 7, 2016 at 3:09 PM, William Hay  wrote:
> 
>  On Sat, Nov 05, 2016 at 10:55:38AM +0530, Himanshu Joshi wrote:
>  >Redhat enterprise Linux 7.2 with X86-64 architecture
>  >Please find the requested information with other relevant info
>  >hostnamectl status
>  >   Static hostname: mbialjpj
>  >   Pretty hostname: MBIALJPJ
>  > Icon name: computer-desktop
>  >   Chassis: desktop
>  >Machine ID: 431da268159243088e0e02874e8d36bf
>  >   Boot ID: 24057a4a63554a72b9c7b4b7d9e72b74
>  >  Operating System: Red Hat Enterprise Linux
>  >   CPE OS Name:
>  cpe:/o:redhat:enterprise_linux:7.2:GA:workstation
>  >Kernel: Linux 3.10.0-327.el7.x86_64
>  >  Architecture: x86-64
>  >
>  >I was able to initiate the installation but now stuck up in the
>  same error
>  >reported on October 20
> 
>  >
>  >qmaster startup script
>  >--
>  >
>  >We can install the startup script that will
>  >start qmaster at machine boot (y/n) [y] >>
>  >
>  >cp /opt/sge/default/common/sgemaster
>  /etc/init.d/sgemaster.mbialjpj55
>  >/usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj55
>  >
>  >Command failed: /usr/lib/lsb/install_initd
>  >/etc/init.d/sgemaster.mbialjpj55
>  Does /usr/lib/lsb/install_initd exist?
> 
>Yes it is a folder owned by root
> 
>  On my RHEL7 box this is a relative symlink pointing to /sbin/chkconfig.
> 
>Yes exactly because the command " ls -la /usr/lib/lsb | grep "\->" "
>provides
>the output as
> 
>lrwxrwxrwx.  1 root root23 Jun  1  2015 install_initd ->
>../../../sbin/chkconfig
>lrwxrwxrwx.  1 root root23 Jun  1  2015 remove_initd ->
>../../../sbin/chkconfig
> 
>  Does it exist on your machine and to what does it point?
> 
>Yes it exists with file permissions and it points to /sbin/chkconfig
> 
> 
>  What are the permissions on the file to which it points?
> 
>The following command "ls -l /sbin/chkconfig"  says
>-rwxr-xr-x. 1 root root 41136 Apr 29  2016 /sbin/chkconfig
> 
>  >
>  >Probably a permission problem. Please check file access
>  permissions.
>  >Check root read/write permission. Check if SGE daemons are running.
> 
>How to check whether SGE daemons is running?
> 
>  >
>  >Looking forward to receive binary packages from Dave because I do
>  not know
>  >how to look for the one which my distribution provides
>  >
>  Dave's packages for RHEL7 are available by downloading the file at:
>  
> https://copr.fedorainfracloud.org/coprs/loveshack/SGE/repo/epel-7/loveshack-SGE-epel-7.repo
>  and placing it in /etc/yum.repos.d
> 
>I had made a document file named "loveshack-SGE.repo" and pasted it in
>/etc/yum.repos.d
> 
>  Then
>  yum install gridengine gridengine-qmaster gridengine-qmon
>  gridengine-execd
> 
>Then I went into /opt/sge and followed the above command
> 
>This resolved many dependencies and enabled sufficient repositories
> 
>  These install into /opt/sge so if you do switch to using these (which
>  will simplify future
>  upgrades) then remove any grid engine install you have there first.
> 
>Again  the command "./inst_sge -m -x"" reached upto the process of
>We can install the startup script that will
>start qmaster at machine boot (y/n) [y] >>
> 
>but landed up in the same error i.e.
> 
>cp /opt/sge/default/common/sgemaster /etc/init.d/sgemaster.mbialjpj55
>/usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj55

I'd try running the command 

/usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj55||echo $?

To see if it produces any output.
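
For anyone unfamiliar with the idiom: || runs its right-hand side only when the command on the left fails, and $? holds that command's exit status. A slightly more talkative sketch of the same check follows; report_status is a name invented here, not an existing tool.

```shell
# report_status is a hypothetical helper: run a command, then say how it ended.
report_status() {
    if "$@"; then
        echo "ok: $*"
    else
        status=$?                     # exit status of the command just run
        echo "exit status $status: $*"
        return "$status"
    fi
}

# Example:
#   report_status /usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj55
```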


> 
>Command failed: /usr/lib/lsb/install_initd
>/etc/init.d/sgemaster.mbialjpj55
> 
>Probably a permission problem. Please check file access permissions.
>Check root read/write permission. Check if SGE daemons are running.
> 
>I have found the file "sgeqmaster.mbialjpj55" in  the location described
>as /etc/init.d
> and ls -l command gives the file permissions as
> 
>-rwxr-xr-x. 1 root root 24883 Nov  7 17:27 sgemaster.mbialjpj55
> 
>How to check if SGE Daemons is running because command "service
>--status-all" reveals
ps ax |grep sge

should reveal any sge daemons
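
A plain grep for sge also lists the grep process itself. A common workaround, sketched here with an invented helper name, is to put one character of the pattern in brackets. Note that this narrower pattern only matches the daemons (sge_qmaster, sge_execd, sge_shadowd), not a running inst_sge.

```shell
# sge_daemons is a hypothetical helper name.
# The pattern [s]ge_ still matches "sge_qmaster" and friends, but grep's own
# command line contains the literal text "[s]ge_", which the pattern does not
# match, so grep no longer reports itself.
sge_daemons() {
    grep '[s]ge_'
}

# Example: ps ax | sge_daemons
```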

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-08 Thread William Hay

On Tue, Nov 08, 2016 at 11:30:35AM +0530, Himanshu Joshi wrote:
>  I'd try running the command
> 
>  /usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj55||echo $?
> 
>  To see if it produces any output.
> 
>Yes the output for this command is
>1
Annoyingly silent error.

What does 
ls -l /etc/rc.d/rc3.d/*sge*
output if anything?
> 
>command "ps ax |grep sge" says
> 
>17870 pts/4S+ 0:00 grep --color=auto sge
>26341 ?S10557:34 /bin/sh ./inst_sge -m -x
You have a copy of inst_sge running eating that amount of cpu time?  Was that 
intentionally still running?

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-09 Thread William Hay

On Wed, Nov 09, 2016 at 11:25:42AM +0530, Himanshu Joshi wrote:
>On Tue, Nov 8, 2016 at 9:38 PM, William Hay  wrote:
> 
>  On Tue, Nov 08, 2016 at 11:30:35AM +0530, Himanshu Joshi wrote:
>  >  I'd try running the command
>  >
>  >  /usr/lib/lsb/install_initd /etc/init.d/sgemaster.mbialjpj55||echo
>  $?
>  >
>  >  To see if it produces any output.
>  >
>  >Yes the output for this command is
>  >1
>  Annoyingly silent error.
> 
>Ya true..
> 
>  What does
>  ls -l /etc/rc.d/rc3.d/*sge*
>  output if anything?
> 
>It says " no match"
> i.e. /etc/rc.d/rc3.d folder has no file with *sge*
> 
>  >
>  >command "ps ax |grep sge" says
>  >
>  >17870 pts/4S+ 0:00 grep --color=auto sge
>  >26341 ?S10557:34 /bin/sh ./inst_sge -m -x
>  You have a copy of inst_sge running eating that amount of cpu time?  Was
>  that intentionally still running?
> 
>I was not running it intentionally , and system monitor also does not show
>any process with name "inst_sge". I had tried closing all the terminals
>and restarted the system
> 
>now the output is
>8160 pts/0S+ 0:00 grep --color=auto sge
IIRC the installation of the init script is the last thing inst_sge does, so
if this is the only thing blocking the install then you just need to
set the file up by hand.

Try the install_initd command by hand again now that there isn't a running
inst_sge.  If that doesn't work try:

chkconfig --add sgemaster.mbialjpj55
chkconfig sgemaster.mbialjpj55 on
service sgemaster.mbialjpj55 start

Try running
/etc/init.d/sgemaster.mbialjpj55 start
by hand.  Does it produce output?

cat /etc/init.d/sgemaster.mbialjpj55


William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-09 Thread William Hay

On Wed, Nov 09, 2016 at 04:59:18PM +0530, Himanshu Joshi wrote:
>On Wed, Nov 9, 2016 at 2:18 PM, William Hay  wrote:
> 
>  On Wed, Nov 09, 2016 at 11:25:42AM +0530, Himanshu Joshi wrote:
>  >On Tue, Nov 8, 2016 at 9:38 PM, William Hay 
>  wrote:
>  >
>  >  On Tue, Nov 08, 2016 at 11:30:35AM +0530, Himanshu Joshi wrote:
>  >  >  I'd try running the command
>  >  >
>  >  >  /usr/lib/lsb/install_initd
>  /etc/init.d/sgemaster.mbialjpj55||echo
>  >  $?
>  >  >
>  >  >  To see if it produces any output.
>  >  >
>  >  >Yes the output for this command is
>  >  >1
>  >  Annoyingly silent error.
>  >
>  >Ya true..
>  >
>  >  What does
>  >  ls -l /etc/rc.d/rc3.d/*sge*
>  >  output if anything?
>  >
>  >It says " no match"
>  > i.e. /etc/rc.d/rc3.d folder has no file with *sge*
>  >
>  >  >
>  >  >command "ps ax |grep sge" says
>  >  >
>  >  >17870 pts/4S+ 0:00 grep --color=auto sge
>  >  >26341 ?S10557:34 /bin/sh ./inst_sge -m -x
>  >  You have a copy of inst_sge running eating that amount of cpu
>  time?  Was
>  >  that intentionally still running?
>  >
>  >I was not running it intentionally , and system monitor also does
>  not show
>  >any process with name "inst_sge". I had tried closing all the
>  terminals
>  >and restarted the system
>  >
>  >now the output is
>  >8160 pts/0S+ 0:00 grep --color=auto sge
>  IIRC the installation of the init script is the last thing inst_sge does
>  so
>  if this is the only thing blocking the install then you just need to
>  set the file up by hand 
> 
>  Try the install_initd command by hand again now that there isn't a
>  running inst_sge
> 
>The ./install_initd says 
If you leave out the ./ it will search the path.

>Command not found
>I think this file (install_initd) is not available in /opt/sge that is why
>command not found
> 
> 
>  If that doesn't work try:
> 
>  chkconfig --add sgemaster.mbialjpj55
>  chkconfig sgemaster.mbialjpj55 on
>  service sgemaster.mbialjpj55 start
> 
> 
> 
>  Try running
>  /etc/init.d/sgemaster.mbialjpj55 start
>  by hand does it produce output?
> 
>It worked and then the output of "ps ax | grep sge" is
>29305 ?Sl 0:00 /opt/sge/bin/lx-amd64/sge_qmaster
>29974 pts/0S+ 0:00 grep --color=auto sge
> 
>Now the below 3 commands are immaterial
>chkconfig --add sgemaster.mbialjpj55
>chkconfig sgemaster.mbialjpj55 on
>service sgemaster.mbialjpj55 start
>as these commands say
Well the first two make sure it will start on reboot.

> 
>"sge_qmaster with PID 29305 is already running"
> 
> 
> 
>  cat /etc/init.d/sgemaster.mbialjpj55
> 
> 
> 
>  This command displays the contents of sgemaster.mbialjpj55 executable
>  file in terminal

> 
> 
> 
>  William
> 
>Thanks William...
> 
>But now also, I am not sure about installation,If it is done or not

If you installed Dave's RPMs then it is installed.  inst_sge, despite the name,
really just does an initial config.
> 
>Kindly suggest the needful
I suspect you probably want to use inst_sge to configure the node as an execd 
as well.

>--
>Himanshu Joshi



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-11 Thread William Hay

On Thu, Nov 10, 2016 at 02:26:35PM +0530, Himanshu Joshi wrote:
>  I suspect you probably want to use inst_sge to configure the node as an
>  execd as well.
> 
>Is there any documentation available for doing that because I do not have
>any idea how to do it
http://arc.liv.ac.uk/SGE/howto/commontasks.html
If you are just making the initial qmaster into an execution host as well
then changing to $SGE_ROOT and running ./install_execd should do it.

Make sure you have SGE_ROOT set correctly first (see below)


>And I tried some computations with the current setup but some of the
>errors were
> 
>Error: which: no qconf in (/usr/local..   )
>Warning SGE_ROOT environment variable is set but Grid Engine software is
>not found, will run locally
If you installed Dave's packages then they install into /opt/sge by default, so
set the SGE_ROOT environment variable to point to that.

Sourcing /opt/sge/default/common/settings.sh should set up the environment.
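
For a Bourne-like shell that could look roughly like the sketch below. load_sge_env is an invented name, and the cell name "default" is taken from the settings.sh path quoted here; adjust both to the real install.

```shell
# load_sge_env is a hypothetical helper for sh/bash, not part of Grid Engine.
load_sge_env() {
    root="${1:-/opt/sge}"
    settings="$root/default/common/settings.sh"
    if [ ! -r "$settings" ]; then
        echo "cannot read $settings" >&2
        return 1
    fi
    . "$settings"          # must be sourced, not executed, to affect this shell
    echo "SGE_ROOT is now: $SGE_ROOT"
    if command -v qconf >/dev/null 2>&1; then
        echo "qconf found at: $(command -v qconf)"
    else
        echo "qconf is still not on PATH" >&2
    fi
}

# Example: load_sge_env /opt/sge
```

Putting the same source line into ~/.bashrc makes the setting stick for new logins.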


> 
>And there is no folder gridengine in usr/share/doc
>Thus it indicates the software is not at all installed
Dave's packages are designed to be installed under /opt and don't
stick things into /usr/share/doc.

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-14 Thread William Hay

On Mon, Nov 14, 2016 at 06:03:43PM +0530, Himanshu Joshi wrote:
>Thanks William
>On Fri, Nov 11, 2016 at 10:31 PM, William Hay  wrote:
> 
>  On Thu, Nov 10, 2016 at 02:26:35PM +0530, Himanshu Joshi wrote:
>  >  I suspect you probably want to use inst_sge to configure the node
>  as an
>  >  execd as well.
>  >
>  >Is there any documentation available for doing that because I do
>  not have
>  >any idea how to do it
>  http://arc.liv.ac.uk/SGE/howto/commontasks.html
>  If you are just making the initial qmaster into an execution host as
>  well
>  then changing to $SGE_ROOT and running ./install_execd should do it.
> 
> 
>It worked, Now Execution daemon installed successfully. But I am not sure
>whether the nodes are configured or not...
> 
>  Make sure you have SGE_ROOT set correctly first (see below)
> 
>  >And I tried some computations with the current setup but some of
>  the
>  >errors were
>  >
>  >Error: which: no qconf in (/usr/local..   )
>  >Warning SGE_ROOT environment variable is set but Grid Engine
>  software is
>  >not found, will run locally
>  If you installed Dave's packages then they install into /opt/sge by
>  default so set the
>  SGE_ROOT environment variable to point to that.
> 
>  sourcing /opt/sge/default/common/seetings.sh should set up the
>  enironment.
>  changes done in .bashrc file as suggested
>  >
>  >And there is no folder gridengine in usr/share/doc
>  >Thus it indicates the software is not at all installed
>  Dave's packages are designed to be installed under /opt and don't
>  stick things into /usr/share/doc.
> 
>  William
> 
>Please find below the outputs of few of the configuration commands
>(without using sudo) in my terminal
> 
>"qconf -sh" shows
>mbialjpj
> 
>"qconf -sel" shows
>no execution host defined
> 
>"qconf -ae" shows
>denied: "JPJ" must be manager for this operation"
Running qconf -ae as root so you can add a host should do it.

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-15 Thread William Hay

On Tue, Nov 15, 2016 at 10:44:01AM +0530, Himanshu Joshi wrote:
>On Mon, Nov 14, 2016 at 8:41 PM, William Hay  wrote:
> 
>  On Mon, Nov 14, 2016 at 06:03:43PM +0530, Himanshu Joshi wrote:
>  >Thanks William
>  >On Fri, Nov 11, 2016 at 10:31 PM, William Hay 
>  wrote:
>  >
>  >  On Thu, Nov 10, 2016 at 02:26:35PM +0530, Himanshu Joshi wrote:
>  >  >  I suspect you probably want to use inst_sge to configure
>  the node
>  >  as an
>  >  >  execd as well.
>  >  >
>  >  >Is there any documentation available for doing that because
>  I do
>  >  not have
>  >  >any idea how to do it
>  >  http://arc.liv.ac.uk/SGE/howto/commontasks.html
>  >  If you are just making the initial qmaster into an execution host
>  as
>  >  well
>  >  then changing to $SGE_ROOT and running ./install_execd should do
>  it.
>  >
>  >
>  >It worked, Now Execution daemon installed successfully. But I am
>  not sure
>  >whether the nodes are configured or not...
>  >
>  >  Make sure you have SGE_ROOT set correctly first (see below)
>  >
>  >  >And I tried some computations with the current setup but
>  some of
>  >  the
>  >  >errors were
>  >  >
>  >  >Error: which: no qconf in (/usr/local..   )
>  >  >Warning SGE_ROOT environment variable is set but Grid Engine
>  >  software is
>  >  >not found, will run locally
>  >  If you installed Dave's packages then they install into /opt/sge
>  by
>  >  default so set the
>  >  SGE_ROOT environment variable to point to that.
>  >
>  >  sourcing /opt/sge/default/common/seetings.sh should set up the
>  >  enironment.
>  >  changes done in .bashrc file as suggested
>  >  >
>  >  >And there is no folder gridengine in usr/share/doc
>  >  >Thus it indicates the software is not at all installed
>  >  Dave's packages are designed to be installed under /opt and don't
>  >  stick things into /usr/share/doc.
>  >
>  >  William
>  >
>  >Please find below the outputs of few of the configuration commands
>  >(without using sudo) in my terminal
>  >
>  >"qconf -sh" shows
>  >mbialjpj
>  >
>  >"qconf -sel" shows
>  >no execution host defined
>  >
>  >"qconf -ae" shows
>  >denied: "JPJ" must be manager for this operation"
>  iRunning qconf -ae as root so you can add a host should do it.
> 
>If I understand this one liner correctly, you mean to say the qconf -ae
>newhost can add "newhost". But as a root using this command says qconf:
>Command not found.
As root:
source /opt/sge/default/common/settings.sh
qconf -ae



> 
> 
>  William
> 
>Kindly suggest the needful
>--
>Himanshu Joshi
>M.Tech. Cognitive & Neuroscience.
>Ph.D Scholar,
>Department of Psychiatry
>NIMHANS, Bangalore
>Publications
>Multimodal Brain Image Analysis Laboratory



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-16 Thread William Hay

On Tue, Nov 15, 2016 at 04:43:08PM +0530, Himanshu Joshi wrote:
>  As root:
>  source /opt/sge/default/common/settings.sh
>  qconf -ae
> 
>Thanks,Please find the outputs and advise
>[root@mbialjpj ~]# source /opt/sge/default/common/settings.sh
>SGE_ROOT=/opt/sge: Command not found.
>export: Command not found.
>SGE_ROOT: Undefined variable.
>[root@mbialjpj ~]# qconf -ae
>qconf: Command not found.
That is a little odd.  The settings.sh file should work with most Bourne-like
shells.

Try source /opt/sge/default/common/settings.csh instead.

Run echo $SHELL as root to see what shell you are running under.
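
Messages of the form "export: Command not found." are typical of csh-family shells, which cannot read settings.sh. The choice of file can be sketched as follows; settings_file_for is a made-up helper that only prints which file to source, and the paths are the ones given above.

```shell
# settings_file_for is a hypothetical helper; it prints a path and sources nothing.
settings_file_for() {
    shell_name=$(basename "$1")
    case "$shell_name" in
        csh|tcsh) echo "/opt/sge/default/common/settings.csh" ;;
        *)        echo "/opt/sge/default/common/settings.sh" ;;
    esac
}

# Example: settings_file_for "$SHELL"
```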

> 
>[root@mbialjpj ~]# $SGE_ROOT
>SGE_ROOT: Undefined variable.
>[root@mbialjpj ~]# which $SGE_ROOT
>SGE_ROOT: Undefined variable.

$SGE_ROOT isn't a command, just a variable that tells the various SGE commands
where to find grid engine.

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-18 Thread William Hay

On Wed, Nov 16, 2016 at 07:32:00PM +0530, Himanshu Joshi wrote:
>On Wed, Nov 16, 2016 at 6:59 PM, William Hay  wrote:
> 
>  On Tue, Nov 15, 2016 at 04:43:08PM +0530, Himanshu Joshi wrote:
>  >  As root:
>  >  source /opt/sge/default/common/settings.sh
>  >  qconf -ae
>  >
>  >Thanks,Please find the outputs and advise
>  >[root@mbialjpj ~]# source /opt/sge/default/common/settings.sh
>  >SGE_ROOT=/opt/sge: Command not found.
>  >export: Command not found.
>  >SGE_ROOT: Undefined variable.
>  >[root@mbialjpj ~]# qconf -ae
>  >qconf: Command not found.
>  That is a little odd.  The settings.sh file should work with most bourne
>  like shells.
> 
> 
> 
>  Try source /opt/sge/default/common/settings.csh instead
> 
>  echo $SHELL as root to see what shell you are running under.
>  Yes surprisingly as a root $SHELL says /bin/tcsh, thus your suggestion
>  source /opt/sge/default/common/settings.csh worked.
> 
>But  qconf -ae Newhost
>can't resolve hostname "Newhost"
> qconf -ae Newhost
>can't resolve hostname "Newhost"
> 
>Now "qconf -ae" shows
> 
>hostname  template
>load_scaling  NONE
>complex_valuesNONE
>user_listsNONE
>xuser_lists   NONE
>projects  NONE
>xprojects NONE
>usage_scaling NONE
>report_variables  NONE
>~  
>   
>.
>.
>   
> 
>~  
>   
>"/tmp/pid-14815-aPBdyG" 9L, 247C
> 
> But adding Newhost in place of template also shows
> 
>can't resolve hostname "Newhost"
> 
>i have no clue how to configure this

The name in hostname needs to be the regular hostname of a machine (i.e. what
you would use if trying to ssh to the machine).  If all the hosts in your
cluster are within the same domain and you chose the appropriate option when
running inst_sge for the qmaster you can use just the bit before the first dot;
otherwise you need the Fully Qualified Domain Name (FQDN).  If your cluster is
isolated and the machines don't have names registered in the DNS then you can
add the hostnames to /etc/hosts on each machine with a mapping to the host's IP
address.  Look at the existing /etc/hosts and man hosts for details of the
format.
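
On a live system "getent hosts <name>" shows whether a name resolves at all. To check only whether a name is listed in a hosts-format file, something like this sketch would do; hosts_file_has is an invented helper, not a Grid Engine tool.

```shell
# hosts_file_has is a hypothetical helper.
# Usage: hosts_file_has FILE NAME
# Succeeds if NAME appears as a hostname or alias (any field after the address).
hosts_file_has() {
    awk -v name="$2" '
        { sub(/#.*/, "") }          # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example: hosts_file_has /etc/hosts mbialjpj && echo "listed"
```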

I don't think it matters but I'd stick with all lower case names.

William


Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-18 Thread William Hay

On Fri, Nov 18, 2016 at 08:27:36PM +0530, Himanshu Joshi wrote:
>During my install_execd, I was not able to install default queue with some
>error message I do not remember I think that might be one of the problem
>The hostname is the same which I use to ssh this machine i.e. mbialjpj
>Yes I had chosen that all hosts are in same domain but the message came
>like
Since you have mbialjpj in /etc/hosts you should be able to execute qconf -ae
and replace template with the name mbialjpj.

William





Re: [SGE-discuss] Systemd more friendly sgemaster

2016-11-28 Thread William Hay

On Mon, Nov 28, 2016 at 07:50:11AM +, Ondrej Valousek wrote:
>Hello,
> 
>I am just asking if it would be possible to modify sge_qmaster to do not
>fork (based on say some env variable).
The bug you reference points out that the variable SGE_ND does this already.
The problem as seen there is that it also causes additional messages to be
logged.  A quick test on my debian box suggests that the messages in question
are actually being sent to STDOUT and possibly captured and logged by systemd.

IIRC Systemd can be configured to just dump STDOUT into /dev/null which would
presumably do what you need in combination with SGE_ND.
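
A minimal, untested sketch of that combination follows. It only writes an example unit file into a scratch directory. The unit name, the /opt/sge install path, the cell name and the value given to SGE_ND are all assumptions; StandardOutput=null and StandardError=journal are ordinary systemd settings.

```shell
# Writes an illustrative unit file into a scratch directory; nothing is installed.
unit_dir=$(mktemp -d)
cat > "$unit_dir/sge_qmaster.service" <<'EOF'
[Unit]
Description=Grid Engine qmaster (sketch, not a packaged unit)
After=network.target

[Service]
Type=simple
# Assumed install location and cell name.
Environment=SGE_ROOT=/opt/sge
Environment=SGE_CELL=default
# SGE_ND asks sge_qmaster not to daemonise; the value 1 is an assumption.
Environment=SGE_ND=1
ExecStart=/opt/sge/bin/lx-amd64/sge_qmaster
# Discard what the daemon prints on stdout ...
StandardOutput=null
# ... but keep stderr, which would otherwise follow stdout.
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF
echo "wrote $unit_dir/sge_qmaster.service"
```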

If you still think there is a problem a little more info would be helpful.

What version of grid engine are you using?
Does setting SGE_ND not prevent forking for you?
If setting SGE_ND does prevent forking and the problem is excessive log
messages then an example of the messages in question would be helpful.

William

> 
>This way it would be more friendly to systemd.
> 
>See:
> 
> 
> 
> 
> 
>https://bugzilla.redhat.com/show_bug.cgi?id=1082129


Re: [SGE-discuss] Systemd more friendly sgemaster

2016-11-28 Thread William Hay

On Mon, Nov 28, 2016 at 10:59:21AM +, Mike Grant wrote:
> Writing as the original reporter of that bug, here's a bit more info..
> 
> On 28/11/16 10:34, William Hay wrote:
> > What version of grid engine are you using?
> 
> The bug and the comments below refer to the Fedora packaged version,
> which will be quite a bit behind SoGE.  Yes, we plan to try out SoGE on
> our next OS update.
> 
> > Does setting SGE_ND not prevent forking for you?
> 
> There is an issue that shadowd (at least in the Fedora packaged version)
> still daemonises even with SGE_ND set.  In order to prevent that, one
> also has to set SGE_DEBUG_LEVEL="1 0 0 0 0 0 0 0" (minimal?), which
> results in a lot of spam if that node becomes the master.  Even
> filtering this doesn't help a lot, as it burns a lot of CPU printing and
> dropping the lines.

SGE_ND is documented in the qmaster man page but not the shadowd man page 
so I guess that is "working as designed".  Since SGE_DEBUG_LEVEL prevents 
daemonisation it would probably be fairly simple to make SGE_ND work as 
well.  You could submit a request for that as an enhancement bug to Dave's 
bugtracker.

An alternative/workaround might be to use a generic HA solution rather 
than the shadow_master daemon.  Pacemaker claims to work with systemd.  

> 
> If that's still true in SoGE, making it respect SGE_ND would be a nice fix.
> 
> > If setting SGE_ND  does prevent forking and the problem is exessive log 
> > messages then an example
> > of the messages in question would be helpful.
> 
> Here's a small snippet from last night - it's just a few lines every few
> minutes.  Mostly it's a regular scheduler report (does this go to
> stdout? maybe I should filter it!) and one legitimate error (definitely
> don't want to filter this).

The scheduler stuff vanishes if one redirects stdout to /dev/null (at least 
for me with 8.1.9). Other stuff ( a few messages at startup ) still gets 
logged (again at least the messages I saw).

William


Re: [SGE-discuss] Systemd more friendly sgemaster

2016-11-28 Thread William Hay

On Mon, Nov 28, 2016 at 10:46:35AM +, Ondrej Valousek wrote:
> I am using GE version 8.1.8.
> I did not try SGE_ND yet as I was not aware of it
> I will give it a try - will see how it goes.
> Thanks for the hint.

I'm not a systemd expert by any means but something like the following might
also work and supports grid engine's CELLs by mapping them to systemd instances
while letting systemd know where gridengine puts its pidfile.

Type=forking
DefaultInstance=default
Environment="SGE_CELL=%i"
PIDFile=/var/spool/gridengine/%i/qmaster/qmaster.pid
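
Placed in a complete template unit, those settings might look like the untested sketch below, which just writes an example file to a scratch directory. The unit name sgemaster@.service, the /opt/sge paths and the ExecStart line are assumptions. DefaultInstance= belongs in the [Install] section and the other three settings in [Service].

```shell
# Writes an illustrative template unit into a scratch directory; nothing is installed.
unit_dir=$(mktemp -d)
cat > "$unit_dir/sgemaster@.service" <<'EOF'
[Unit]
Description=Grid Engine qmaster for cell %i (sketch)
After=network.target

[Service]
Type=forking
Environment=SGE_ROOT=/opt/sge
Environment=SGE_CELL=%i
# Assumed location of the start script inside the cell directory.
ExecStart=/opt/sge/%i/common/sgemaster start
PIDFile=/var/spool/gridengine/%i/qmaster/qmaster.pid

[Install]
WantedBy=multi-user.target
DefaultInstance=default
EOF
echo "wrote $unit_dir/sgemaster@.service"
```

With a unit like this installed, "systemctl start sgemaster@default" would start the qmaster for the cell named default.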


> On Mon, Nov 28, 2016 at 07:50:11AM +, Ondrej Valousek wrote:
> >Hello,
> > 
> >I am just asking if it would be possible to modify sge_qmaster to do not
> >fork (based on say some env variable).
> >This way it would be more friendly to systemd.
> >https://bugzilla.redhat.com/show_bug.cgi?id=1082129



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-28 Thread William Hay

On Mon, Nov 28, 2016 at 06:16:00PM +0530, Himanshu Joshi wrote:
> 
>Now installation of sge is done
> 
>ps aux | grep "sge" command says
> 
>root  7407  0.0  0.2 213524 38396 ?Sl   16:37   0:01
>/opt/sge/bin/lx-amd64/sge_qmaster
>root  9962  0.0  0.0 112648   960 pts/0S+   17:53   0:00 grep
>--color=auto sge
>then
>I did
> service sgeexecd.mbialjpj55 start
>   Starting Grid Engine execution daemon
> 
>but
>ps aux | grep "sge" again says the same status
> 
>root  7407  0.0  0.2 213524 38396 ?Sl   16:37   0:01
>/opt/sge/bin/lx-amd64/sge_qmaster
>root  9974  0.0  0.0 112648   960 pts/0S+   17:54   0:00 grep
>--color=auto sge
>I would now setup SGE

sge_execd should be running.

As root try /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start

See what the output is.  Hopefully it will provide some clue why the execd 
isn't starting.
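
The -x option makes the shell print each command, prefixed with +, before running it, so the trace shows how far the init script gets. A small wrapper, with an invented name, that keeps the trace and the script's own messages in one stream:

```shell
# trace_start is a made-up name; -x is the standard shell trace option.
trace_start() {
    script="$1"
    shift
    # The trace goes to stderr; merge it into stdout so it can be piped or saved.
    /bin/sh -x "$script" "$@" 2>&1
}

# Example (init_script holds the path of the execd init script):
#   trace_start "$init_script" start | tail -n 30
```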


> 
>Now using qmon gives the following error
> 
Are you logged in directly in the console of the machine where you are running
qmon or are you accessing it via ssh?  If you run "echo $DISPLAY" what do you
get?

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-29 Thread William Hay

On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu Joshi wrote:
>On Mon, Nov 28, 2016 at 9:26 PM, William Hay  wrote:
> 
>  On Mon, Nov 28, 2016 at 06:16:00PM +0530, Himanshu Joshi wrote:
>  >
>  >Now installation of sge is done
>  >
>  >ps aux | grep "sge" command says
>  >
>  >root  7407  0.0  0.2 213524 38396 ?Sl   16:37   0:01
>  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >root  9962  0.0  0.0 112648   960 pts/0S+   17:53   0:00
>  grep
>  >--color=auto sge
>  >then
>  >I did
>  > service sgeexecd.mbialjpj55 start
>  >   Starting Grid Engine execution daemon
>  >
>  >but
>  >ps aux | grep "sge" again says the same status
>  >
>  >root  7407  0.0  0.2 213524 38396 ?Sl   16:37   0:01
>  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >root  9974  0.0  0.0 112648   960 pts/0S+   17:54   0:00
>  grep
>  >--color=auto sge
>  >I would now setup SGE
> 
>  sge_execd should be running.
> 
>  As root try /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
> 
> 
>[root@mbialjpj ~]# /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
>/bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No such file or directory

Try again but with the obvious typo corrected.

William




Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-29 Thread William Hay

On Tue, Nov 29, 2016 at 05:43:47PM +0530, Himanshu Joshi wrote:
>On Tue, Nov 29, 2016 at 5:30 PM, William Hay  wrote:
> 
>  On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu Joshi wrote:
>  >On Mon, Nov 28, 2016 at 9:26 PM, William Hay 
>  wrote:
>  >
>  >  On Mon, Nov 28, 2016 at 06:16:00PM +0530, Himanshu Joshi wrote:
>  >  >
>  >  >Now installation of sge is done
>  >  >
>  >  >ps aux | grep "sge" command says
>  >  >
>  >  >root  7407  0.0  0.2 213524 38396 ?Sl   16:37 
>   0:01
>  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >root  9962  0.0  0.0 112648   960 pts/0S+   17:53 
>   0:00
>  >  grep
>  >  >--color=auto sge
>  >  >then
>  >  >I did
>  >  > service sgeexecd.mbialjpj55 start
>  >  >   Starting Grid Engine execution daemon
>  >  >
>  >  >but
>  >  >ps aux | grep "sge" again says the same status
>  >  >
>  >  >root  7407  0.0  0.2 213524 38396 ?Sl   16:37 
>   0:01
>  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >root  9974  0.0  0.0 112648   960 pts/0S+   17:54 
>   0:00
>  >  grep
>  >  >--color=auto sge
>  >  >I would now setup SGE
>  >
>  >  sge_execd should be running.
>  >
>  >  As root try /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
>  >
>  >
>  >[root@mbialjpj ~]# /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
>  >/bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No such file or directory
> 
>  Try again but with the obvious typo corrected.
> 
>sorry for the typo error
> Here is the output...
I think the original typo is mine.  
Looks like everything should work.

Can you try:
cat /opt/sge/default/spool/mbialjpj/messages

The log file may contain clues as to why sge_execd died or failed to start.


William




Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-30 Thread William Hay

On Tue, Nov 29, 2016 at 10:35:35PM +0530, Himanshu Joshi wrote:
>On Tue, Nov 29, 2016 at 8:57 PM, William Hay  wrote:
> 
>  On Tue, Nov 29, 2016 at 05:43:47PM +0530, Himanshu Joshi wrote:
>  >On Tue, Nov 29, 2016 at 5:30 PM, William Hay 
>  wrote:
>  >
>  >  On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu Joshi wrote:
>  >  >On Mon, Nov 28, 2016 at 9:26 PM, William Hay
>  
>  >  wrote:
>  >  >
>  >  >  On Mon, Nov 28, 2016 at 06:16:00PM +0530, Himanshu Joshi
>  wrote:
>  >  >  >
>  >  >  >Now installation of sge is done
>  >  >  >
>  >  >  >ps aux | grep "sge" command says
>  >  >  >
>  >  >  >root  7407  0.0  0.2 213524 38396 ?Sl 
>   16:37
>  >   0:01
>  >  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >  >root  9962  0.0  0.0 112648   960 pts/0S+ 
>   17:53
>  >   0:00
>  >  >  grep
>  >  >  >--color=auto sge
>  >  >  >then
>  >  >  >I did
>  >  >  > service sgeexecd.mbialjpj55 start
>  >  >  >   Starting Grid Engine execution daemon
>  >  >  >
>  >  >  >but
>  >  >  >ps aux | grep "sge" again says the same status
>  >  >  >
>  >  >  >root  7407  0.0  0.2 213524 38396 ?Sl 
>   16:37
>  >   0:01
>  >  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >  >root  9974  0.0  0.0 112648   960 pts/0S+ 
>   17:54
>  >   0:00
>  >  >  grep
>  >  >  >--color=auto sge
>  >  >  >I would now setup SGE
>  >  >
>  >  >  sge_execd should be running.
>  >  >
>  >  >  As root try /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55
>  start
>  >  >
>  >  >
>  >  >[root@mbialjpj ~]# /bin/sh -x
>  /etc/init.d/sgeexecd.mbilajpj55 start
>  >  >/bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No such file or
>  directory
>  >
>  >  Try again but with the obvious typo corrected.
>  >
>  >sorry for the typo error
>  > Here is the output...
>  I think the original typo is mine.
>  Looks like everything should work.
> 
>  can you try:
>  cat /opt/sge/default/spool/mbialjpj/messages
> 
>It says
> 
> 
>  cat: /opt/sge/default/spool/mbialjpj/messages: No such file or directory
> 
>  The log file may contain clues as to why it died/failed to start.
> 
>I think you are looking for /opt/sge/default/spool/qmaster/messages
>please find it attached 
>Regards
No, I was hoping for the execd messages file.  I think it should be in the
location I specified.
Can you have a look under /opt/sge/default/spool for other directories and see
whether any of them has a messages file somewhere beneath it?
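
A minimal sketch of that search, assuming a POSIX shell and the spool path used in this thread. The function name find_sge_messages is made up for illustration; it is not part of Grid Engine.

```shell
# Hypothetical helper: list every file named "messages" under a spool directory.
# The default path is the one discussed in this thread; pass another as $1.
find_sge_messages() {
    spool_dir="${1:-/opt/sge/default/spool}"
    if [ ! -d "$spool_dir" ]; then
        echo "no such directory: $spool_dir" >&2
        return 1
    fi
    find "$spool_dir" -type f -name messages
}
```

Calling find_sge_messages with no argument searches /opt/sge/default/spool. If only the qmaster file shows up, the execd has probably not written a log there yet.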

William




>--
>Himanshu Joshi





Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-30 Thread William Hay

On Wed, Nov 30, 2016 at 04:50:02PM +0530, Himanshu Joshi wrote:
>On Wed, Nov 30, 2016 at 4:04 PM, William Hay  wrote:
> 
>  On Tue, Nov 29, 2016 at 10:35:35PM +0530, Himanshu Joshi wrote:
>  >On Tue, Nov 29, 2016 at 8:57 PM, William Hay 
>  wrote:
>  >
>  >  On Tue, Nov 29, 2016 at 05:43:47PM +0530, Himanshu Joshi wrote:
>  >  >On Tue, Nov 29, 2016 at 5:30 PM, William Hay
>  
>  >  wrote:
>  >  >
>  >  >  On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu Joshi
>      wrote:
>  >  >  >On Mon, Nov 28, 2016 at 9:26 PM, William Hay
>  >  
>  >  >  wrote:
>  >  >  >
>  >  >  >  On Mon, Nov 28, 2016 at 06:16:00PM +0530, Himanshu
>  Joshi
>  >  wrote:
>  >  >  >  >
>  >  >  >  >Now installation of sge is done
>  >  >  >  >
>  >  >  >  >ps aux | grep "sge" command says
>  >  >  >  >
>  >  >  >  >root  7407  0.0  0.2 213524 38396 ?   
>  Sl
>  >   16:37
>  >  >   0:01
>  >  >  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >  >  >root  9962  0.0  0.0 112648   960 pts/0   
>  S+
>  >   17:53
>  >  >   0:00
>  >  >  >  grep
>  >  >  >  >--color=auto sge
>  >  >  >  >then
>  >  >  >  >I did
>  >  >  >  > service sgeexecd.mbialjpj55 start
>  >  >  >  >   Starting Grid Engine execution daemon
>  >  >  >  >
>  >  >  >  >but
>  >  >  >  >ps aux | grep "sge" again says the same status
>  >  >  >  >
>  >  >  >  >root  7407  0.0  0.2 213524 38396 ?   
>  Sl
>  >   16:37
>  >  >   0:01
>  >  >  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >  >  >root  9974  0.0  0.0 112648   960 pts/0   
>  S+
>  >   17:54
>  >  >   0:00
>  >  >  >  grep
>  >  >  >  >--color=auto sge
>  >  >  >  >I would now setup SGE
>  >  >  >
>  >  >  >  sge_execd should be running.
>  >  >  >
>  >  >  >  As root try /bin/sh -x
>  /etc/init.d/sgeexecd.mbilajpj55
>  >  start
>  >  >  >
>  >  >  >
>  >  >  >[root@mbialjpj ~]# /bin/sh -x
>  >  /etc/init.d/sgeexecd.mbilajpj55 start
>  >  >  >/bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No such
>  file or
>  >  directory
>  >  >
>  >  >  Try again but with the obvious typo corrected.
>  >  >
>  >  >sorry for the typo error
>  >  > Here is the output...
>  >  I think the original typo is mine.
>  >  Looks like everything should work.
>  >
>  >  can you try:
>  >  cat /opt/sge/default/spool/mbialjpj/messages
>  >
>  >It says
>  >
>  >
>  >  cat: /opt/sge/default/spool/mbialjpj/messages: No such file or
>  directory
>  >
>  >  The log file may contain clues as to why it died/failed to start.
>  >
>  >I think you are looking for /opt/sge/default/spool/qmaster/messages
>  >please find it attached
>  >Regards
>  No I was hoping for the execd messages file.  I think it should be in
>  the location I specified.
>  Can you have a look under /opt/sge/default/spool for other directories
>  and see if they have a
>  messages file somewhere under them somewhere?
> 
>No I have rechecked .. and there is nothing in /opt/sge/default/spool
>folder by name "messages" . And surprisingly this folder has only one
>folder and that is "qmaster"
>And there is nothing by the name "messages" even in the main directory
>"/opt"  i.e in even in opt folder the desired file is not found.
>Regards
>Himanshu

In that case I would try running the sge_execd binary directly.

With the SGE environment loaded, set the SGE_ND environment variable to true
(which will keep the daemon in the foreground) and try running
/opt/sge/bin/lx-amd64/sge_execd.  If it is exiting on startup it may tell you
why.
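
As a rough sketch of those steps in a Bourne-style shell: the settings file location (/opt/sge/default/common/settings.sh) is an assumption based on the usual layout for this install path, and run_execd_foreground is a made-up name for illustration.

```shell
# Hypothetical helper: load the SGE environment, then start sge_execd with
# SGE_ND=true so that it stays in the foreground instead of daemonizing.
run_execd_foreground() {
    settings="$1"   # e.g. /opt/sge/default/common/settings.sh (assumed path)
    execd="$2"      # e.g. /opt/sge/bin/lx-amd64/sge_execd
    if [ ! -r "$settings" ]; then
        echo "cannot read settings file: $settings" >&2
        return 1
    fi
    # Source the SGE environment into the current shell.
    . "$settings"
    # Set SGE_ND only for this one command.
    SGE_ND=true "$execd"
}
```

It would be called as run_execd_foreground /opt/sge/default/common/settings.sh /opt/sge/bin/lx-amd64/sge_execd. Under csh the rough equivalent is to source the matching settings.csh and use setenv SGE_ND true before starting the binary.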

William



Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-11-30 Thread William Hay

On Wed, Nov 30, 2016 at 07:14:48PM +0530, Himanshu Joshi wrote:
>On Wed, Nov 30, 2016 at 7:03 PM, William Hay  wrote:
> 
>  On Wed, Nov 30, 2016 at 04:50:02PM +0530, Himanshu Joshi wrote:
>  >On Wed, Nov 30, 2016 at 4:04 PM, William Hay 
>  wrote:
>  >
>  >  On Tue, Nov 29, 2016 at 10:35:35PM +0530, Himanshu Joshi wrote:
>  >  >On Tue, Nov 29, 2016 at 8:57 PM, William Hay
>  
>  >  wrote:
>  >  >
>  >  >  On Tue, Nov 29, 2016 at 05:43:47PM +0530, Himanshu Joshi
>      wrote:
>  >  >  >On Tue, Nov 29, 2016 at 5:30 PM, William Hay
>  >  
>  >  >  wrote:
>  >  >  >
>  >  >  >  On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu
>  Joshi
>  >  wrote:
>  >  >  >  >On Mon, Nov 28, 2016 at 9:26 PM, William Hay
>  >  >  
>  >  >  >  wrote:
>  >  >  >  >
>  >  >  >  >  On Mon, Nov 28, 2016 at 06:16:00PM +0530,
>  Himanshu
>  >  Joshi
>  >  >  wrote:
>  >  >  >  >  >
>  >  >  >  >  >Now installation of sge is done
>  >  >  >  >  >
>  >  >  >  >  >ps aux | grep "sge" command says
>  >  >  >  >  >
>  >  >  >  >  >root  7407  0.0  0.2 213524 38396 ?
>  >  Sl
>  >  >   16:37
>  >  >  >   0:01
>  >  >  >  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >  >  >  >root  9962  0.0  0.0 112648   960
>  pts/0
>  >  S+
>  >  >   17:53
>  >  >  >   0:00
>  >  >  >  >  grep
>  >  >  >  >  >--color=auto sge
>  >  >  >  >  >then
>  >  >  >  >  >I did
>  >  >  >  >  > service sgeexecd.mbialjpj55 start
>  >  >  >  >  >   Starting Grid Engine execution
>  daemon
>  >  >  >  >  >
>  >  >  >  >  >but
>  >  >  >  >  >ps aux | grep "sge" again says the same
>  status
>  >  >  >  >  >
>  >  >  >  >  >root  7407  0.0  0.2 213524 38396 ?
>  >  Sl
>  >  >   16:37
>  >  >  >   0:01
>  >  >  >  >  >/opt/sge/bin/lx-amd64/sge_qmaster
>  >  >  >  >  >root  9974  0.0  0.0 112648   960
>  pts/0
>  >  S+
>  >  >   17:54
>  >  >  >   0:00
>  >  >  >  >  grep
>  >  >  >  >  >--color=auto sge
>  >  >  >  >  >I would now setup SGE
>  >  >  >  >
>  >  >  >  >  sge_execd should be running.
>  >  >  >  >
>  >  >  >  >  As root try /bin/sh -x
>  >  /etc/init.d/sgeexecd.mbilajpj55
>  >  >  start
>  >  >  >  >
>  >  >  >  >
>  >  >  >  >[root@mbialjpj ~]# /bin/sh -x
>  >  >  /etc/init.d/sgeexecd.mbilajpj55 start
>  >  >  >  >/bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No
>  such
>  >  file or
>  >  >  directory
>  >  >  >
>  >  >  >  Try again but with the obvious typo corrected.
>  >  >  >
>  >  >  >sorry for the typo error
>  >  >  > Here is the output...
>  >  >  I think the original typo is mine.
>  >  >  Looks like everything should work.
>  >  >
>  >  >  can you try:
>  >  >  cat /opt/sge/default/spool/mbialjpj/messages
>  >  >

Re: [SGE-discuss] Fwd: Error at the time of Distribution staging

2016-12-01 Thread William Hay

On Thu, Dec 01, 2016 at 10:54:07AM +0530, Himanshu Joshi wrote:
>  If so then that sounds like something else
>  is using the port the sge_execd is trying to use.  Also 1024 isn't the
>  default
>  port for sge_execd.  Did you deliberately set it to something unusual
>  when
>  running inst_sge?
> 
>I have no idea about this discrepancy, At the time of running inst_sge I
>have used
>sge_qmaster 6444/tcp
>and
>sge_execd 6445/tcp
> 
>Thus, my /etc/services file reads
>sge_qmaster 6444/tcp  sge-qmaster   # Grid Engine Qmaster Service
>sge_qmaster 6444/udp  sge-qmaster   # Grid Engine Qmaster Service
>sge_execd   6445/tcp  sge-execd # Grid Engine Execution Service
>sge_execd   6445/udp  sge-execd # Grid Engine Execution Service
> 
>  If you have fuser installed then something like fuser -v 1024/tcp should
>  give you
>  the name of the process that is listening there.
> 
>this command lists
>Cannot stat file /proc/13859/fd/7: Permission denied
>Cannot stat file /proc/13859/fd/78: Permission denied
>Cannot stat file /proc/13859/fd/79: Permission denied
>Cannot stat file /proc/13859/fd/80: Permission denied
>Cannot stat file /proc/13859/fd/81: Permission denied
>Cannot stat file /proc/13859/fd/82: Permission denied
>Cannot stat file /proc/13859/fd/83: Permission denied
>Cannot stat file /proc/13859/fd/84: Permission denied
>Cannot stat file /proc/13859/fd/85: Permission denied
> USERPID ACCESS COMMAND
>1024/tcp:root   7407 F sge_qmaster
> 
Well, the evidence suggests that something has set both the qmaster and the
execd to listen on port 1024, which won't work if they are on the same machine.

The port can be set via environment variables, which I think will override the
entries in /etc/services.

As root:
env |grep 'SGE_.*_PORT' 
should list the variables that control the ports used.

You should probably unset them (as you are using csh, the unsetenv command
should do that) and restart both the qmaster and execd.
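
A short sketch of the check and cleanup for a Bourne-style shell, assuming the variables involved are SGE_QMASTER_PORT and SGE_EXECD_PORT (the usual names; the function names here are made up for illustration). In csh the unset step would be unsetenv SGE_QMASTER_PORT and unsetenv SGE_EXECD_PORT instead.

```shell
# List any SGE port overrides in the current environment.
# The pattern is the one from the message above.
show_sge_port_vars() {
    env | grep 'SGE_.*_PORT' || echo "no SGE port variables set"
}

# Remove the overrides from this shell only; anything that sets them at
# login (profile scripts, the SGE settings file) still has to be fixed.
clear_sge_port_vars() {
    unset SGE_QMASTER_PORT SGE_EXECD_PORT
}

show_sge_port_vars
clear_sge_port_vars
show_sge_port_vars
```

The second listing should no longer show the two variables. The daemons keep whatever port they started with, so the restart is still needed afterwards.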


William



Re: [SGE-discuss] USE_CGROUPS

2016-12-22 Thread William Hay

On Tue, Dec 20, 2016 at 02:09:22PM +, Ondrej Valousek wrote:
>Hi List,
> 
> 
> 
>I just enabled USE_CGROUPS execd parameters and I observe that
> 
>-  Relevant job cgroup is created in /dev/cpuset/sge
> 
>-  Task PIDs can not be found in /dev/cpuset/sge//tasks
> 
>What could be wrong?
> 
>I also use ENABLE_ADDGRP_KILL=true and USE_QSUB_GID=false, son of GE
>version 8.1.8.
The cgroup support in 8.1.8 is somewhat buggy.  I suggest upgrading to 8.1.9
if you want to use cgroups.

William


