Re: [SGE-discuss] Kerberos authentication

2017-04-09 Thread juanesteban.jime...@mdc-berlin.de
I have a lot of problems with AD, Kerberos, SSSD, LDAP and GridEngine, but I 
think it is related to the fact that I connect to AD servers that do not 
synchronize with the master quickly enough. Once in a while I have to clear 
the SSSD cache and restart the SSSD services on all the nodes, and until they 
manage to repopulate, qrsh refuses to open a new shell unless I point it to a 
node that I know is working.

Regards,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800



From: SGE-discuss [sge-discuss-boun...@liverpool.ac.uk] on behalf of Orion 
Poplawski [or...@cora.nwra.com]
Sent: Thursday, April 06, 2017 22:46
To: sge-disc...@liverpool.ac.uk
Subject: [SGE-discuss] Kerberos authentication

I've built the gss utils with 'aimk -gss' and am testing with
security_mode set to kerberos.
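For reference, a minimal sketch of that setting (this assumes the default cell
layout; the exact bootstrap path may differ in your installation):

```
# $SGE_ROOT/default/common/bootstrap (excerpt)
security_mode  kerberos
```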
   In my first attempt I tried to make use of gssproxy to store the
sge/qmaster principal, but unfortunately it appears that gssproxy is too old
on EL7 to handle storing the delegated credential for us:

put_cred stderr: GSS-API error copying delegated creds to ccache: The
operation or option is not available or unsuppo

Next attempt was to set:

KRB5_KTNAME=FILE:/var/spool/gridengine/sge.keytab

in the environment of the daemons and store the sge/host principals there.
This avoids needing to run qmaster as root to access /etc/krb5.keytab.  Need a
sge service principal for the qmaster and each of the exec hosts, which seems
appropriate.
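A sketch of one way to arrange this (the sysconfig file paths below are
assumptions; any mechanism that injects the variable into the daemons'
environment works):

```
# e.g. /etc/sysconfig/sgemaster and /etc/sysconfig/sgeexecd (hypothetical paths)
# Point each daemon at a dedicated keytab holding the sge/<host> principals,
# so it never needs root access to /etc/krb5.keytab.
KRB5_KTNAME=FILE:/var/spool/gridengine/sge.keytab
```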

   Another issue I ran into is that I'm running in an IPA/Active Directory
trust setup where the users are stored in the AD domain, and the hosts are in
the IPA domain.  Therefore the code in gsslib_put_credentials that was using
gss_compare_name() to compare users ended up comparing "orion" to
"or...@ad.nwra.com".  I changed that to also try using gss_localname() to
convert the client principal to a local username and comparing that.

   Also, the later code that called krb5_kuserok() segfaulted because it was
erroneously casting gss_name_t to krb5_principal.  I've started work on
changing that to do the conversion properly, but as of now it is untested.

  There are also a bunch of memory leaks in this code that should probably be
cleaned up, although at the moment this all runs in short-lived executables.

  Finally, I needed to tweak my peopen() patch to run put_cred and delete_cred
as root on the exec hosts since they need to change the ownership and remove
files of the user running the job.

  At least for a simple test case, this appears to be working now for me, so
I'm fairly pleased.  Next issue I expect to face is renewing and expiring user
credentials for long running jobs.

--
Orion Poplawski
Technical Manager  720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane   or...@nwra.com
Boulder, CO 80301   http://www.nwra.com
___
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss


Re: [SGE-discuss] Sizing the qmaster

2017-04-09 Thread juanesteban.jime...@mdc-berlin.de
Update.

Weeks later, the overkill of setting memory to 12 GB for the VM that runs 
qmaster has resulted in relative stability. I could investigate how much I can 
reduce that without causing problems, but I'm not in the mood for tempting 
Murphy.

FYI, those of you using SnakeMake will find that it does not support using 
-v. Because it makes the users' lives "easier", they don't want to go through 
the trouble of using -v manually. I don't know why SnakeMake insists on not 
following best practice. I've just told the users that if they use SnakeMake to 
create thousands of jobs using -V and they crash the scheduler as a result, the 
blame will come right back to them...

Regards,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800



From: Reuti [re...@staff.uni-marburg.de]
Sent: Tuesday, March 21, 2017 17:19
To: Jimenez, Juan Esteban
Cc: Jesse Becker; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster

> On 21.03.2017 at 16:15, juanesteban.jime...@mdc-berlin.de wrote:
>
>>   The "size" of job metadata (scripts, ENV, etc) doesn't really affect
>>   the RAM usage appreciably that I've seen.  We routinely have jobs
>>   ENVs of almost 4k or more, and it's never been a problem.  The
>>   "data" processed by jobs isn't a factor in qmaster RAM usage, so far as
>>   I know.
>
> I've been reading otherwise, with specific recommendations to use -v rather 
> than -V, but many tools out there are lazy and just use -V, passing the 
> entire environment.

Although it falls outside the main discussion here:

I second using only -v, and only for the environment variables you need in the 
job script, preferably with an assignment rather than just broadcasting 
whatever value the current bash session happens to hold into the job. The best 
approach is to have self-contained scripts, so that you can reliably reproduce 
the behavior of a job even a month later. Whether you put -v in the job script 
as a #$ line for SGE, or set/export the variable directly in the script, may 
be a matter of taste. One real application of -v with an assignment on the 
command line is to select a different program flow in the job script (and to 
print which case was executed in the output).
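A minimal sketch of that last point (the variable name MODE and the messages
are illustrative, not part of SGE): submit with e.g. `qsub -v MODE=production
job.sh`, and branch on the value inside the job script:

```shell
#!/bin/sh
# Select a program flow from a variable passed at submission time via
# -v MODE=...; print which case ran so the job output records the choice.
case "${MODE:-default}" in
  production) echo "flow: production" ;;
  *)          echo "flow: default (MODE unset or unrecognized)" ;;
esac
```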

I intentionally avoided the word "path" above. When I started with queuing 
systems, I saw many users using -V because it looks appealing to execute in 
the queuing system exactly what you would otherwise start right away on the 
command line. But on the command line you notice instantly whether a $PATH was 
set the wrong way and targets the wrong binary, or whether a wrong value was 
assigned for an application, changing its behavior. Recalling a month later 
why a different version of program XY was used than intended, or why the job 
crashed using e.g. a different parameter file, can be hard. So I convinced my 
users to stop this practice, as it was impossible for me as an admin to 
explain to them why their job crashed when I was not aware of the actual 
settings inside their shell session at submission time. Even worse: having 
more than one terminal window open to the head node can mean that in one 
session it works, but not in the other, because the shell's environment was 
changed by the user.

-- Reuti


Re: [SGE-discuss] Sizing the qmaster

2017-04-09 Thread Reuti

Hi,

On 09.04.2017 at 12:38, juanesteban.jime...@mdc-berlin.de wrote:

> Update.
> 
> Weeks later, the overkill of setting memory to 12 GB for the VM that runs 
> qmaster has resulted in relative stability. I could investigate how much I 
> can reduce that without causing problems, but I'm not in the mood for 
> tempting Murphy.
> 
> FYI, those of you using SnakeMake will find that it does not support 
> using -v. Because it makes the users' lives "easier", they don't want to go 
> through the trouble of using -v manually. I don't know why SnakeMake insists 
> on not following best practice. I've just told the users that if they use 
> SnakeMake to create thousands of jobs using -V and they crash the scheduler 
> as a result, the blame will come right back to them...

I never used SnakeMake, but why is -v not supported? In the plain download I 
can't spot -V in the demo files.

In case a long list needs to be exported (with or without an assigned value), 
it could be put in each job-specific directory or in the users' home 
directories in a .sge_request file - whether SnakeMake likes it or not.
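As a sketch of that idea (the option values here are illustrative): qsub and
qrsh read these defaults files automatically, so the exports can live outside
whatever command line SnakeMake generates:

```
# ~/.sge_request (or ./.sge_request in the submit directory)
# Each line holds default submit options merged into every job.
-v MODE=production
-v TOOLDIR=/opt/tools
```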

-- Reuti




Re: [SGE-discuss] Sizing the qmaster

2017-04-09 Thread juanesteban.jime...@mdc-berlin.de
I'm guessing it's because they took a shotgun approach to that and have not 
figured out how to let the user specify that -v can be used, and how. But 
that's just a guess.

I don't follow your suggestion, though...

Juan


From: Reuti [re...@staff.uni-marburg.de]
Sent: Sunday, April 09, 2017 17:09
To: Jimenez, Juan Esteban
Cc: Jesse Becker; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster
