Re: [SGE-discuss] Kerberos authentication
I have a lot of problems with AD, Kerberos, SSSD, LDAP and GridEngine, but I think it is related to the fact that I connect to AD servers that do not synchronize with the master quickly enough. Once in a while I have to clear the SSSD cache and restart the SSSD services on all the nodes, and until they manage to repopulate, qrsh refuses to open a new shell unless I point it to a node that I know is working.

Regards,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800

From: SGE-discuss [sge-discuss-boun...@liverpool.ac.uk] on behalf of Orion Poplawski [or...@cora.nwra.com]
Sent: Thursday, April 06, 2017 22:46
To: sge-disc...@liverpool.ac.uk
Subject: [SGE-discuss] Kerberos authentication

I've built the gss utils with 'aimk -gss' and am testing with security_mode set to kerberos. In my first attempt I tried to make use of gssproxy to store the sge/qmaster principal, but unfortunately it appears that gssproxy is too old on EL7 to handle storing the delegated credential for us:

put_cred stderr: GSS-API error copying delegated creds to ccache: The operation or option is not available or unsuppo

My next attempt was to set

KRB5_KTNAME=FILE:/var/spool/gridengine/sge.keytab

in the environment of the daemons and store the sge/host principals there. This avoids needing to run qmaster as root to access /etc/krb5.keytab. It needs an sge service principal for the qmaster and for each of the exec hosts, which seems appropriate.

Another issue I ran into is that I'm running in an IPA/Active Directory trust setup where the users are stored in the AD domain and the hosts are in the IPA domain. Therefore the code in gsslib_put_credentials that was using gss_compare_name() to compare users ended up comparing "orion" to "or...@ad.nwra.com". I changed that to also try using gss_localname() to convert the client principal to a local username and to compare that.
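For readers following along, the keytab setup described above might look roughly like the following sketch in an IPA domain. This is an assumption on my part, not Orion's actual procedure: the host names, the IPA server, and the sgeadmin user are placeholders.

```shell
# Hypothetical sketch of the per-host sge keytab setup (placeholder names).
# Create the service principal and fetch its keys into the daemon's keytab.

# On the qmaster:
ipa service-add sge/qmaster.example.com
ipa-getkeytab -s ipa.example.com -p sge/qmaster.example.com \
              -k /var/spool/gridengine/sge.keytab

# On each exec host (repeat per host):
ipa service-add sge/node01.example.com
ipa-getkeytab -s ipa.example.com -p sge/node01.example.com \
              -k /var/spool/gridengine/sge.keytab

# Restrict the keytab to the daemon user and point the daemons at it
# via the environment, so root access to /etc/krb5.keytab is not needed:
chown sgeadmin: /var/spool/gridengine/sge.keytab
chmod 600 /var/spool/gridengine/sge.keytab
export KRB5_KTNAME=FILE:/var/spool/gridengine/sge.keytab
```

The KRB5_KTNAME export would go in the init/systemd environment of sge_qmaster and sge_execd rather than an interactive shell.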
Also, the later code that called krb5_kuserok() segfaulted because it was erroneously casting gss_name_t to krb5_principal. I've started work on changing that to do the conversion properly, but as of now that is untested. There are also a bunch of memory leaks in this code that probably should be cleaned up, although at the moment this is all run in short-lived executables.

Finally, I needed to tweak my peopen() patch to run put_cred and delete_cred as root on the exec hosts, since they need to change the ownership of, and remove, files belonging to the user running the job.

At least for a simple test case this appears to be working now, so I'm fairly pleased. The next issue I expect to face is renewing and expiring user credentials for long-running jobs.

--
Orion Poplawski
Technical Manager 720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane or...@nwra.com
Boulder, CO 80301 http://www.nwra.com

___
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
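A proper conversion, as opposed to the bogus cast, could look something like the sketch below. This is not the actual patch; the helper name is mine, and it shows only the general shape: render the GSS name to text with gss_display_name(), parse that into a krb5_principal, and only then call krb5_kuserok(). It requires the MIT Kerberos and GSSAPI development headers to build.

```c
/* Sketch (not the actual patch): convert a GSS-API client name into a
 * krb5_principal via its textual form before calling krb5_kuserok(),
 * instead of casting gss_name_t to krb5_principal directly. */
#include <gssapi/gssapi.h>
#include <krb5.h>
#include <stdlib.h>
#include <string.h>

static int client_is_user(gss_name_t client, const char *local_user)
{
    OM_uint32 maj, min;
    gss_buffer_desc name_buf = GSS_C_EMPTY_BUFFER;
    krb5_context ctx = NULL;
    krb5_principal princ = NULL;
    int ok = 0;

    maj = gss_display_name(&min, client, &name_buf, NULL);
    if (GSS_ERROR(maj))
        return 0;

    /* name_buf.value is not guaranteed to be NUL-terminated; copy it. */
    char *name_str = strndup(name_buf.value, name_buf.length);
    gss_release_buffer(&min, &name_buf);
    if (name_str == NULL)
        return 0;

    if (krb5_init_context(&ctx) == 0 &&
        krb5_parse_name(ctx, name_str, &princ) == 0)
        ok = krb5_kuserok(ctx, princ, local_user);  /* honors ~/.k5login etc. */

    if (princ)
        krb5_free_principal(ctx, princ);
    if (ctx)
        krb5_free_context(ctx);
    free(name_str);
    return ok;
}
```

Everything allocated is released before returning, which also addresses the kind of leak mentioned above, although in a short-lived executable that is mostly hygiene.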
Re: [SGE-discuss] Sizing the qmaster
Update.

Weeks later, the overkill of setting memory to 12 GB for the VM that runs qmaster has resulted in relative stability. I could investigate how much I can reduce that without causing problems, but I'm not in the mood for tempting Murphy.

FYI, those of you using SnakeMake will find out that it does not support using -v. Because it makes the users' lives "easier", they don't want to go through the trouble of using -v manually. I don't know why SnakeMake insists on not following best practice. I've just told the users that if they use SnakeMake to create thousands of jobs using -V and they crash the scheduler as a result, the blame will come right back to them...

Regards,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800

From: Reuti [re...@staff.uni-marburg.de]
Sent: Tuesday, March 21, 2017 17:19
To: Jimenez, Juan Esteban
Cc: Jesse Becker; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster

> On 21.03.2017 at 16:15, juanesteban.jime...@mdc-berlin.de wrote:
>
>> The "size" of job metadata (scripts, ENV, etc) doesn't really affect
>> the RAM usage appreciably that I've seen. We routinely have jobs
>> ENVs of almost 4k or more, and it's never been a problem. The
>> "data" processed by jobs isn't a factor in qmaster RAM usage, so far as
>> I know.
>
> I've been reading otherwise, with specific recommendations to use -v rather
> than -V, but many tools out there are lazy and just use -V, passing the
> entire environment.

Although it falls outside the main discussion here:

I second using only -v, and only for the environment variables you need in the job script, preferably with an assignment rather than just broadcasting whatever value the current bash session holds into the job. The best is to have self-contained scripts, so that you can reproduce the behavior of a job a month later for sure. Whether you put -v in the job script as a #$ line for SGE, or set/export the variable directly in the script, may be a matter of taste.
One real application of -v with an assignment on the command line is to select a different program flow in the job script (and to print this case of execution in the output). I avoided the word "path" in the last sentence by intention.

When I started with queuing systems, I saw many users using -V, as it looks appealing to execute in the queuing system exactly what you would otherwise start right now on the command line. But on the command line you notice instantly whether $PATH was set the wrong way and targets the wrong binary, or whether a wrong value was assigned for some application, changing its behavior. Recalling a month later why a different version of program XY was used than intended, or why the job crashed using e.g. a different parameter file, can be hard. So I convinced my users to stop this procedure, as it was impossible for me as an admin to explain to them why their job crashed when I'm not aware of what the actual settings inside their shell session were at submission time. Even worse: having more than one terminal window open on the head node can mean that in one session it's working but not in the other, because the shell's environment was changed by the user.

-- Reuti
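Reuti's point can be illustrated without SGE at all. The sketch below simulates the two submission styles with a plain shell script (the script name and MODE variable are made up for the illustration; qsub itself is not invoked): the "-V style" run silently inherits whatever the session happens to have set, while the "-v style" runs pass only an explicit assignment, so the result is reproducible regardless of the submitting shell's state.

```shell
# Write a tiny stand-in for a job script whose behavior depends on MODE.
cat > job.sh <<'EOF'
#!/bin/sh
echo "MODE=${MODE:-default}"
EOF
chmod +x job.sh

# "-V style": the job inherits whatever the session has set.
MODE=debug ./job.sh           # prints MODE=debug

# "-v style": only the explicitly named assignment reaches the job,
# independent of the rest of the session's environment.
env -i MODE=release ./job.sh  # prints MODE=release
env -i ./job.sh               # prints MODE=default
```

With real SGE, the second style corresponds to `qsub -v MODE=release job.sh`, and the first to `qsub -V job.sh` run from a session that happened to export MODE.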
Re: [SGE-discuss] Sizing the qmaster
Hi,

On 09.04.2017 at 12:38, juanesteban.jime...@mdc-berlin.de wrote:
> Update.
>
> Weeks later the overkill of setting memory to 12gb for the VM that runs
> qmaster has resulted in relative stability. [...]
>
> FYI, those of you using SnakeMake will find out that it does not support
> using -v. [...] I've just told the users that if they use SnakeMake to
> create thousands of jobs using -V and they crash the scheduler as a
> result, the blame will come right back to them...

I never used SnakeMake, but why is -v not supported? In the plain download I can't spot -V in the demo files. In case a long list needs to be exported (with or w/o an assigned value), it could be put in each job-specific directory or in the users' home directories in a .sge_request file - whether SnakeMake likes it or not.

-- Reuti

[remainder of the quoted text and PGP signature trimmed]
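Reuti's .sge_request suggestion could look like the hypothetical file below (the variable names and values are placeholders, not from the thread). qsub reads default request options from the cluster-wide sge_request file, then ~/.sge_request, then ./.sge_request in the job's working directory, so the listed variables reach every job even when the submitting tool offers no hook for -v.

```shell
# Hypothetical ~/.sge_request (or per-project ./.sge_request):
# default qsub options, applied to every submission from this
# user/directory regardless of how qsub was invoked.
-v OMP_NUM_THREADS=4,APP_CONFIG=/cluster/conf/app.cfg
```

Options given on an actual qsub command line still override these defaults, so the file sets a reproducible baseline rather than a hard limit.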
Re: [SGE-discuss] Sizing the qmaster
I'm guessing it's because they took a shotgun approach to that and haven't figured out how to let the user specify whether and how -v can be used. But that's just a guess. I don't follow your suggestion, though...

Juan

From: Reuti [re...@staff.uni-marburg.de]
Sent: Sunday, April 09, 2017 17:09
To: Jimenez, Juan Esteban
Cc: Jesse Becker; SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster

I never used SnakeMake, but why is -v not supported? In the plain download I can't spot -V in the demo files. In case a long list needs to be exported (with or w/o an assigned value), it could be put in each job-specific directory or in the users' home directories in a .sge_request file - whether SnakeMake likes it or not.

-- Reuti

[earlier quoted text and PGP signature trimmed]