On Fri, 24 Mar 2017 at 1:03pm, Reuti wrote
Is this expected behavior? Or is something wonky with the cgroups here? Thanks for any insights.
And the mystery deepens. After changing execd_params to turn off "USE_CGROUPS", I tried restarting the exec daemons on the compute nodes (just to make sure the change was propagated, which I now see from the man page isn't necessary). However, the daemons failed to restart on some of the nodes that aren't also admin hosts (do they have to be now?). Once testing showed that the commands now generated output on the nodes with restarted exec daemons, I turned "USE_CGROUPS" back on and restarted the daemons again... and the commands *still* work. So it seems to be the daemon restart that "fixed" the issue, not the cgroups change. Color me even more confused.
You can try to use `strace` to call the two applications in question; maybe it gives some hints about their behavior.
Good idea. One result of the above shenanigans is that I currently have nodes where these commands work, and ones where they don't (because those exec daemons never got restarted). This is the only difference that looks relevant.
From a node that doesn't work:
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffe56529040) = -1 ENOTTY (Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab7c6f9d000
write(1, "cin-id2\n", 8) = 8
exit_group(0) = ?
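
In this trace, fd 1 is a character device with device numbers (1, 3), which on Linux is normally /dev/null, so the write would succeed but the output would be discarded. A minimal Python sketch for checking what stdout is attached to from inside a job might look like the following. It is not part of the original test, and the helper name describe_fd is made up for illustration.

```python
import os
import stat
import sys


def describe_fd(fd):
    """Return a short description of what file descriptor `fd` refers to."""
    st = os.fstat(fd)
    mode = st.st_mode
    if stat.S_ISCHR(mode):
        # For device nodes, st_rdev holds the device numbers,
        # e.g. (1, 3) for /dev/null on Linux.
        return "character device (major=%d, minor=%d)" % (
            os.major(st.st_rdev),
            os.minor(st.st_rdev),
        )
    if stat.S_ISREG(mode):
        return "regular file (size=%d)" % st.st_size
    if stat.S_ISFIFO(mode):
        return "pipe or FIFO"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other (mode=%o)" % mode


if __name__ == "__main__":
    # Report on stderr so the result is not lost if stdout is discarded.
    sys.stderr.write("stdout is a %s\n" % describe_fd(1))
```

Run as a job on both kinds of node, this should show directly whether stdout is a device or the job's output file, assuming stderr itself is still being captured.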
From a node that does work:
fstat(1, {st_mode=S_IFREG|0644, st_size=45077, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2adced199000
write(1, "qb3-id2\n", 8) = 8
exit_group(0) = ?

I'm getting progressively more confused.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users