I finally found the problem in the source and submitted a patch:
https://arc.liv.ac.uk/trac/SGE/ticket/1619
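As a rough illustration of the kind of bug behind the ticket, here is a minimal sketch. It is not the SGE commlib source: the function names got_data_buggy and got_data_fixed are made up for this example, and the only real API used is read(2). It shows what happens when the result of read() is kept in an unsigned variable.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from SGE): returns 1 if it believes data was read.
 * Buggy variant: the ssize_t result of read() is stored in a size_t. */
static int got_data_buggy(int fd, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    size_t n = read(fd, buf, len);  /* an error return of -1 becomes SIZE_MAX */
    return n > 0;                   /* true even though read() failed */
}

/* Hypothetical helper (not from SGE): same check with the signed type. */
static int got_data_fixed(int fd, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    ssize_t n = read(fd, buf, len); /* keeps -1 as -1 */
    if (n < 0) {
        return 0;                   /* read() failed, errno says why */
    }
    return n > 0;                   /* 0 means end of file, no data */
}
```

Called with an invalid descriptor such as -1, read() fails; got_data_buggy then still reports data, while got_data_fixed reports none.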
The function "read" returns "ssize_t", not "size_t", so its result must
be stored in an "ssize_t" variable. "size_t" is unsigned, so an error
return of "-1" can never show up as a negative value. On my Linux
system the read call returns an "ssize_t", which is signed and can be
compared to "-1". See read(2).

Greetings ...
Marco

On 03.11.2017 at 10:57, Marco Schmidt wrote:
> Good news, I found the problem.
>
> Even if "security_mode" is set to "none" in the "bootstrap" file,
> qmaster wants more than plaintext communication. I guess that without a
> certificate it cannot initialize the SSL components. This means qmaster
> expects neither SSL nor plaintext. Maybe "Munge" communication? (If this
> sounds silly to people with "munge" knowledge, please go ahead and add
> some more documentation about "SGE and Munge"; I have no idea about it
> (yet).)
>
> After generating the sgeCA infrastructure and setting "security_mode" to
> "csp", the commlib error disappears and communication with qmaster works.
>
> Unfortunately I am not deep enough into the code to send a patch
> right now. It will take some time.
>
> If somebody already knows how to fix it, please tell us.
>
> Greetings ...
> Marco
>
>
> On 02.11.2017 15:09, Marco Schmidt wrote:
>> Same behavior on Debian 8 (jessie).
>>
>> It seems I have to dig deeper into the code to debug this.
>>
>> Greetings ...
>> Marco
>>
>>
>> On 01.11.2017 15:46, Marco Schmidt wrote:
>>> Hello,
>>>
>>> I have been trying for several days now to get SGE running on
>>> Debian 9 (stretch) using the source from the darcs repository.
>>>
>>> It was really easy to build the Debian packages and install them.
>>>
>>> The master is running (it responds to qping), but any other command
>>> (qconf, qstat) fails with a "comm error".
>>>
>>> Yes, another of these "comm errors". I am quite experienced with these,
>>> because I run Gridengine in many versions and this is the most
>>> common error I get. Usually it's because of wrong entries in /etc/hosts.
>>> And until now, I have always found a solution.
>>>
>>> From the client (on the same host as qmaster):
>>> # qconf -sql
>>> error: unable to contact qmaster using port 6444 on host "xxxx.fqdn"
>>>
>>> In the qmaster log:
>>> 11/01/2017 15:36:27|listen|xxxx|E|commlib error: got read error (closing "xxxx.fqdn/qconf/15")
>>> 11/01/2017 15:36:27|listen|xxxx|E|commlib error: got select error (closing "xxxx.fqdn/qconf/17")
>>>
>>> It seems that they communicate, but not successfully.
>>>
>>> Security is set to "none".
>>>
>>> Does anybody have any idea?
>>> Any idea how to debug this?
>>> Is anybody running it on Debian 9 (stretch)?
>>>
>>> I built the packages on Ubuntu Xenial (16.04), installed a new qmaster
>>> with them and got the same error.
>>> (I will try to send some patches, because it was not as straightforward
>>> as with Debian stretch.)
>>>
>>> I am currently trying to build the packages for Debian 8 (jessie) to
>>> see if it works there.
>>>
>>> Greetings ...
>>> Marco
>>>
>>>
>>>

--
marco.schm...@gmail.com