Finally I found the problem in the source and submitted a patch.

https://arc.liv.ac.uk/trac/SGE/ticket/1619

The return value of "read" must be stored in an "ssize_t", not a "size_t".
"size_t" is unsigned, so the error value "-1" wraps to a very large positive
number and a check for a negative result can never succeed.

On my Linux system, the read call returns an "ssize_t", which is signed and
can be compared to "-1".

See read(2).
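
To illustrate the difference, here is a minimal sketch. It is not the code
from the patch; the two helper names are made up for this example. Both
helpers call read(2) and report whether the caller would believe that data
arrived. The only difference is the type that holds the return value.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper showing the buggy pattern: the result of read()
 * is stored in an unsigned size_t. Returns 1 if the caller would
 * believe that data was read, 0 otherwise. */
int looks_like_data_unsigned(int fd)
{
    char buf[16];
    /* On failure read() returns -1, which wraps to the largest size_t. */
    size_t n = (size_t)read(fd, buf, sizeof buf);
    return n > 0;
}

/* Hypothetical helper showing the fixed pattern: ssize_t keeps the
 * sign, so a failed read() stays at -1 and is not mistaken for data. */
int looks_like_data_signed(int fd)
{
    char buf[16];
    ssize_t n = read(fd, buf, sizeof buf);
    return n > 0;
}
```

Called with an invalid descriptor such as -1, read() fails. The unsigned
variant still reports that data was read, while the signed variant does not.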

Greetings ...
 Marco


On 03.11.2017 at 10:57, Marco Schmidt wrote:
> Good news, I found the problem.
> 
> Even though "security_mode" is set to "none" in the "bootstrap" file,
> qmaster wants more than plaintext communication. I guess it cannot
> initialize the SSL components without a certificate. This means qmaster
> expects neither SSL nor plaintext. Maybe "Munge" communication? (If this
> sounds silly to people who know "munge", please go ahead and add
> some more documentation about "SGE and Munge"; I have no idea about it
> yet.)
> 
> After generating the sgeCA infrastructure and setting "security_mode" to
> "csp", the commlib error disappears and communication with qmaster works.
> 
> Unfortunately, I do not know the code well enough to send a patch
> right now. It will take some time.
> 
> If somebody already knows how to fix it, please tell us.
> 
> Greetings ...
>  Marco
> 
> 
> On 02.11.2017 15:09, Marco Schmidt wrote:
>> Same behavior on Debian 8 (jessie).
>>
>> It seems I have to dig deeper into the code to debug this.
>>
>> Greetings ...
>>  Marco
>>
>>
>> On 01.11.2017 15:46, Marco Schmidt wrote:
>>> Hello,
>>>
>>> I have been trying for several days now to get SGE running on Debian 9
>>> (stretch) using the source from the darcs repository.
>>>
>>> It was really easy to build the Debian packages and install them.
>>>
>>> The master is running (it responds to qping), but any other command
>>> (qconf, qstat) fails with a "comm error".
>>>
>>> Yes, another of these "comm errors". I am quite experienced with these,
>>> because I have run Grid Engine in many versions and this is the most
>>> common error I get. Usually it's because of wrong entries in /etc/hosts.
>>> And until now, I have always found a solution.
>>>
>>> From the client (on the same host as qmaster):
>>> # qconf -sql
>>> error: unable to contact qmaster using port 6444 on host "xxxx.fqdn"
>>>
>>> In the qmaster log:
>>> 11/01/2017 15:36:27|listen|xxxx|E|commlib error: got read error (closing
>>> "xxxx.fqdn/qconf/15")
>>> 11/01/2017 15:36:27|listen|xxxx|E|commlib error: got select error
>>> (closing "xxxx.fqdn/qconf/17")
>>>
>>> It seems that they communicate, but not successfully.
>>>
>>> security is set to "none".
>>>
>>> Does anybody have an idea?
>>> Any idea how to debug this?
>>> Is anybody running it on Debian 9 (stretch)?
>>>
>>> I built the packages on Ubuntu Xenial (16.04), installed a new qmaster
>>> with them and got the same error.
>>> (I will try to send some patches, because it was not as straightforward
>>> as with Debian stretch.)
>>>
>>> I am currently trying to build the packages for Debian 8 (jessie) to see
>>> if it works there.
>>>
>>> Greetings ...
>>>  Marco
>>>
>>>
>>>

-- 
marco.schm...@gmail.com