Actually, no change happened from 300-> 600 timeout, the web portal itself
gave me an ISE I hadn't noticed when I tried clicking save!

Alfred

On Mon, Jun 7, 2021 at 3:57 PM Alfred Victor <alvic...@gmail.com> wrote:

> Hi FreeIPA list,
>
> I don't see any in error log that match `grep -i "err=3"
> /var/log/httpd/error_log`. We have tried raising searchtimelimit as high
> as 120, then 300 (now are trying 600) but observed no difference in the
> rate at which nodes succeeded or failed in IPA joins. We are somewhat
> puzzled by this, as none of the other values we are aware of might have
> changed, though it is possible that the IPA systems are under a little
> higher demand from client systems, we have tried to mitigate this by
> shutting down some workflows and aren't sure whether we've seen any
> improvement. Short of adjusting apache/resource/process timeouts it is
> difficult to say what might be wrong. To give an example, out of 250 nodes
> rebooted, only 112 joined IPA successfully. Here is some output from the
> error log, following what this looks like in the ipa client install log
> (error log output will match the node attempt):
>
>
> 2021-06-07T18:25:30Z DEBUG The ipa-client-install command failed, exception: 
> NetworkError: cannot connect to 'https://redactednode.com/ipa/json 
> <https://hauth0004.dug.com/ipa/json>': Internal Server Error
>
> [Mon Jun 07 13:25:06.198259 2021] [core:error] [pid 25020] [client 
> 10.1.24.48:47808] Script timed out before returning headers: wsgi.py, 
> referer: https://redacted.redacted.com/ipa/xml 
> <https://hauth0004.dug.com/ipa/xml>
>
> Different node, same time period:
>
>
> [Mon Jun 07 13:24:02.178092 2021] [:error] [pid 25725] ipa: INFO: [xmlserver] 
> mach_j...@redacted.com: join(u'redacted.node.com', 
> nshardwareplatform=u'x86_64', 
> nsosversion=u'3.10.0-1062.18.1.1.el7.redacted.x86_64', version=u'2.51'): 
> TimeLimitExceeded
>
> I also saw this:
>
>
> [Mon Jun 07 13:25:07.103503 2021] [:error] [pid 25725] ipa: ERROR: 
> non-public: IOError: request data read error
> [Mon Jun 07 13:25:07.103529 2021] [:error] [pid 25725] Traceback (most recent 
> call last):
> [Mon Jun 07 13:25:07.103536 2021] [:error] [pid 25725]   File 
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 360, in 
> wsgi_execute
> [Mon Jun 07 13:25:07.103542 2021] [:error] [pid 25725]     data = 
> read_input(environ)
> [Mon Jun 07 13:25:07.103548 2021] [:error] [pid 25725]   File 
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 200, in 
> read_input
> [Mon Jun 07 13:25:07.103553 2021] [:error] [pid 25725]     return 
> environ['wsgi.input'].read(length).decode('utf-8')
> [Mon Jun 07 13:25:07.103559 2021] [:error] [pid 25725] IOError: request data 
> read error
> [Mon Jun 07 13:25:07.103826 2021] [:error] [pid 25725] ipa: INFO: [xmlserver] 
> mach_j...@redacted.com: None: InternalError
> [Mon Jun 07 13:25:07.149962 2021] [:error] [pid 25726] ipa: ERROR: 
> non-public: IOError: request data read error
> [Mon Jun 07 13:25:07.149984 2021] [:error] [pid 25726] Traceback (most recent 
> call last):
> [Mon Jun 07 13:25:07.149991 2021] [:error] [pid 25726]   File 
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 360, in 
> wsgi_execute
> [Mon Jun 07 13:25:07.149997 2021] [:error] [pid 25726]     data = 
> read_input(environ)
> [Mon Jun 07 13:25:07.150002 2021] [:error] [pid 25726]   File 
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 200, in 
> read_input
> [Mon Jun 07 13:25:07.150008 2021] [:error] [pid 25726]     return 
> environ['wsgi.input'].read(length).decode('utf-8')
> [Mon Jun 07 13:25:07.150013 2021] [:error] [pid 25726] IOError: request data 
> read error
>
> After setting the timeout to 600 and rebooting the remaining 139 nodes
> from the initial set of 250, 83 joined of the 139 and we still had ISE
> occurring. In some cases, it would ISE on the first attempt, try another
> IPA system, and succeed. I'm not sure that even such a long timeout as 600
> has helped.
>
> Alfred
>
>
>
>
> On Thu, Jun 3, 2021 at 7:51 PM Rob Crittenden <rcrit...@redhat.com> wrote:
>
>> Alfred Victor via FreeIPA-users wrote:
>> > Hi FreeIPA list,
>> >
>> > We are having an issue with our IPA environment of 4 replicated FreeIPA
>> > systems serving linux compute clients which join from a command in
>> > rc.local after boot. This worked in the past, but the system has been
>> > rebuilt since and the join command changed slightly. Unfortunately
>> > booting a few dozen nodes at a time, though they each talk to a
>> > different IPA system by design, leads to problems such as these - though
>> > 40-100 nodes can boot ok at a time there are always many stragglers, and
>> > the more we attempt to boot at once the more fail to join IPA (if we try
>> > to boot 500 nodes, we are lucky if we get a fifth of that joining IPA).
>> > Can you please advise on this output? Here is our join command in
>> > compute node rc.local:
>> >
>> >
>> >
>> >     ipa-client-install -U -q -p mach_join \
>> >     -w <redacted> \
>> >     --force-join \
>> >     --no-dns-sshfp \
>> >     --automount-location=redacted-node
>> >
>> >
>> >
>> >
>> > And here is some log output of the 500 error:
>> >
>> >     ProtocolError: <ProtocolError for redacted.redacted.com/ipa/json <
>> http://redacted.redacted.com/ipa/json>: 500 Internal Server Error>
>> >     Cannot connect to the server due to generic error: cannot connect
>> to 'https://redacted.redacted.com/ipa/json <
>> https://hauth0003.dug.com/ipa/json>': Internal Server Error
>> >
>> >
>> > As well as:
>> >
>> >     2021-06-02T21:39:11Z DEBUG Starting external process
>> >     2021-06-02T21:39:11Z DEBUG args=/usr/sbin/ipa-join -s
>> >     redacted.redacted.com <http://redacted.redacted.com> -b
>> >     dc=redacted,dc=com -h redactednode.redacted.com
>> >     <http://redactednode.redacted.com> -f
>> >     2021-06-02T21:40:13Z DEBUG Process finished, return code=17
>> >     2021-06-02T21:40:13Z DEBUG stdout=
>> >     2021-06-02T21:40:13Z DEBUG stderr=HTTP response code is 500, not 200
>> >     2021-06-02T21:40:13Z ERROR Joining realm failed: HTTP response code
>> >     is 500, not 200
>> >
>> > And we also see timeouts happen:
>> >
>> >
>> >     2021-06-02T22:08:50Z DEBUG args=/usr/sbin/ipa-join -s
>> >     redacted.redacted.com <http://redacted.redacted.com> -b
>> >     dc=redacted,dc=com -h redactednode.redacted.com
>> >     <http://redactednode.redacted.com> -f
>> >     2021-06-02T22:09:01Z DEBUG Process finished, return code=17
>> >     2021-06-02T22:09:01Z DEBUG stdout=
>> >     2021-06-02T22:09:01Z DEBUG stderr=RPC failed at server. Configured
>> >     time limit exceeded
>> >     2021-06-02T22:09:01Z ERROR Joining realm failed: RPC failed at
>> >     server. Configured time limit exceeded
>> >
>> >
>> > And we also see later timeouts near the end of the log in some cases
>> though are able to authenticate and it didn't back out the install, but
>> never got going healthy either:
>> >
>> >
>> >
>> >     2021-06-03T19:20:13Z DEBUG The ipa-client-install command failed,
>> >     exception: TimeLimitExceeded: Configured time limit exceeded
>>
>> When you see Internal Error look to the Apache error log on the server
>> for more information.
>>
>> In this case an LDAP search is failing because the server is too busy.
>> Look for queries failing with err=3 to get an idea of how long it is
>> taking.
>>
>> To increase the timeout use: ipa config-mod --searchtimelimit=INT
>>
>> The default is 2 seconds.
>>
>> You can pick a time at random but could see failures again.
>>
>> rob
>>
>>
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure

Reply via email to