Actually, the change from 300 to 600 never took effect: the web portal itself gave me an ISE that I hadn't noticed when I tried clicking save!
Alfred

On Mon, Jun 7, 2021 at 3:57 PM Alfred Victor <alvic...@gmail.com> wrote:

> Hi FreeIPA list,
>
> I don't see any in error log that match `grep -i "err=3"
> /var/log/httpd/error_log`. We have tried raising searchtimelimit as high
> as 120, then 300 (now are trying 600) but observed no difference in the
> rate at which nodes succeeded or failed in IPA joins. We are somewhat
> puzzled by this, as none of the other values we are aware of might have
> changed, though it is possible that the IPA systems are under a little
> higher demand from client systems, we have tried to mitigate this by
> shutting down some workflows and aren't sure whether we've seen any
> improvement. Short of adjusting apache/resource/process timeouts it is
> difficult to say what might be wrong. To give an example, out of 250 nodes
> rebooted, only 112 joined IPA successfully. Here is some output from the
> error log, following what this looks like in the ipa client install log
> (error log output will match the node attempt):
>
> 2021-06-07T18:25:30Z DEBUG The ipa-client-install command failed, exception:
> NetworkError: cannot connect to 'https://redactednode.com/ipa/json
> <https://hauth0004.dug.com/ipa/json>': Internal Server Error
>
> [Mon Jun 07 13:25:06.198259 2021] [core:error] [pid 25020] [client
> 10.1.24.48:47808] Script timed out before returning headers: wsgi.py,
> referer: https://redacted.redacted.com/ipa/xml
> <https://hauth0004.dug.com/ipa/xml>
>
> Different node, same time period:
>
> [Mon Jun 07 13:24:02.178092 2021] [:error] [pid 25725] ipa: INFO: [xmlserver]
> mach_j...@redacted.com: join(u'redacted.node.com',
> nshardwareplatform=u'x86_64',
> nsosversion=u'3.10.0-1062.18.1.1.el7.redacted.x86_64', version=u'2.51'):
> TimeLimitExceeded
>
> I also saw this:
>
> [Mon Jun 07 13:25:07.103503 2021] [:error] [pid 25725] ipa: ERROR:
> non-public: IOError: request data read error
> [Mon Jun 07 13:25:07.103529 2021] [:error] [pid 25725] Traceback (most recent
> call last):
> [Mon Jun 07 13:25:07.103536 2021] [:error] [pid 25725] File
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 360, in
> wsgi_execute
> [Mon Jun 07 13:25:07.103542 2021] [:error] [pid 25725] data =
> read_input(environ)
> [Mon Jun 07 13:25:07.103548 2021] [:error] [pid 25725] File
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 200, in
> read_input
> [Mon Jun 07 13:25:07.103553 2021] [:error] [pid 25725] return
> environ['wsgi.input'].read(length).decode('utf-8')
> [Mon Jun 07 13:25:07.103559 2021] [:error] [pid 25725] IOError: request data
> read error
> [Mon Jun 07 13:25:07.103826 2021] [:error] [pid 25725] ipa: INFO: [xmlserver]
> mach_j...@redacted.com: None: InternalError
> [Mon Jun 07 13:25:07.149962 2021] [:error] [pid 25726] ipa: ERROR:
> non-public: IOError: request data read error
> [Mon Jun 07 13:25:07.149984 2021] [:error] [pid 25726] Traceback (most recent
> call last):
> [Mon Jun 07 13:25:07.149991 2021] [:error] [pid 25726] File
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 360, in
> wsgi_execute
> [Mon Jun 07 13:25:07.149997 2021] [:error] [pid 25726] data =
> read_input(environ)
> [Mon Jun 07 13:25:07.150002 2021] [:error] [pid 25726] File
> "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 200, in
> read_input
> [Mon Jun 07 13:25:07.150008 2021] [:error] [pid 25726] return
> environ['wsgi.input'].read(length).decode('utf-8')
> [Mon Jun 07 13:25:07.150013 2021] [:error] [pid 25726] IOError: request data
> read error
>
> After setting the timeout to 600 and rebooting the remaining 139 nodes
> from the initial set of 250, 83 joined of the 139 and we still had ISE
> occurring. In some cases, it would ISE on the first attempt, try another
> IPA system, and succeed. I'm not sure that even such a long timeout as 600
> has helped.
>
> Alfred
>
> On Thu, Jun 3, 2021 at 7:51 PM Rob Crittenden <rcrit...@redhat.com> wrote:
>
>> Alfred Victor via FreeIPA-users wrote:
>> > Hi FreeIPA list,
>> >
>> > We are having an issue with our IPA environment of 4 replicated FreeIPA
>> > systems serving linux compute clients which join from a command in
>> > rc.local after boot. This worked in the past, but the system has been
>> > rebuilt since and the join command changed slightly. Unfortunately
>> > booting a few dozen nodes at a time, though they each talk to a
>> > different IPA system by design, leads to problems such as these - though
>> > 40-100 nodes can boot ok at a time there are always many stragglers, and
>> > the more we attempt to boot at once the more fail to join IPA (if we try
>> > to boot 500 nodes, we are lucky if we get a fifth of that joining IPA).
>> > Can you please advise on this output? Here is our join command in
>> > compute node rc.local:
>> >
>> > ipa-client-install -U -q -p mach_join \
>> > -w <redacted> \
>> > --force-join \
>> > --no-dns-sshfp \
>> > --automount-location=redacted-node
>> >
>> > And here is some log output of the 500 error:
>> >
>> > ProtocolError: <ProtocolError for redacted.redacted.com/ipa/json:
>> > 500 Internal Server Error>
>> > Cannot connect to the server due to generic error: cannot connect
>> > to 'https://redacted.redacted.com/ipa/json
>> > <https://hauth0003.dug.com/ipa/json>': Internal Server Error
>> >
>> > As well as:
>> >
>> > 2021-06-02T21:39:11Z DEBUG Starting external process
>> > 2021-06-02T21:39:11Z DEBUG args=/usr/sbin/ipa-join -s
>> > redacted.redacted.com -b dc=redacted,dc=com -h
>> > redactednode.redacted.com -f
>> > 2021-06-02T21:40:13Z DEBUG Process finished, return code=17
>> > 2021-06-02T21:40:13Z DEBUG stdout=
>> > 2021-06-02T21:40:13Z DEBUG stderr=HTTP response code is 500, not 200
>> > 2021-06-02T21:40:13Z ERROR Joining realm failed: HTTP response code
>> > is 500, not 200
>> >
>> > And we also see timeouts happen:
>> >
>> > 2021-06-02T22:08:50Z DEBUG args=/usr/sbin/ipa-join -s
>> > redacted.redacted.com -b dc=redacted,dc=com -h
>> > redactednode.redacted.com -f
>> > 2021-06-02T22:09:01Z DEBUG Process finished, return code=17
>> > 2021-06-02T22:09:01Z DEBUG stdout=
>> > 2021-06-02T22:09:01Z DEBUG stderr=RPC failed at server. Configured
>> > time limit exceeded
>> > 2021-06-02T22:09:01Z ERROR Joining realm failed: RPC failed at
>> > server. Configured time limit exceeded
>> >
>> > And we also see later timeouts near the end of the log in some cases
>> > though are able to authenticate and it didn't back out the install, but
>> > never got going healthy either:
>> >
>> > 2021-06-03T19:20:13Z DEBUG The ipa-client-install command failed,
>> > exception: TimeLimitExceeded: Configured time limit exceeded
>>
>> When you see Internal Error look to the Apache error log on the server
>> for more information.
>>
>> In this case an LDAP search is failing because the server is too busy.
>> Look for queries failing with err=3 to get an idea of how long it is
>> taking.
>>
>> To increase the timeout use: ipa config-mod --searchtimelimit=INT
>>
>> The default is 2 seconds.
>>
>> You can pick a time at random but could see failures again.
>>
>> rob
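
Rob's suggestion to look for queries failing with err=3 can be sketched as a small script. As far as I know, err=3 (LDAP time limit exceeded) appears on RESULT lines in the 389 Directory Server access log, not in the Apache error_log, which may be why the grep quoted above found nothing. The access log path, the sample line in the comment, and the function name `timed_out_results` are illustrative assumptions, not taken from this thread.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed shape of a 389-ds access-log RESULT line (sample is made up):
# [07/Jun/2021:13:24:02.178092000 -0500] conn=11 op=1 RESULT err=3 tag=101 nentries=0 etime=2.000123456
RESULT_RE = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) RESULT err=(?P<err>\d+)\b"
    r".*?\betime=(?P<etime>\d+(?:\.\d+)?)"
)


def timed_out_results(lines: Iterable[str]) -> List[Tuple[int, int, float]]:
    """Return (conn, op, etime) for each RESULT line that ended with err=3.

    Lines that do not look like a client RESULT line are skipped.
    """
    hits = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match and match.group("err") == "3":
            hits.append(
                (
                    int(match.group("conn")),
                    int(match.group("op")),
                    float(match.group("etime")),
                )
            )
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical path: /var/log/dirsrv/slapd-EXAMPLE-COM/access
    with open(sys.argv[1], errors="replace") as fh:
        hits = timed_out_results(fh)
    print(f"{len(hits)} operations ended with err=3")
    # Show the ten slowest operations first.
    for conn, op, etime in sorted(hits, key=lambda h: h[2], reverse=True)[:10]:
        print(f"conn={conn} op={op} etime={etime:.3f}s")
```

Run against the access log on each replica, this would list how long the timed-out searches took; the conn and op values can then be matched to the corresponding SRCH lines to see which search was slow.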
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
List Archives: https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org