On Пан, 07 жні 2023, Harry G Coin via FreeIPA-users wrote:
This January, Mark Potter first experienced what is now killing my new
freeipa setup as well. Once the primary server on which dns records
are being programatically added sees named-pcks11 putting "writeback
to ldap failed" in the log: 'systemctl' reports nothing amiss yet
ns-slapd becomes entirely unresponsive and all that depend upon dirsrv
hangs/times out.
There is nothing in any of the log entries detailing failure, even
with ample memory and storage space and processor resources. There
are perhaps 10 hosts in the little lab sandbox, while the system does
service DNS requests from the public internet.
I looked through the 'changelogs' and could find nothing that seems to
address this, is there a 'work-around' known? I'd like to avoid using
'bleeding edge' versions of freeipa.
The details of Mark's discussion are here:
https://lists.fedoraproject.org/archives/list/freeipa-users@lists.fedorahosted.org/thread/WZKFEUAYYABTF5BIFQTNGPIYEJTXXJPY/
What's common to his and my case is that the problem arises during an
'initial programmatic load' of many DNS records in a few dozen zones
(often ~200 / in a *.arpa domain), even when slowed to not more than
one every 8 seconds. But only when at least one replication agreement
exists. It's possible the fact that dnssec is enabled and so 'adding
a record' generates further signature set computing activity either
provides ldap/bind9 server load that exposes the
race-condition-feeling problem or (less likely) is directly involved
in the cause.
I agree with Mark's quote: "Are we going to see everything fail in a
spectacular manner and is there anything I can do to mitigate the
failure with adding DNS entries as I still need to complete the
addition and have ~5k per zone left for two zones." He 'worked
around' the problem by adding a record at a time and slowly (without
knowing what 'slowly' means in seconds). I'm unable to guess what
'slowly' means in my setup, but it appears to be more than 8 seconds /
update, so fails for me.
Maybe some kind of 'flow control' is required, or a some way to
determine whether all replication activity and triggered further
'signature computation' dns activity has concluded after writing a new
dns record?
The only 'way out' is to restart ipa. Of some interest, when that is
attempted after these failures, ns-slapd uses 100% CPU, dirsrv hang
for several minutes, often longer than the systemctl shutdown / reboot
process allows before forcibly killing the process. This is 100%
repeatable. After a few dozen records are loaded, ns-slapd just stops
responding. The program adding the records is using the python ldap2
interface native on the local host.
If ns-slapd uses 100% CPU, you might be able to get the stack trace out
of those processes. Install debuginfo packages and try to get a pstack.
This should help to narrow it down:
# pstack $(pgrep ns-slapd) > ns-slapd log
pstack is part of gdb package.
There are few bugs in 389-ds that were fixed relatively recently (not in
production packages yet) which had been preventing it to respond on
incoming LDAP connections.
Some further details:
Aug 06 15:58:58 registry1.1.somedomain.com named-pkcs11[1497]: zone
177.168.192.in-addr.arpa/IN: sending notifies (serial 1691355538)
Aug 06 15:58:58 registry1.1.somedomain.com myprogram.py[35998]:
rssmain: ipa command dnsrecord_show: Summary: None args:
('7.somedomain.com.', 'email19')
Aug 06 15:59:01 registry1.1.somedomain.com named-pkcs11[1497]:
checkhints: unable to get root NS rrset from cache: not found
Aug 06 15:59:06 registry1.1.somedomain.com myprogram.py[35998]:
rssmain: ipa command dnsrecord_add: Summary: None
Aug 06 15:59:07 registry1.1.somedomain.com myprogram.py[35998]:
rssmain: ipa command dnsrecord_find: Summary: None args:
('177.168.192.in-addr.arpa.',), 144 records total
Aug 06 15:59:07 registry1.1.somedomain.com myprogram.py[35998]:
rssmain: ipa command dnsrecord_show: Summary: None args:
('177.168.192.in-addr.arpa.', '147')
Aug 06 15:59:07 registry1.1.somedomain.com myprogram.py[35998]:
rssmain: ipa command dnsrecord_find: Summary: None args:
('6.0.c.0.0.0.0.0.0.0.0.1.0.0.c.f.ip6.arpa.',), 22 records total
Aug 06 15:59:07 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:15:59:07.848595593 -0500] - WARN - content-sync-plugin -
sync_update_persist_betxn_pre_op - DB retried operation targets
"changenumber=59921,cn=changelog" (op=0x7f6b6fc1f400 idx_pl=0) => op
not c>
Aug 06 15:59:16 registry1.1.somedomain.com myprogram.py[35998]:
rssmain: ipa command dnsrecord_add: Summary: None
Aug 06 15:59:16 registry1.1.somedomain.com myprogram.py[35998]:
rssmain: ipa command dnsrecord_show: Summary: None args:
('7.somedomain.com.', 'email19')
Aug 06 15:59:32 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (unsigned): serial (1691355547) write back to LDAP
failed
Aug 06 15:59:33 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): serial 1691355547 (unsigned 1691355547)
Aug 06 15:59:33 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): sending notifies (serial 1691355547)
Aug 06 16:01:01 registry1.1.somedomain.com CROND[52537]: (root) CMD
(run-parts /etc/cron.hourly)
Aug 06 16:01:01 registry1.1.somedomain.com run-parts[52540]:
(/etc/cron.hourly) starting 0anacron
Aug 06 16:01:01 registry1.1.somedomain.com run-parts[52546]:
(/etc/cron.hourly) finished 0anacron
Aug 06 16:02:05 registry1.1.somedomain.com sssd_kcm[19097]: Shutting
down (status = 0)
Aug 06 16:02:05 registry1.1.somedomain.com systemd[1]:
sssd-kcm.service: Succeeded.
Aug 06 16:02:20 registry1.1.somedomain.com systemd[1]:
loclibrary.mount: Succeeded.
Aug 06 16:02:20 registry1.1.somedomain.com systemd[1]:
qflibrary.mount: Succeeded.
Aug 06 16:07:37 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:07:37 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:08:34 registry1.1.somedomain.com named-pkcs11[1497]: zone
9.somedomain.com/IN (signed): Key 9.somedomain.com/RSASHA256/60467
missing or inactive and has no replacement: retaining signatures.
Aug 06 16:08:34 registry1.1.somedomain.com named-pkcs11[1497]: zone
9.somedomain.com/IN (signed): sending notifies (serial 1691350608)
Aug 06 16:09:08 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:08 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:18 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:18 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:30 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:09:30 registry1.1.somedomain.com sssd_be[659]: GSSAPI client
step 1
Aug 06 16:11:05 registry1.1.somedomain.com sssd_be[659]: Backend is offline
Aug 06 16:11:38 registry1.1.somedomain.com named-pkcs11[1497]: zone
3.somedomain.com/IN (signed): Key 3.somedomain.com/RSASHA256/18257
missing or inactive and has no replacement: retaining signatures.
Aug 06 16:11:38 registry1.1.somedomain.com named-pkcs11[1497]: zone
3.somedomain.com/IN (signed): sending notifies (serial 1691350754)
Aug 06 16:12:37 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): Key 7.somedomain.com/RSASHA256/40511
missing or inactive and has no replacement: retaining signatures.
Aug 06 16:12:37 registry1.1.somedomain.com named-pkcs11[1497]: zone
7.somedomain.com/IN (signed): sending notifies (serial 1691355548)
Aug 06 16:20:01 registry1.1.somedomain.com CROND[52573]: (root) CMD
(test -f /var/lock/subsys/ods-enforcerd && kill -s SIGHUP `cat
/var/run/opendnssec/enforcerd.pid` > /dev/null 2> /dev/null)
Aug 06 16:29:21 registry1.1.somedomain.com systemd[1]: Stopping 389
Directory Server 1-somedomain-COM....
Aug 06 16:29:21 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:21.659348276 -0500] - INFO - op_thread_cleanup -
slapd shutting down - signaling operation threads - op stack size 0
max work q size 600 max work q stack size 4
Aug 06 16:29:21 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:21.735442997 -0500] - INFO - slapd_daemon - slapd
shutting down - waiting for 16 threads to terminate
Aug 06 16:29:22 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:22.278851959 -0500] - ERR - NSMMReplicationPlugin -
release_replica - agmt="cn=meToregistry2.1.somedomain.com"
(registry2:389): Attempting to release replica, but unable to receive
endRe>
Aug 06 16:29:22 registry1.1.somedomain.com ns-slapd[1434]:
[06/Aug/2023:16:29:22.307800436 -0500] - WARN - NSMMReplicationPlugin
- acquire_replica - agmt="cn=meToregistry2.1.somedomain.com"
(registry2:389): Unable to receive the response for a startReplication
ext>
Aug 06 16:34:03 registry1.1.somedomain.com ods-enforcerd[2124]:
[enforcer] update zone: _domainkey.anotherdomain.com
Aug 06 16:34:04 registry1.1.somedomain.com ods-enforcerd[2124]:
[signconf_cmd] performing signconf for zone
_domainkey.anotherdomain.com
Aug 06 16:34:04 registry1.1.somedomain.com ods-enforcerd[2124]:
[signconf_cmd] signconf done for zone _domainkey.anotherdomain.com,
notifying signer
Aug 06 16:34:04 registry1.1.somedomain.com systemd[1]: Started IPA
OpenDNSSEC Signer replacement.
Aug 06 16:34:07 registry1.1.somedomain.com ipa-ods-exporter[52623]:
ipa-ods-exporter: INFO To increase debugging set debug=True in
dns.conf See default.conf(5) for details
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
Traceback (most recent call last):
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line
1085, in error_handler
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
yield
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib/python3.6/site-packages/ipapython/ipaldap.py", line
1257, in gssapi_bind
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
'', auth_tokens, server_controls, client_controls)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/ldapobject.py", line
476, in sasl_interactive_bind_s
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
return self._ldap_call(self._l.sasl_interactive_bind_s,who,auth,RequestControlTuples(serverctrls),RequestControlTuples(clientctrls),sasl_flags)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/ldapobject.py", line
340, in _ldap_call
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
reraise(exc_type, exc_value, exc_traceback)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/compat.py", line 46,
in reraise
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
raise exc_value
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
File "/usr/lib64/python3.6/site-packages/ldap/ldapobject.py", line
324, in _ldap_call
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
result = func(*args,**kwargs)
Aug 06 16:34:11 registry1.1.somedomain.com ipa-ods-exporter[52623]:
ldap.SERVER_DOWN: {'result': -1, 'desc': "Can't contact LDAP server",
'errno': 111, 'ctrls': [], 'info': 'Connection refused'}
Name : ipa-server-common
Version : 4.9.11
Release : 6.module_el8.8.0+3593+1210bde8.alma.1
Architecture : noarch
Size : 621 k
Source : ipa-4.9.11-6.module_el8.8.0+3593+1210bde8.alma.1.src.rpm
Repository : appstream
Summary : Common files used by IPA server
...
Name : ipa-server-dns
Version : 4.9.11
Release : 6.module_el8.8.0+3593+1210bde8.alma.1
Architecture : noarch
Size : 197 k
Source : ipa-4.9.11-6.module_el8.8.0+3593+1210bde8.alma.1.src.rpm
Repository : appstream
[root@registry2 ~]# uname -r
4.18.0-477.15.1.el8_8.x86_64
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam, report it:
https://pagure.io/fedora-infrastructure/new_issue
--
/ Alexander Bokovoy
Sr. Principal Software Engineer
Security / Identity Management Engineering
Red Hat Limited, Finland
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam, report it:
https://pagure.io/fedora-infrastructure/new_issue