Hello there,

Matthew and I spent a non-trivial amount of time trying to reproduce
this bug.  Let me tell you what we did, what we found, and then maybe
you can help us progress.

First of all, it is really important that we find a reproducer so that
we can: (1) figure out what exactly is going on, (2) possibly find a
patch or write a fix for the problem, and (3) drive the SRU process to
completion and get the fix released for all affected Ubuntu versions.

Having said that, here is what I did (and, to the best of my knowledge,
what Matthew also tried):

1) I installed Windows Server 2019 in a VM.  I configured Active
Directory, DHCP and DNS.  I also configured the Certificate Authority in
it.

Bear in mind that this is a test environment, so I kept the
configuration basic.  For AD, I chose to "Add a new forest" using
"Windows Server 2016" as the functional level, with the DNS and Global
Catalog capabilities activated.  For the Certificate Authority, I chose
to generate a root certificate (I don't remember the other options
exactly, but the root certificate is the important part).

2) I created an LXD container running Ubuntu Focal that shared the same
network bridge as the Windows Server 2019 VM.  The container promptly
acquired an IP address from the DHCP server I had configured in the
Windows VM.

3) I joined the AD realm using "realm join win-ad-example.adtest.local".
sssd and other dependencies were automatically installed, and the
process finished successfully.

4) I then went to the Windows VM, opened certsrv, found the certificate
for the machine and exported it.  I copied the certificate into the LXD
container, put it into the right place
(/usr/local/share/ca-certificates/) and ran update-ca-certificates.  I
noticed that the command added 1 certificate to the chain.

5) I edited /etc/sssd/sssd.conf and added the following options to the
domain section:

ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt
ad_use_ldaps = True
debug_level = 4

6) I then restarted sssd.  And I noticed the error manifesting!  At that
point, I thought I had reproduced the bug (keep reading, though).  I
started to investigate what could be happening.

7) After several minutes thinking I was on the right track, I decided to
try and run the "ldapsearch" command provided above:

$ ldapsearch -x -Z -v -H ldaps://win-ad-example.adtest.local:636

And to my surprise, I noticed that the command could not connect to the
Windows server.  I then started debugging things, and quickly found that
the problem was that the TLS certificate from the server could not be
validated.  Something was wrong...

8) I went back to the Windows VM and poked around certsrv.  I noticed
that I had exported the certificate for the machine, but not the root
certificate.  I decided to give it a try.

9) After having imported the root certificate into the LXD container, I
restarted sssd and, much to my surprise, everything worked out of the
box (using the sssd package from the archive).  Back to square one...
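
For anyone retracing steps 7 to 9, here is a minimal Python sketch of a
check that separates "the server certificate cannot be validated" from
other connection failures.  It is only an illustration, not something
sssd or ldapsearch does internally: the function name
check_ldaps_certificate and its defaults are my own, and it relies on
nothing beyond the standard ssl and socket modules.

```python
import socket
import ssl


def check_ldaps_certificate(host, port=636, cafile=None, timeout=5.0):
    """Attempt a TLS handshake and report whether the certificate validates.

    Returns (True, description) on success and (False, reason) otherwise.
    The function name and defaults are illustrative assumptions.
    """
    try:
        # cafile=None falls back to the system default trust store.
        context = ssl.create_default_context(cafile=cafile)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake, including chain and
            # hostname verification against the loaded CA certificates.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"handshake succeeded ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # Raised when the chain cannot be built up to a trusted root,
        # or when the hostname does not match the certificate.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Refused connections, timeouts, DNS failures, a missing CA file
        # and other TLS errors all end up here.
        return False, f"connection failed: {exc}"
```

Calling check_ldaps_certificate("win-ad-example.adtest.local",
cafile="/etc/ssl/certs/ca-certificates.crt") from the container should
report a verification failure while the root certificate is missing
from the bundle, assuming the server presents a certificate issued by
that root.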


As you can see, I have a working AD DC and I can successfully connect to it 
using a regular Focal container (I also tried with a Focal VM, with the same 
results).  I spent a few more hours trying to tweak some things here and there 
to see if I could make the bug manifest, to no avail.

For this reason, I decided to come and ask for more information from
you.  It would be great if you could tell me if there's anything you can
think of that might trigger this problem.  Something related to the way
your AD DC is configured, perhaps?

Here is the sssd.conf I'm using:

# cat /etc/sssd/sssd.conf 
[sssd]
domains = adtest.local
config_file_version = 2
services = nss, pam

[domain/adtest.local]
default_shell = /bin/bash
krb5_store_password_if_offline = True
cache_credentials = True
krb5_realm = ADTEST.LOCAL
realmd_tags = manages-system joined-with-adcli 
id_provider = ad
ldap_sasl_authid = SSSD-BUG1921494$
fallback_homedir = /home/%u@%d
ad_domain = adtest.local
use_fully_qualified_names = True
ldap_id_mapping = True
access_provider = ad
ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt
ad_use_ldaps = True
debug_level = 4
ldap_library_debug_level = -1
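
If you want to compare your own sssd.conf against this one, a small
Python sketch along these lines can confirm that the two LDAPS options
from step 5 are present in the domain section.  The helper name
ldaps_config_problems is hypothetical, and Python's configparser is
only an approximation of how sssd itself reads the file, so treat the
result as a sanity check rather than a verdict.

```python
import configparser

# Options from step 5 that the LDAPS setup depends on.
REQUIRED_OPTIONS = ("ad_use_ldaps", "ldap_tls_cacert")


def ldaps_config_problems(conf_text, domain):
    """Return a list of human-readable problems found in [domain/<domain>].

    An empty list means both options are present and ad_use_ldaps is true.
    """
    # interpolation=None keeps values such as "/home/%u@%d" literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    section = "domain/" + domain
    if not parser.has_section(section):
        return [f"section [{section}] not found"]

    problems = [
        f"{option} is not set"
        for option in REQUIRED_OPTIONS
        if not parser.has_option(section, option)
    ]
    if parser.has_option(section, "ad_use_ldaps"):
        try:
            if not parser.getboolean(section, "ad_use_ldaps"):
                problems.append("ad_use_ldaps is disabled")
        except ValueError:
            problems.append("ad_use_ldaps is not a boolean")
    return problems
```

Passing the contents of /etc/sssd/sssd.conf (read as root) together
with "adtest.local" should return an empty list for the configuration
shown above.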


Note that I haven't configured Kerberos authentication in my example, but I 
don't think that should matter much.

It would be great if you could do a few things:

a) As I mentioned above, let us know if there is any peculiarity in your
AD DC configuration that might impact this.

b) If possible, set up an Ubuntu Impish system (which has just been
released) and try to reproduce the problem there.  Impish ships newer
versions of both sssd and OpenLDAP, which might make a difference here.

c) Can you confirm whether you can always trigger this problem, or does
this just happen sporadically?

d) Can you confirm whether you have imported the root certificate for
your AD DC server into the client as well (assuming this applies to your
scenario, of course)?
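
For question (d), one way to check is to list the CA certificates that
actually end up in the client's bundle and look for your root CA by
name.  The sketch below is an assumption-laden illustration using
Python's standard ssl module; the helper names are made up, and the
name you search for depends on how your CA was set up.

```python
import ssl


def load_ca_certs(cafile):
    """Load a PEM bundle and return its CA certificates as dictionaries."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_verify_locations(cafile=cafile)
    return context.get_ca_certs()


def subject_common_names(cert):
    """Extract every commonName from a certificate dictionary's subject."""
    return [
        value
        for rdn in cert.get("subject", ())
        for key, value in rdn
        if key == "commonName"
    ]


def find_ca_by_name(certs, fragment):
    """Return common names of CAs containing `fragment`, ignoring case."""
    fragment = fragment.lower()
    return [
        name
        for cert in certs
        for name in subject_common_names(cert)
        if fragment in name.lower()
    ]
```

For example, find_ca_by_name(load_ca_certs(
"/etc/ssl/certs/ca-certificates.crt"), "adtest") should return a
non-empty list if a root CA whose common name contains "adtest" (a
placeholder here) has been imported.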


I think this is all I have for now.  Matthew, feel free to add to the
information in this comment and to expand on the questions if you have any.

Thanks in advance.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1921494

Title:
  ldap_install_tls occasionally fails due to watchdog timeout when using
  ad_use_ldaps with tls

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1921494/+subscriptions

