On 2021-04-09 at 20:24, Greg Hudson wrote:
> On 4/9/21 11:35 AM, Osipov, Michael (LDA IT PLM) wrote:
>> I am quite sure that this is a race condition where stat() is performed,
>> file does not exist, open() with write is performed, in parallel it is
>> already created and the later call fails with EEXIST.
>
> I agree, except I think it's just unlink() and open(O_CREAT|O_EXCL)
> calls with no stat().  I had erroneously assumed that the unexpected
> error was happening inside fcc_store() because of "Failed to store
> credentials" in the message, but that string turns out to be from
> get_in_tkt.c in a block of code that also calls krb5_cc_initialize().
>
> The fcc_initialize() EEXIST self-race has existed since 1.0.  I'd
> speculate that the original developers' assumption was that lots of
> processes might be competing to use a file ccache, but that creating
> ccaches would be a rare and one-at-a-time affair (happening at login or
> when a user runs "kinit").  With client keytab support, that is no
> longer the case; it's easy to have multiple threads or processes
> competing to create or refresh a cache as part of gss_acquire_cred() or
> gss_init_sec_context().
>
> Just fixing the fcc_initialize() race wouldn't really solve the problem;
> there would still be a window between krb5_cc_initialize() and
> krb5_cc_store_cred() where other threads (or processes) would see an
> initialized cache with no TGT in it, and would fail the
> gss_init_sec_context() call.
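For illustration, here is a minimal Python sketch of the unlink() plus open(O_CREAT|O_EXCL) self-race described above. It is not the libkrb5 code: initialize_cache() and run_race() are hypothetical names, and the barrier exists only to force the unlucky interleaving so that the outcome is reproducible.

```python
import errno
import os
import tempfile
import threading


def initialize_cache(path, barrier=None):
    """Mimic unlink() followed by an exclusive create; return 0 or an errno."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # nothing to remove, same as a fresh cache
    if barrier is not None:
        # Demo only: hold every caller here until all have passed unlink().
        barrier.wait()
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return errno.EEXIST  # another caller created the file first
    os.close(fd)
    return 0


def run_race(path, n=2):
    """Run n competing initializations and return their sorted results."""
    barrier = threading.Barrier(n)
    results = []
    lock = threading.Lock()

    def worker():
        rc = initialize_cache(path, barrier)
        with lock:
            results.append(rc)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "krb5cc_demo")
    print(run_race(demo_path))
```

With the barrier in place, exactly one caller creates the file and every other caller gets EEXIST. Without it the same interleaving can still occur, just not on every run.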

Re-reading the code and your analysis, I agree that it won't work without
external synchronization.

> This ticket describes that problem and
> some possible solutions:
>
> https://krbdev.mit.edu/rt/Ticket/Display.html?id=7707
>
> Heimdal has implemented option 5.  I'm not wild about it and it won't
> work with other ccache types, but it's a working stopgap and it can
> always be backed out in favor of a different solution later.

While I don't understand all of them, option 2 seems to be the most
obvious (idiot-proof) solution for the FILE cache, doesn't it? I can't tell
for the other ccache formats.

So for now, the only workarounds I see are:

1. Initialize the cache in the main thread and then spawn the worker
   threads. For long-running apps (10 h+), refresh the cache, although
   there is no 'kinit -R' equivalent in GSS-API.
2. Use a per-thread cache to avoid race conditions:

    import os
    import sys
    import threading

    import gssapi

    spnego = gssapi.OID.from_int_seq("1.3.6.1.5.5.2")
    if keytab_location:
        encoding = sys.getdefaultencoding()
        # One ccache file per process and thread, so no two threads
        # ever initialize the same cache.
        ccache = "/tmp/krb5cc_%d_%s" % (os.getpid(), threading.get_ident())
        store = {
            b"client_keytab": keytab_location.encode(encoding),
            b"ccache": ccache.encode(encoding),
        }
        creds = gssapi.raw.acquire_cred_from(
            store, mechs=[spnego], usage="initiate").creds

It would be nice if this limitation were documented here:
https://web.mit.edu/kerberos/krb5-1.19/doc/basic/ccache_def.html

It could have saved me quite some time.

Regards,

Michael
________________________________________________
Kerberos mailing list           Kerberos@mit.edu
https://mailman.mit.edu/mailman/listinfo/kerberos
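
A possible shape for workaround 1, as a minimal sketch only: run_with_primed_cache(), prime and work are hypothetical names. prime stands in for whatever call populates the ccache (for example, the credential acquisition shown in workaround 2) and work for the per-thread job. The only point shown is the ordering: the cache is populated once, before any worker thread exists.

```python
import threading


def run_with_primed_cache(prime, work, n_workers):
    """Call prime() once in the main thread, then run work(i) in n_workers threads."""
    # The only place where the cache is created; no worker is running yet.
    prime()

    results = [None] * n_workers

    def worker(i):
        # Workers assume an already populated cache and never initialize it.
        results[i] = work(i)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

This does not cover refreshing the cache in a long-running application; a refresh would need the same kind of exclusion, for instance by pausing the workers or by guarding the refresh with a lock.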