On 2021-04-06 at 19:28, Greg Hudson wrote:
> On 4/6/21 11:48 AM, Osipov, Michael (LDA IT PLM) wrote:
>> gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure.  Minor
>> code may provide more information, Minor (100001): Failed to store
>> credentials: Internal credentials cache error (filename: /tmp/krb5cc_1000)
>
> This is not expected, and bears investigation.  It suggests an EINVAL,
> EEXIST, EFAULT, EBADF, or EWOULDBLOCK error from one of the I/O
> operations performed by fcc_store(), none of which are expected.  If
> you're building libkrb5, you could try modifying interpret_error() to
> pass those error codes through in order to find out which one is happening.
>
> Getting multiple cache entries for a service is normal when multiple
> threads or processes initiate contexts to the same (new) service within
> a short window.
>
Hi Greg,

I was able to compile and install 1.19.1 in the GitLab Runner and verified
that py-gssapi picks it up from LD_LIBRARY_PATH. Unfortunately, 1.19.1
suffers from the same problem as 1.17. I tried to narrow it down with
strace, but that changes the runtime behavior of the application and the
error disappears.

I patched the fcc_store() function:

> $ git diff
> diff --git a/src/lib/krb5/ccache/cc_file.c b/src/lib/krb5/ccache/cc_file.c
> index 9a9b45a6e..7f604c0f4 100644
> --- a/src/lib/krb5/ccache/cc_file.c
> +++ b/src/lib/krb5/ccache/cc_file.c
> @@ -1000,8 +1000,9 @@ fcc_store(krb5_context context, krb5_ccache id, krb5_creds *creds)
>      if (ret)
>          goto cleanup;
>      nwritten = write(fileno(fp), buf.data, buf.len);
> -    if (nwritten == -1)
> +    if (nwritten == -1) {
>          ret = interpret_errno(context, errno);
> +        printf("errno: %d, ret: %d\n", errno, ret); }
>      if ((size_t)nwritten != buf.len)
>          ret = KRB5_CC_IO;

but the output did not appear, so the failing call is not this write().
Then I patched interpret_errno() directly, in the branch that returns the
internal error:

> @@ -1293,6 +1294,7 @@ interpret_errno(krb5_context context, int errnum)
>      case EWOULDBLOCK:
>  #endif
>          ret = KRB5_FCC_INTERNAL;
> +        printf("errnum: %d, ret: %d\n", errnum, ret);
>          break;
>      /*
>       * The rest all map to KRB5_CC_IO.  These errnos are listed to

I had exactly one failure in the job and received exactly this:

> errnum: 17, ret: -1765328188

errno 17 is EEXIST. I am quite sure that this is a race condition: stat()
is performed and the file does not exist, then open() for writing is
performed, but in the meantime the file has been created in parallel, so
the later call fails with EEXIST. I assumed the caller to be
fcc_initialize() and added a printf() there:

> fcc_initialize()
> errnum: 17, ret: -1765328188
> fcc_initialize()
> errnum: 17, ret: -1765328188

What now?

Michael
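
To make the suspected interleaving concrete, here is a minimal sketch that replays it inside a single process. It is not the libkrb5 code: it assumes the cache file is created with open(O_CREAT | O_EXCL), and the helper names create_exclusive() and replay_race() as well as the path used in the usage note are hypothetical, chosen only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the file exclusively.
 * Returns 0 on success, otherwise the errno reported by open(). */
static int create_exclusive(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}

/* Replay the suspected race sequentially. "A" and "B" stand for two
 * threads or processes that share one cache file.
 * Returns the errno seen by A's create, or -1 if the setup failed. */
static int replay_race(const char *path)
{
    struct stat st;
    int ret;

    unlink(path);                       /* start without a cache file */

    /* A: checks for the file and finds it missing. */
    if (stat(path, &st) == 0 || errno != ENOENT)
        return -1;

    /* B: creates the file between A's check and A's create. */
    if (create_exclusive(path) != 0)
        return -1;

    /* A: now tries to create the file it believes is missing. */
    ret = create_exclusive(path);

    unlink(path);                       /* clean up the demo file */
    return ret;
}
```

Calling replay_race("/tmp/krb5cc_race_demo") returns EEXIST, which is errno 17 on Linux and matches the errnum printed above. Under this assumption, strace hiding the problem would also make sense, since slowing each call down shrinks the chance that B lands exactly between A's check and A's create.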