I am a little closer now. I added some debugging information in client_protocol.c (~line 338):

    Debug("Receive counter challenge from server\n");

    /* proposition S3 */
    memset(in,0,CF_BUFSIZE);
    encrypted_len = ReceiveTransaction(conn->sd,in,NULL);

    if (encrypted_len < 0)
    {
        CfOut(cf_error,"","Protocol transaction sent illegal cipher length");
        return false;
    }

    if ((decrypted_cchall = malloc(encrypted_len)) == NULL)
    {
        snprintf(MATT_MESS,CF_BUFSIZE,"memory failure TWO:encrypted_len:%d",encrypted_len);
        FatalError(MATT_MESS);
    }

cf-agent dies with FatalError:

    Fatal cfengine error: memory failure TWO:encrypted_len:0

It appears that encrypted_len is indeed zero on the challenge response to the policy host. On AIX, malloc(0) returns NULL, which this code treats as an allocation failure, so cf-agent exits with a fatal memory error. From the timestamps, the client fails first, and cf-serverd on the policy host core dumps two seconds later.

I don't know enough about the SSL communication between client and host, so I need a little help here. Is it possible for an encrypted length to be zero?

On Dec 7, 2009, at 8:55 AM, Mark Burgess wrote:

>
> Perhaps you have access to some fancy tools, like purify, insight etc
> that might help debug this. It sounds like some kind of heap corruption.
>
> M
>
> Matt Richards wrote:
>> Well, I hate to say this, but I am still having this problem (svn 657
>> now). However, I am getting closer. When cf-serverd core dumps, I get
>> a corresponding "Fatal cfengine error: memory failure" on the client.
>> I am not sure which one dies first, but I am guessing the client
>> (cf-agent). I don't understand why it would get a memory failure, the
>> code is just doing a regular malloc, and the machines (random, never
>> the same one twice) in question have plenty of memory. I will dig
>> (pulling my socks up) more, but it is just odd.
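
As a side note on the malloc(0) case: below is a minimal sketch, assuming a hypothetical helper alloc_payload() that is not part of cfengine, of how a zero or negative length could be reported separately from a genuine out-of-memory condition. The C standard leaves the result of malloc(0) implementation-defined (either NULL or a unique pointer), so checking the length first avoids depending on either behaviour.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes and helper for illustration only;
 * none of these names exist in the cfengine source. */
enum alloc_status {
    ALLOC_OK         =  0,  /* buffer allocated                     */
    ALLOC_BAD_LENGTH = -1,  /* length was zero or negative          */
    ALLOC_NO_MEMORY  = -2   /* malloc failed for a non-zero request */
};

/* Allocate a buffer for a received payload of `len` bytes.
 * On success *out points to the buffer; otherwise *out is NULL. */
enum alloc_status alloc_payload(int len, char **out)
{
    *out = NULL;

    if (len <= 0) {
        /* Reject before calling malloc: malloc(0) may legally return
         * NULL, which would otherwise look like an out-of-memory error. */
        return ALLOC_BAD_LENGTH;
    }

    *out = malloc((size_t)len);
    return (*out != NULL) ? ALLOC_OK : ALLOC_NO_MEMORY;
}
```

With a check like this, the caller could log an "illegal cipher length" style error for ALLOC_BAD_LENGTH and reserve the fatal memory error for ALLOC_NO_MEMORY. Whether a zero length is ever valid at this point in the protocol is still the open question.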