This is very interesting.

I don't understand how your solution completely fixes things though.  What if the 
server is
restarted with caching disabled while the client still has sessions cached.  When the 
sessions
were cached by the client the session ID was not zero length so you validly cached 
them.  Yet you
run into the same problem when the server restarts.

I realize this is an unlikely scenario but it lead me to think that the problem should 
be fixed
elsewhere, namely in the OpenSSL client code.

Am I following things correctly?

    Tim

--- Jonathan Hersch <[EMAIL PROTECTED]> wrote:
> [I apologize for the length of this message in advance.]
> 
> I think I figured out what's going on with my client-side caching issue.  I
> stared at the code some more, and then did something I should have done at the
> beginning, I went back to the SSL spec.
> 
> It turns out that according to the SSL/TLS spec the server can return an empty
> session_id to tell the client the session is not resumable.  My problem
> occurred when the server had caching disabled, i.e when all sessions are not
> resumable.  Here's the relevant section (my emphasis):
> 
>     /* If the session_id_length is 0 then don't cache this entry because it's
>      * not being cached by the server and so is not resumable.  From RFC 2246,
>      * "The TLS Protocol Version 1.0", January 1999, section 7.4.1.3. "Server
>      * hello" :
>      *
>      * session_id
>      *    This is the identity of the session corresponding to this
>      *    connection. If the ClientHello.session_id was non-empty, the server
>      *    will look in its session cache for a match. If a match is found and
>      *    the server is willing to establish the new connection using the
>      *    specified session state, the server will respond with the same value
>      *    as was supplied by the client. This indicates a resumed session and
>      *    dictates that the parties must proceed directly to the finished
>      *    messages. Otherwise this field will contain a different value
>      *    identifying the new session. THE SERVER MAY RETURN AN EMPTY
>      *    SESSION_ID TO INDICATE THAT THE SESSION WILL NOT BE CACHED AND
>      *    THEREFORE CANNOT BE RESUMED. If a session is resumed, it must be
>      *    resumed using the same cipher suite it was originally negotiated
>      *    with.
>      */
> 
> --- Geoff Thorpe <[EMAIL PROTECTED]> wrote:
> 
> 
> > It would be easier if you explain in code terms what
> > you observe, when 
> > you observe it, and how you're managing things on
> > your side of the API. 
> 
> 
> So the short answer is it's in the spec.  But it might be useful to go through
> the logic of the code anyway just to make sure I'm really understanding this.
> 
> First, here's the way I was calling the APIs _before_ I read about empty
> session_ids.  (I'm glossing over where mSess comes from and how multiple
> destinations are being handled, but that doesn't really matter for the
> explanation.)
> 
> 
> Before calling SSL_connect():
> 
> 
> void
> useSessionCache(SSL *ssl)
> {
>     lockme(CacheLock);
> 
>     // If there is a cached session then use it.
>     if (NULL != mSess)
>     {
>         SSL_set_session(ssl, mSess); // incs ref count for session
>     }
>     else
>     {
>           GlobalLogDebug("useSessionCache(): mSess was NULL\n");
>     }
> 
>       unlockme(CacheLock);
> }
> 
> 
> And just before calling SSL_shutdown():
> 
> 
> void
> setSessionCache(SSL *ssl)
> {
>     lockme(CacheLock);
> 
>     // Release current session if there is one
>     if (NULL != mSess)
>     {
>         GlobalLogDebug("setSessionCache(): freeing cached session %p\n",
>                        mSess);
>         SSL_SESSION_free(mSess);
>     }
> 
>     // Use "get1" version of call so increments the ref count for the session
>     // so it doesn't disappear on us.
>     mSess = SSL_get1_session(ssl);
>     GlobalLogDebug("setSessionCache(): cached session %p\n",
>                    mSess);
>     unlockme(CacheLock);
> }
> 
> 
> So I believe this does the ref counting correctly.  The key here is that every
> connection always has its session extracted and cached away.  When a new
> connection is started (to the same destination) the entry in the cache is
> pulled out and set into that new connection.
> 
> Now looking at the s3_clnt.c and s3_srvr.c code.  For the very first connection
> to the server a new session is created, and the session_id sent to the server
> has a length of 0.
> 
> - ssl3_client_hello()
>       if (s->state == SSL3_ST_CW_CLNT_HELLO_A)
>               {
>               if ((s->session == NULL) ||
>                       (s->session->ssl_version != s->version) ||
>                       (s->session->not_resumable))
>                       {
>                       if (!ssl_get_new_session(s,0))
>                               goto err;
>                       }
>               /* else use the pre-loaded session */
> 
>               ...
> 
>               /* Session ID */
>               if (s->new_session)
>                       i=0;
>               else
>                       i=s->session->session_id_length;
>               *(p++)=i;
> 
>               ...
>        }
> 
> - In ssl3_get_client_hello() on the server side a new session is created, and a
>   session_id is generated for it (the "1" to ssl_get_new_session() causes
> this):
> 
>     ...
> 
>       /* get the session-id */
>       j= *(p++);
> 
>       s->hit=0;
>       if (j == 0)
>               {
>               if (!ssl_get_new_session(s,1))
>                       goto err;
>               }
>       else
>               {
>               i=ssl_get_prev_session(s,p,j);
>               if (i == 1)
>                       { /* previous session */
>                       s->hit=1;
>                       }
>               else if (i == -1)
>                       goto err;
>               else /* i == 0 */
>                       {
>                       if (!ssl_get_new_session(s,1))
>                               goto err;
>                       }
>               }
> 
>     ...
> 
> - The server then goes to ssl3_send_server_hello() where this code for handling
>   the session_id is:
> 
>         ...
> 
>               /* now in theory we have 3 options to sending back the
>                * session id.  If it is a re-use, we send back the
>                * old session-id, if it is a new session, we send
>                * back the new session-id or we send back a 0 length
>                * session-id if we want it to be single use.
>                * Currently I will not implement the '0' length session-id
>                * 12-Jan-98 - I'll now support the '0' length stuff.
>                */
>               if (!(s->ctx->session_cache_mode & SSL_SESS_CACHE_SERVER))
>                       s->session->session_id_length=0;
> 
>               sl=s->session->session_id_length;
>               if (sl > sizeof s->session->session_id)
>                       {
>                       SSLerr(SSL_F_SSL3_SEND_SERVER_HELLO, SSL_R_INTERNAL_ERROR);
>                       return -1;
>                       }
>               *(p++)=sl;
>               memcpy(p,s->session->session_id,sl);
>               p+=sl;
> 
>         ...
> 
>   In the problem I was having SSL_SESS_CACHE_SERVER is not set (caching is
>   disabled), so the session_id_length returned is 0, this is not a resumable
>   session.
> 
> - On the client in ssl3_get_server_hello() the session_id is handled like this:
> 
>      ...
> 
>       /* get the session-id */
>       j= *(p++);
> 
>        if(j > sizeof s->session->session_id)
>                {
>                al=SSL_AD_ILLEGAL_PARAMETER;
>                SSLerr(SSL_F_SSL3_GET_SERVER_HELLO,
>                       SSL_R_SSL3_SESSION_ID_TOO_LONG);
>                goto f_err;
>                }
> 
>       if ((j != 0) && (j != SSL3_SESSION_ID_SIZE))
>               {
>               /* SSLref returns 16 :-( */
>               if (j < SSL2_SSL_SESSION_ID_LENGTH)
>                       {
>                       al=SSL_AD_ILLEGAL_PARAMETER;
>                       
>SSLerr(SSL_F_SSL3_GET_SERVER_HELLO,SSL_R_SSL3_SESSION_ID_TOO_SHORT);
>                       goto f_err;
>                       }
>               }
>       if (j != 0 && j == s->session->session_id_length
>           && memcmp(p,s->session->session_id,j) == 0)
>           {
>           if(s->sid_ctx_length != s->session->sid_ctx_length
>              || memcmp(s->session->sid_ctx,s->sid_ctx,s->sid_ctx_length))
>               {
>               al=SSL_AD_ILLEGAL_PARAMETER;
>       
> 
>SSLerr(SSL_F_SSL3_GET_SERVER_HELLO,SSL_R_ATTEMPT_TO_REUSE_SESSION_IN_DIFFERENT_CONTEXT);
>               goto f_err;
>               }
>           s->hit=1;
>           }
>       else    /* a miss or crap from the other end */
>               {
>               /* If we were trying for session-id reuse, make a new
>                * SSL_SESSION so we don't stuff up other people */
>               s->hit=0;
>               if (s->session->session_id_length > 0)
>                       {
>                       if (!ssl_get_new_session(s,0))
>                               {
>                               al=SSL_AD_INTERNAL_ERROR;
>                               goto f_err;
>                               }
>                       }
>               s->session->session_id_length=j;
>               memcpy(s->session->session_id,p,j); /* j could be 0 */
>               }
>       p+=j;
> 
>      ....
> 
>   In this case j is 0 since the server sent an empty session_id.  Therefore the
>   bottom clause is used.  And since the session_id_length is 0 a new session is
>   NOT generated.  The assumption is that if the session_id_length is 0 then the
>   session is a new one and can be used at will.  And in fact right away j is
>   written back into the session.
> 
> - Going back up to the top of s3_clnt.c, the next thing on the client side that
>   happens is that a decision is made to jump to the end of the negotiation if
>   we're reusing a session (s->hit is 1), or continue to reading the server
>   cert.  Since we think we're using a new session s->hit is 0 and we read the
>   cert.
> 
>         ...
> 
>               case SSL3_ST_CR_SRVR_HELLO_A:
>               case SSL3_ST_CR_SRVR_HELLO_B:
>                       ret=ssl3_get_server_hello(s);
>                       if (ret <= 0) goto end;
>                       if (s->hit)
>                               s->state=SSL3_ST_CR_FINISHED_A;
>                       else
>                               s->state=SSL3_ST_CR_CERT_A;
>                       s->init_num=0;
>                       break;
> 
>         ...
> 
> - ssl3_get_server_certificate() is now called and it does a lot of stuff, but
>   the one thing that's most important in terms of the crash is that it
>   allocates a new sess_cert and sets it into the session after freeing the old
>   sess_cert.
> 
>     ...
> 
>       sc=ssl_sess_cert_new();
>       if (sc == NULL) goto err;
> 
>       if (s->session->sess_cert) ssl_sess_cert_free(s->session->sess_cert);
>       s->session->sess_cert=sc;
> 
>     ...
> 
> - Later on the public key is pulled from the sess_cert to use for encrypting a
>   secret to send to the server.  That's the point at which my stuff crashed.
> 
> 
> Now the reason (if it's not obvious) that the code crashed is because when the
> client finishes using this session my code caches it away.  And the next
> connection to the same server gets the same session back (ref count
> incremented).  However, because the session_id_length is always 0, since the
> server always returns a 0 length when caching is disabled, the cached session
> appears as if it's a newly allocated session.  So at the point at which the
> code decides to get the server cert or jump to the end of negotiation, it
> always gets the server cert.  This causes a new sess_cert to be allocated and
> put into the session.  Except, this session is being used by multiple
> connections at once since it came from the cache.  So some poor connection is
> left holding a pointer into the sess_cert which is no longer valid, and we
> crash.  I actually saw this exact situation in gdb while digging through the
> data structures.
> 
> The fix for this problem is easy, change setSessionCache() function to do:
> 
> 
> void
> setSessionCache(SSL *ssl)
> {
>     lockme(CacheLock);
> 
>     // Release current session if there is one
>     if (NULL != mSess)
>     {
>         GlobalLogDebug("setSessionCache(): freeing cached session %p\n",
>                        mSess);
>         SSL_SESSION_free(mSess);
>     }
> 
>     // Use "get1" version of call so increments the ref count for the session
>     // so it doesn't disappear on us.
>     mSess = SSL_get1_session(ssl);
> 
>     if (0 == mSess->session_id_length)
>     {
>         SSL_SESSION_free(mSess);
>         mSess = NULL;
>     }
>     else
>     {
>         GlobalLogDebug("setSessionCache(): cached session %p\n",
>                        mSess);
>     }
> 
>     unlockme(CacheLock);
> }
> 
> 
> I feel safe making this check since the spec says that the server can send an
> empty session_id.
> 
> Thanks for bearing with me on this, and for your help.
> 
> - Jonathan
> 
> 
> __________________________________________________
> Do you Yahoo!?
> Faith Hill - Exclusive Performances, Videos & More
> http://faith.yahoo.com
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> User Support Mailing List                    [EMAIL PROTECTED]
> Automated List Manager                           [EMAIL PROTECTED]


__________________________________________________
Do you Yahoo!?
Y! Web Hosting - Let the expert host your web site
http://webhosting.yahoo.com/
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]

Reply via email to