I was going through failing tests today and noticed some which failed because of memory leaks using code like the following:

  RETURN_RT_STRING(s, 0);
}

For non-unicode mode this is functionally identical to:

 RETURN_STRING(s, 0);
}

However for unicode mode, this binary value is translated to unicode using (effectively) this:

 {
   UChar *u;
   int u_len;
   UErrorCode status = U_ZERO_ERROR;
   zend_convert_to_string(ZEND_U_CONVERTER(UG(runtime_encoding_conv)),
                          &u, &u_len, s, strlen(s), &status);
   RETURN_UNICODE(u, 0);
 }
}

The trouble with this, is that the original s value winds up getting leaked. I noticed a few other spots in the source where this is handled as such:

 RETVAL_RT_STRING(s, 0);
 if (UG(unicode)) {
   efree(s);
 }
 return;
}

Which is, of course, perfectly valid, but it feels a bit cludgy to me as it's a little inconsistent with the normal RETURN_STRING()/RETURN_UNICODE() semantics when it comes to "giving away" the variable. It's also a problem for the RETURN_RT_STRING(s, 0); usage in general as there's no point after the macro to free the original var.

I see two solutions:

(1) Modify the RETURN_RT_STRING(L)() macros to the following:

#define RETURN_RT_STRING(t, duplicate) \
{ RETVAL_RT_STRING(t, duplicate); if (duplicate && UG(unicode)) efree(t); return; }

(2) Modify ZVAL_U_STRINGL() to:

#define ZVAL_U_STRINGL(conv, z, s, l, duplicate, auto_free) \
   if (UG(unicode)) { \
       UErrorCode status = U_ZERO_ERROR; \
       UChar *u_str; \
       int u_len; \
       zend_convert_to_unicode(conv, &u_str, &u_len, s, l, &status); \
       if (auto_free && !duplicate) { \
           efree(s); \
       } \
       ZVAL_UNICODEL(z, u_str, u_len, 0); \
   } else { \
       char *__s=(s); int __l=l;   \
       Z_STRLEN_P(z) = __l;        \
       Z_STRVAL_P(z) = (duplicate?estrndup(__s, __l):__s); \
       Z_TYPE_P(z) = IS_STRING;    \
   }

Along with changes to the (ZVAL|RETVAL)_RT_STRING() macros: to support the additional auto_free option (with RETURN_* assuming auto_free=1).

The first option solves the one real problem with the current implementation by making the uncatchable RETURN_RT_STRING() macros free the original char* as needed, whereas the latter takes that work away from the calling scope at the cost of adding to the proto and requiring more work to go back and clean up existing uses.

Thoughts?

-Sara

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to