In article <CAEZNtZ+SKqiATLwDvzyVf=sulpght7usxhzeawnfextpwsb...@mail.gmail.com> troycurti...@gmail.com writes: > On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI > <futat...@yf.bsdclub.org> wrote:
>> The patch attached modifies 4 kind of input argment translations. >> >> (1) typemap(in) char * (with/without const modifiers); not allow NULL, >> typemap(in) const char * MAY_BE_NULL; allows NULL >> These had done by using 'parse=' typemap modifier, however there is no >> PyArg_Parse() format unit to convert both of str and bytes in py3. >> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c, >> and use it in for new typemap definition. >> >> consideration: >> * For py2, my patch code uses svn_swig_py_string_to_cstring() >> - It isn't allow Unicode for input, however 's' and 'z' format units >> PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2, >> it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or >> add code to conversion for py2) > > > Yes I think you should support Unicode in this case, but it turns out > you are most of the way there. If you just remove the IS_PY3 > conditional, it will support unicode! The "PyBytes_*" and "PyStr_*" > functions are wrappers provided by the py3c library. The names point > to the concept that they target, and then map it appropriately in Py2 > and Py3. So > > PyBytes: Sequence of byte values, e.g. "raw data" > In Py2: str > In Py3: bytes > > PyStr: Character data > In Py2: Unicode > In Py3: str Unfortunately, PyStr in py3c compatibility layer API is the intersection of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly use PyUnicode_* for handling Unicode in py2. >> (3) typemap(in) apr_hash_t * (for various types) >> These are using make_string_from_ob() for hash key string conversion, >> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value >> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses >> make_svn_strinf_from_ob() for hash value conversion >> >> consideration: >> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value, >> but current implementation of conversion function doesn't allows. Isn't >> it needed? (I added test for this case, but disabled until it make clear) >> * test case for UnicodeEncodeError is needed (for both of hash key and >> hash value) > > > Yes this looks like a good catch, the generic conversion function > should handle the NULL/None case. Then I'll fix it. Does it need to separate typemap that allow NULL as hash value and dones not allow? >> (4) typemap(in) apr_array_header_t *STRINGLIST >> This typemap is using svn_swig_py_unwrap_string() through the function >> svn_swig_py_seq_to_array(). >> >> consideration: >> * test case for UnicodeEncodeError is needed (for both of hash key and >> hash value) apr_array_header_t is't hash, so this was '(for values)' :) > And here is the start of my patch review: > > Don't forget to put a brief description of the over overall change in > the log message here. And as this is such large change, a detail > section is probably also in order as well. Yes, I see. > > [In subversion/bindings/swig] > > > > * core.i (%typemap(in) (const char *data, apr_size_t *len): On Python 3, > > allow Str as well for data argment of svn_stream_write() > > > > * include/svn_global.swg > > (remove)(%typemap(in) char *, char const *, char * const, > > char const * const): Move this typemap into include/svn_strings as > > typemap (in) IN_STRING > > > [snip] > > > > diff --git a/subversion/bindings/swig/core.i > > b/subversion/bindings/swig/core.i > > index c309d48070..6993ef4c9c 100644 > > --- a/subversion/bindings/swig/core.i > > +++ b/subversion/bindings/swig/core.i > > @@ -442,18 +442,31 @@ > > */ > > #ifdef SWIGPYTHON > > %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) { > > - char *tmpdata; > > Py_ssize_t length; > > - if (!PyBytes_Check($input)) { > > - PyErr_SetString(PyExc_TypeError, > > - "expecting a bytes object for the buffer"); > > - SWIG_fail; > > + if (PyBytes_Check($input)) { > > + if (PyBytes_AsStringAndSize($input, (char **)&$1, &length) == -1) { > > + SWIG_fail; > > + } > > + } > > +%#if IS_PY3 > > + else if (PyStr_Check($input)) { > > + $1 = PyStr_AsUTF8AndSize($input, &length); > > + if (PyErr_Occurred()) { > > + SWIG_fail; > > + } > > } > > - if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) { > > +%#endif > > Don't use the IS_PY3 conditional here, and you get Unicode (on Py2) > conversion automatically. I'll try to use Unicode_* API here for both of py2 and py3. > > + else { > > + PyErr_SetString(PyExc_TypeError, > > +%#if IS_PY3 > > + "expecting a bytes or str object for the buffer" > > +%#else > > + "expecting a string for the buffer" > > +%#endif > > I think it is valid to say "bytes or string" object for both Py2 and > Py3, thus getting rid of another conditional. I see. > > + ); > > SWIG_fail; > > } > > temp = ($*2_type)length; > > - $1 = tmpdata; > > $2 = ($2_ltype)&temp; > > } > > #endif > > [snip] > > > diff --git a/subversion/bindings/swig/include/svn_string.swg > > b/subversion/bindings/swig/include/svn_string.swg > > index 0fc64ebdcc..8be4c3d746 100644 > > --- a/subversion/bindings/swig/include/svn_string.swg > > +++ b/subversion/bindings/swig/include/svn_string.swg > > @@ -251,6 +251,26 @@ typedef struct svn_string_t svn_string_t; > > } > > #endif > > > > + /* ----------------------------------------------------------------------- > > + Type: char * (input) > > +*/ > > +#ifdef SWIGPYTHON > > +%typemap (in) IN_STRING > > +{ > > + $1 = svn_swig_py_string_to_cstring($input, FALSE, "$symname", > > "$1_name"); > > + if (PyErr_Occurred()) SWIG_fail; > > +} > > + > > +%typemap (freearg) IN_STRING ""; > > What does freearg here do? Looking at the docs, it seems this supposed > to be used to free allocated resources once the wrapper function > exits. This is the replacement of %typemap(in, parse='s') and %typemap(in, parse='z') which uses internal char * buffer of Python object. This typemap also uses py object internal buffer, so it doesn't need typemap for freearg. As far as I read the code SWIG produed for %typemap(in, parse=...), it seems undefine typemap(freearg) corresponding it. If `%typemap (freearg) IN_STRING "";' doesn't exist here, pre-defined typemap for char * in swig library becomes effective, then extra codes are produced. > > + > > +%apply IN_STRING { > > + const char *, > > + char *, > > + char const *, > > + char * const, > > + char const * const > > +}; > > +#endif > > /* ----------------------------------------------------------------------- > > define a way to return a 'const char *' > > */ > > diff --git a/subversion/bindings/swig/include/svn_types.swg > > b/subversion/bindings/swig/include/svn_types.swg > > index 319f7daa6b..7c933b1ac7 100644 > > --- a/subversion/bindings/swig/include/svn_types.swg > > +++ b/subversion/bindings/swig/include/svn_types.swg > > @@ -348,12 +348,8 @@ svn_ ## TYPE ## _swig_rb_closed(VALUE self) > > #ifdef SWIGPYTHON > > %typemap(in) const char *MAY_BE_NULL > > { > > - if ($input == Py_None) { > > - $1 = NULL; > > - } else { > > - $1 = PyBytes_AsString($input); > > - if ($1 == NULL) SWIG_fail; > > - } > > + $1 = svn_swig_py_string_to_cstring($input, TRUE, "$symname", > > "$1_name"); > > + if (PyErr_Occurred()) SWIG_fail; > > } > > #endif > > > > diff --git a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c > > b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c > > index 8a4ec631be..58cfec30a3 100644 > > --- a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c > > +++ b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c > > @@ -506,6 +506,32 @@ void svn_swig_py_svn_exception(svn_error_t > > *error_chain) > > > > /*** Helper/Conversion Routines ***/ > > > > +/* Function to get char * representation of bytes/str object */ > > +char *svn_swig_py_string_to_cstring(PyObject *input, int maybe_null, > > + const char * funcsym, const char * > > argsym) > > I think we need to be very careful with the return value here, both > the type and the lifetime. The trouble with using the value returned > directly from the C api below is that they typically return a pointer > to an internal buffer which must not be modified. Returning it as > "char *" is probably not the right idea. I see. > Additionally, these are > internal to the "input" object, so if something like: > > PyObject *foo = <create Py String>; > char *my val = svn_swig_py_string_to_cstring(foo, ...); > Py_DECREF(foo); > > svn_awesome_api(my_val, ...); /* This could explode */ > > Of course, we do need to handle the case of an input "char *" > argument, so I think the best option is probably to use > "apr_pstrdup()" to allocate a new string from the application pool. I don't suppose that this function is called other than "%typemap(in) IN_STRING" and "%typemap(in) char *MAY_BE_NULL", so I think the comment for this function is insufficient about it. (APIs which need to use apr_pstrdup() are already separated like "%typemap(in) const void *value" in core.i, I think.) > > +{ > > + char *retval = NULL; > > + if (PyBytes_Check(input)) { > > + retval = PyBytes_AsString(input); > > + } > > +#if IS_PY3 > > + else if (PyStr_Check(input)) { > > + retval = (char *)PyStr_AsUTF8(input); > > + } > > +#endif > > Remove this IF_PY3 conditional as well. > > > + else if (input != Py_None || ! maybe_null) { > > + PyErr_Format(PyExc_TypeError, > > +#if IS_PY3 > > + "%s() argument %s must be bytes or str%s, not %s", > > +#else > > + "%s() argument %s must be string%s, not %s", > > +#endif > > + funcsym, argsym, maybe_null?" or None":"", > > + Py_TYPE(input)->tp_name); > > + } > > + return retval; > > +} > > + > > [ As far as my review got for now] > > I'll continue reviewing over the next few days, that is all the time I > have for tonight! > > Troy Thank you for review. I'll start to fix pointed out. -- Yasuhito FUTATSUKI