In article <CAEZNtZ+SKqiATLwDvzyVf=sulpght7usxhzeawnfextpwsb...@mail.gmail.com>
troycurti...@gmail.com writes:
> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
> <futat...@yf.bsdclub.org> wrote:

>> The patch attached modifies 4 kind of input argment translations.
>>
>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>>      typemap(in) const char * MAY_BE_NULL; allows NULL
>> These had done by using 'parse=' typemap modifier, however there is no
>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>> and use it in for new typemap definition.
>>
>> consideration:
>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>>    - It isn't allow Unicode for input, however 's' and 'z' format units
>>     PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>>     it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>>     add code to conversion for py2)
> 
> 
> Yes I think you should support Unicode in this case, but it turns out
> you are most of the way there. If you just remove the IS_PY3
> conditional, it will support unicode!  The "PyBytes_*" and "PyStr_*"
> functions are wrappers provided by the py3c library.  The names point
> to the concept that they target, and then map it appropriately in Py2
> and Py3. So
> 
> PyBytes: Sequence of byte values, e.g. "raw data"
>    In Py2: str
>    In Py3: bytes
> 
> PyStr: Character data
>    In Py2: Unicode
>    In Py3: str

Unfortunately, PyStr in py3c compatibility layer API is the intersection
of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly
use PyUnicode_* for handling Unicode in py2.

 
>> (3) typemap(in) apr_hash_t * (for various types)
>> These are using make_string_from_ob() for hash key string conversion,
>> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
>> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
>> make_svn_strinf_from_ob() for hash value conversion
>>
>> consideration:
>> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
>>   but current implementation of conversion function doesn't allows. Isn't
>>   it needed? (I added test for this case, but disabled until it make clear)
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)
> 
> 
> Yes this looks like a good catch, the generic conversion function
> should handle the NULL/None case.

Then I'll fix it. Does it need to separate typemap that allow NULL
as hash value and dones not allow?
 
>> (4) typemap(in) apr_array_header_t *STRINGLIST
>> This typemap is using svn_swig_py_unwrap_string() through the function
>> svn_swig_py_seq_to_array().
>>
>> consideration:
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)
apr_array_header_t is't hash, so this was '(for values)' :)

 
> And here is the start of my patch review:
> 
> Don't forget to put a brief description of the over overall change in
> the log message here. And as this is such large change, a detail
> section is probably also in order as well.
 
Yes, I see.

> > [In subversion/bindings/swig]
> >
> > * core.i (%typemap(in) (const char *data, apr_size_t *len): On Python 3,
> >  allow Str as well for data argment of svn_stream_write()
> >
> > * include/svn_global.swg
> >  (remove)(%typemap(in) char *, char const *, char * const,
> >   char const * const): Move this typemap into include/svn_strings as
> >   typemap (in) IN_STRING
> >
> [snip]
> >
> > diff --git a/subversion/bindings/swig/core.i 
> > b/subversion/bindings/swig/core.i
> > index c309d48070..6993ef4c9c 100644
> > --- a/subversion/bindings/swig/core.i
> > +++ b/subversion/bindings/swig/core.i
> > @@ -442,18 +442,31 @@
> >  */
> >  #ifdef SWIGPYTHON
> >  %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
> > -    char *tmpdata;
> >      Py_ssize_t length;
> > -    if (!PyBytes_Check($input)) {
> > -        PyErr_SetString(PyExc_TypeError,
> > -                        "expecting a bytes object for the buffer");
> > -        SWIG_fail;
> > +    if (PyBytes_Check($input)) {
> > +        if (PyBytes_AsStringAndSize($input, (char **)&$1, &length) == -1) {
> > +            SWIG_fail;
> > +        }
> > +    }
> > +%#if IS_PY3
> > +    else if (PyStr_Check($input)) {
> > +        $1 = PyStr_AsUTF8AndSize($input, &length);
> > +        if (PyErr_Occurred()) {
> > +            SWIG_fail;
> > +        }
> >      }
> > -    if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) {
> > +%#endif
> 
> Don't use the IS_PY3 conditional here, and you get Unicode (on Py2)
> conversion automatically.

I'll try to use Unicode_* API here for both of py2 and py3.
 
> > +    else {
> > +        PyErr_SetString(PyExc_TypeError,
> > +%#if IS_PY3
> > +                        "expecting a bytes or str object for the buffer"
> > +%#else
> > +                        "expecting a string for the buffer"
> > +%#endif
> 
> I think it is valid to say "bytes or string" object for both Py2 and
> Py3, thus getting rid of another conditional.

I see.
 
> > +                       );
> >          SWIG_fail;
> >      }
> >      temp = ($*2_type)length;
> > -    $1 = tmpdata;
> >      $2 = ($2_ltype)&temp;
> >  }
> >  #endif
> 
> [snip]
> 
> > diff --git a/subversion/bindings/swig/include/svn_string.swg 
> > b/subversion/bindings/swig/include/svn_string.swg
> > index 0fc64ebdcc..8be4c3d746 100644
> > --- a/subversion/bindings/swig/include/svn_string.swg
> > +++ b/subversion/bindings/swig/include/svn_string.swg
> > @@ -251,6 +251,26 @@ typedef struct svn_string_t svn_string_t;
> >  }
> >  #endif
> >
> > + /* -----------------------------------------------------------------------
> > +   Type: char * (input)
> > +*/
> > +#ifdef SWIGPYTHON
> > +%typemap (in) IN_STRING
> > +{
> > +    $1 = svn_swig_py_string_to_cstring($input, FALSE, "$symname", 
> > "$1_name");
> > +    if (PyErr_Occurred()) SWIG_fail;
> > +}
> > +
> > +%typemap (freearg) IN_STRING "";
> 
> What does freearg here do? Looking at the docs, it seems this supposed
> to be used to free allocated resources once the wrapper function
> exits.

This is the replacement of %typemap(in, parse='s') 
and %typemap(in, parse='z') which uses internal char * buffer of
Python object. This typemap also uses py object internal buffer,
so it doesn't need typemap for freearg.  As far as I read the code
SWIG produed for %typemap(in, parse=...), it seems undefine
typemap(freearg) corresponding it. If `%typemap (freearg) IN_STRING "";'
doesn't exist here, pre-defined typemap for char * in swig library
becomes effective, then extra codes are produced.  

 
> > +
> > +%apply IN_STRING {
> > +    const char *,
> > +    char *,
> > +    char const *,
> > +    char * const,
> > +    char const * const
> > +};
> > +#endif
> >  /* -----------------------------------------------------------------------
> >     define a way to return a 'const char *'
> >  */
> > diff --git a/subversion/bindings/swig/include/svn_types.swg 
> > b/subversion/bindings/swig/include/svn_types.swg
> > index 319f7daa6b..7c933b1ac7 100644
> > --- a/subversion/bindings/swig/include/svn_types.swg
> > +++ b/subversion/bindings/swig/include/svn_types.swg
> > @@ -348,12 +348,8 @@ svn_ ## TYPE ## _swig_rb_closed(VALUE self)
> >  #ifdef SWIGPYTHON
> >  %typemap(in) const char *MAY_BE_NULL
> >  {
> > -    if ($input == Py_None) {
> > -        $1 = NULL;
> > -    } else {
> > -        $1 = PyBytes_AsString($input);
> > -        if ($1 == NULL) SWIG_fail;
> > -    }
> > +    $1 = svn_swig_py_string_to_cstring($input, TRUE, "$symname", 
> > "$1_name");
> > +    if (PyErr_Occurred()) SWIG_fail;
> >  }
> >  #endif
> >
> > diff --git a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c 
> > b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > index 8a4ec631be..58cfec30a3 100644
> > --- a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > +++ b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > @@ -506,6 +506,32 @@ void svn_swig_py_svn_exception(svn_error_t 
> > *error_chain)
> >
> >  /*** Helper/Conversion Routines ***/
> >
> > +/* Function to get char * representation of bytes/str object */
> > +char *svn_swig_py_string_to_cstring(PyObject *input, int maybe_null,
> > +                                    const char * funcsym, const char * 
> > argsym)
> 
> I think we need to be very careful with the return value here, both
> the type and the lifetime. The trouble with using the value returned
> directly from the C api below is that they typically return a pointer
> to an internal buffer which must not be modified. Returning it as
> "char *" is probably not the right idea.

I see.

>                                          Additionally, these are
> internal to the "input" object, so if something like:
> 
>   PyObject *foo = <create Py String>;
>   char *my val = svn_swig_py_string_to_cstring(foo, ...);
>   Py_DECREF(foo);
> 
>   svn_awesome_api(my_val, ...); /* This could explode */
> 
> Of course, we do need to handle the case of an input "char *"
> argument, so I think the best option is probably to use
> "apr_pstrdup()" to allocate a new string from the application pool.

I don't suppose that this function is called other than
"%typemap(in) IN_STRING" and "%typemap(in) char *MAY_BE_NULL",
so I think the comment for this function is insufficient about it.
(APIs which need to use apr_pstrdup() are already separated
like "%typemap(in) const void *value" in core.i, I think.)
 
> > +{
> > +    char *retval = NULL;
> > +    if (PyBytes_Check(input)) {
> > +        retval = PyBytes_AsString(input);
> > +    }
> > +#if IS_PY3
> > +    else if (PyStr_Check(input)) {
> > +        retval = (char *)PyStr_AsUTF8(input);
> > +    }
> > +#endif
> 
> Remove this IF_PY3 conditional as well.
> 
> > +    else if (input != Py_None || ! maybe_null) {
> > +        PyErr_Format(PyExc_TypeError,
> > +#if IS_PY3
> > +                     "%s() argument %s must be bytes or str%s, not %s",
> > +#else
> > +                     "%s() argument %s must be string%s, not %s",
> > +#endif
> > +                     funcsym, argsym, maybe_null?" or None":"",
> > +                     Py_TYPE(input)->tp_name);
> > +    }
> > +    return retval;
> > +}
> > +
> 
> [ As far as my review got for now]
> 
> I'll continue reviewing over the next few days, that is all the time I
> have for tonight!
>
> Troy

Thank you for review. I'll start to fix pointed out.

-- 
Yasuhito FUTATSUKI

Reply via email to