Re: [VOTE] Release PyLucene 6.5.0 (rc1) (now with Python 3 support)

Ruediger Meier Wed, 29 Mar 2017 15:48:56 -0700

On Wednesday 29 March 2017, Andi Vajda wrote:
> On Wed, 29 Mar 2017, Petrus Hyvönen wrote:
> > Hi,
> >
> > With the /DLL, sprintf(buffer, "%0*%jx", (int) hexdig, hash); and
> > Py_SIZE it compiles under windows (Windows 7, 64 bit)
> >
> > I haven't set up for building pylucene but have another library that
> > I build.
> >
> > For that I get a utf-8 error on:
> >
> >  File "C:\Users\phy\AppData\Local\Continuum\Anaconda3-430\conda-bld\orekit_1490824040916\_b_env\lib\site-packages\jcc\cpp.py", line 898, in header
> >    env.strhash(signature(constructor)))
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xac in
> > position 9: invalid start byte
>
> That means that the "%0*%jx" change for _MSC_VER also needs a change
> for the hexdig sizeof computation. There is a mismatch and garbage is
> left in the buffer array.



Hm, actually "%jx" and PRIxMAX should be equivalent modifiers to print
uintmax_t (C99). According to the MSC documentation I thought that the
macro is even safer to use.

The problem is that we are compiling C++ here, which does not need to
support C99 at all. Though MSC might be the only C++ compiler which
ignores C99...

To fix it I would suggest using non-C99 types here. Let's use "unsigned
long long" and "%llx". This may look a bit less rock-solid, but it
should be good enough for any existing system:

static PyObject *t_jccenv_strhash(PyObject *self, PyObject *arg)
{
    unsigned long long hash = (unsigned long long) PyObject_Hash(arg);
    static const size_t hexdig = sizeof(hash) * 2;
    char buffer[hexdig + 1];

    sprintf(buffer, "%0*llx", (int) hexdig, hash);
    return PyUnicode_FromStringAndSize(buffer, hexdig);
}
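
To check the format string without building against Python.h, the same
formatting can be tried in a small stand-alone sketch. Note that
hash_to_hex is a hypothetical helper name used only for this
illustration (it is not part of JCC), and it takes a plain long long
instead of calling PyObject_Hash:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-alone helper (not part of JCC): formats a hash value
// the same way as the proposed t_jccenv_strhash, but without Python.h.
static std::string hash_to_hex(long long signed_hash)
{
    // Same cast as in the proposal: treat the signed hash as unsigned.
    unsigned long long hash = (unsigned long long) signed_hash;
    static const std::size_t hexdig = sizeof(hash) * 2;  // 2 hex digits per byte
    char buffer[hexdig + 1];                             // +1 for the NUL

    // "%0*llx": zero-pad to the width passed as an extra int argument.
    int n = std::snprintf(buffer, sizeof(buffer), "%0*llx",
                          (int) hexdig, hash);
    assert(n == (int) hexdig);  // output is always exactly hexdig characters
    (void) n;                   // avoid an unused warning with NDEBUG

    return std::string(buffer, hexdig);
}
```

On a system where unsigned long long is 64 bits, hash_to_hex(255) gives
"00000000000000ff" and hash_to_hex(-1) gives sixteen 'f' characters, so
the buffer is always completely filled and no garbage is left in it.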

BTW, this function should also be copied to the py2 directory, where we
still use int although PyObject_Hash already returns long on Python >2.x.

cu,
Rudi
