This is a follow-on to a question I asked yesterday, which was answered by MRAB. I'm using the Python C API to load the Gutenberg corpus from the nltk library and iterate through the sentences. The Python code I am trying to replicate is:
    from nltk.corpus import gutenberg
    for i, fileid in enumerate(gutenberg.fileids()):
        sentences = gutenberg.sents(fileid)
        # etc.

I have everything finished down to the last line (sentences = gutenberg.sents(fileid)), where I use PyObject_Call to call gutenberg.sents, but it segfaults. The fileid is a string; the first fileid in this corpus is "austen-emma.txt".

    pName = PyUnicode_FromString("nltk.corpus");
    pModule = PyImport_Import(pName);

    pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
    pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
    pSentMod = PyObject_GetAttrString(pSubMod, "sents");

    pFileIds = PyObject_CallObject(pFidMod, 0);
    pListItem = PyList_GetItem(pFileIds, listIndex);
    pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
    pListStr = PyBytes_AS_STRING(pListStrE);
    Py_DECREF(pListStrE);

    // sentences = gutenberg.sents(fileid)
    PyObject *c_args = Py_BuildValue("s", pListStr);
    PyObject *NullPtr = 0;
    pSents = PyObject_Call(pSentMod, c_args, NullPtr);

The final line segfaults:

    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff6e4e8d5 in _PyEval_EvalCodeWithName ()
       from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

My guess is the problem is in Py_BuildValue, which returns a pointer but it may not be constructed correctly. I also tried it with "O" and it doesn't segfault but it returns 0x0.

I'm new to using the C API. Thanks for any help.

Jen
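One way to test the guess about Py_BuildValue without rebuilding the C program is to call the same C API functions from Python through ctypes.pythonapi. The sketch below is an illustration, not part of the original program: it assumes CPython, hard-codes the file id, and uses the built-in len as a stand-in for gutenberg.sents so that nltk is not needed. It shows that the format "s" produces a plain str, while "(s)" produces a 1-tuple. The C API documentation says the args parameter of PyObject_Call must be a tuple, so the "(s)" form is the one that matches it.

```python
import ctypes

# Py_BuildValue is variadic: declare only the fixed format argument and
# pass the remaining arguments as explicit ctypes values.
build_value = ctypes.pythonapi.Py_BuildValue
build_value.restype = ctypes.py_object
build_value.argtypes = [ctypes.c_char_p]

# PyObject_Call(callable, args, kwargs); kwargs may be NULL (None here).
object_call = ctypes.pythonapi.PyObject_Call
object_call.restype = ctypes.py_object
object_call.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_void_p]

# Hard-coded stand-in for the first file id in the corpus.
fileid = ctypes.c_char_p(b"austen-emma.txt")

as_str = build_value(b"s", fileid)      # single format unit: a str object
as_tuple = build_value(b"(s)", fileid)  # parenthesized: a 1-tuple

print(type(as_str).__name__, type(as_tuple).__name__)

# Only the tuple is a valid args value for PyObject_Call.
# len stands in for gutenberg.sents (an assumption made for this sketch).
result = object_call(len, as_tuple, None)
print(result)
```

If the check behaves the same way on the target interpreter, the C code could try Py_BuildValue("(s)", pListStr) or PyTuple_Pack(1, pListItem) as the args value. Separately, pListStr points into the buffer owned by pListStrE, so it should not be used after Py_DECREF(pListStrE) releases that object.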