I am using the C API in Python 3.8 with the nltk library, and I have a problem with the return from a library call implemented with PyObject_CallFunctionObjArgs.

This is the relevant Python code:

    import nltk
    from nltk.corpus import gutenberg
    fileids = gutenberg.fileids()
    sentences = gutenberg.sents(fileids[0])
    sentence = sentences[0]
    sentence = " ".join(sentence)
    pt = nltk.word_tokenize(sentence)

I run this at the Python command prompt to show how it works:

    >>> sentence = " ".join(sentence)
    >>> pt = nltk.word_tokenize(sentence)
    >>> print(pt)
    ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
    >>> type(pt)
    <class 'list'>

This is the relevant part of the C API code:

    PyObject* str_sentence = PyObject_Str(pSentence);
    // nltk.word_tokenize(sentence)
    PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize");
    PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0);

(where pModule_mstr is the nltk library).

That should produce a list with a length of 7 that looks like it does on the command line version shown above:

    ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

But instead the C API produces a list with a length of 24, and the REPR looks like this:

    '[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', "\'by", "\'", \',\', "\'Jane", "\'", \',\', "\'Austen", "\'", \',\', "\'1816", "\'", \',\', "\'", \']\', "\'", \']\']'

I also tried this with PyObject_CallMethodObjArgs and PyObject_Call without success.

Thanks for any help on this.

Jen
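
One thing that may be worth checking, sketched here in plain Python: PyObject_Str(obj) is the C equivalent of str(obj), not of " ".join(obj). The sketch below assumes (the C code shown above does not confirm this) that pSentence still refers to the list of tokens rather than to the joined string. The names tokens, joined, and stringified are illustrative only.

```python
# Tokens as shown in the interpreter session above.
tokens = ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

# What the Python version passes to nltk.word_tokenize: the joined sentence.
joined = " ".join(tokens)

# What PyObject_Str would return IF pSentence is still the token list
# (an assumption): the printable representation of the whole list.
stringified = str(tokens)

print(joined)
print(stringified)
```

If that assumption holds, word_tokenize would receive the second printed string rather than the first, which would be consistent with the extra quote and comma tokens in the 24-element result. In that case PyUnicode_Join, called with a " " separator object and the list, would be the C counterpart of " ".join(sentence).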