subhabangal...@gmail.com wrote:

> I wrote a small piece of following code
>
> import nltk
> from nltk.corpus.reader import TaggedCorpusReader
> from nltk.tag import CRFTagger
> def NE_TAGGER():
>     reader = TaggedCorpusReader('/python27/', r'.*\.pos')
>     f1=reader.fileids()
>     print "The Files of Corpus are:",f1
>     sents=reader.tagged_sents()
>     ls=len(sents)
>     print "Length of Corpus Is:",ls
>     train_data=sents[:300]
>     test_data=sents[301:350]

Offtopic: note that sents[300] is in neither the training nor the test data;
Python slices are half-open intervals, so sents[:300] stops at sents[299]
and sents[301:350] starts at sents[301].

>     ct = CRFTagger()
>     crf_tagger=ct.train(train_data,'model.crf.tagger')
>
> This code is working fine.
> Now if I change the data size to say 500 or 3000 in train_data by giving
> train_data=sents[:500] or
> train_data=sents[:3000] it is giving me the following error.

What about sents[:499], sents[:498], ...?

I'm not an nltk user, but to debug the problem I suggest that you identify
the exact index that triggers the exception, and then print that sentence:

print sents[minimal_index_that_causes_typeerror]

Perhaps you can spot a problem with the input data.

(In the spirit of the "offtopic" remark: if sents[:333] is the shortest slice
that triggers the failure, you have to print sents[332].)

> Traceback (most recent call last):
>   File "<pyshell#2>", line 1, in <module>
>     NE_TAGGER()
>   File "C:\Python27\HindiCRFNERTagger1.py", line 20, in NE_TAGGER
>     crf_tagger=ct.train(train_data,'model.crf.tagger')
>   File "C:\Python27\lib\site-packages\nltk\tag\crf.py", line 185, in train
>     trainer.append(features,labels)
>   File "pycrfsuite\_pycrfsuite.pyx", line 312, in
> pycrfsuite._pycrfsuite.BaseTrainer.append
> (pycrfsuite/_pycrfsuite.cpp:3800) File "stringsource", line 53, in
> vector.from_py.__pyx_convert_vector_from_py_std_3a__3a_string
> (pycrfsuite/_pycrfsuite.cpp:10738) File "stringsource", line 15, in
> string.from_py.__pyx_convert_string_from_py_std__in_string
> (pycrfsuite/_pycrfsuite.cpp:10633)
> TypeError: expected string or Unicode object, NoneType found
> >>>
>
> I have searched for solutions in web found the following links as,
> https://stackoverflow.com/questions/14219038/python-multiprocessing-typeerror-expected-string-or-unicode-object-nonetype-f
> or
> https://github.com/kamakazikamikaze/easysnmp/issues/50
>
> reloaded Python but did not find much help.
>
> I am using Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:22:17) [MSC
> v.1500 32 bit (Intel)] on win32
>
> My O/S is, MS-Windows 7.
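
To make the search for the offending index concrete, here is a minimal
sketch. It assumes each sentence is a sequence of (word, tag) pairs and that
the culprit is a pair whose word or tag is None; that is only a guess based
on the "NoneType found" in the traceback. The name first_bad_index and the
sample data are made up for illustration and are not part of nltk.

```python
from __future__ import print_function  # lets the same code run on Python 2.7 and 3


def first_bad_index(sents):
    """Return the index of the first sentence containing a None word or tag.

    Assumes each sentence is a sequence of (word, tag) pairs.
    Returns None if every pair is fully populated.
    """
    for index, sent in enumerate(sents):
        for word, tag in sent:
            if word is None or tag is None:
                return index
    return None


# Made-up stand-in for reader.tagged_sents(); the second sentence
# contains a token without a tag.
sample = [
    [("Delhi", "LOC"), ("is", "O")],
    [("Ram", "PER"), ("went", None)],
    [("home", "O")],
]

bad = first_bad_index(sample)
print("first suspicious sentence index:", bad)
if bad is not None:
    print(sample[bad])
```

On the real corpus you would call first_bad_index(sents[:500]) instead and
then look at the corresponding place in the .pos file. If it returns None,
the guess about None words or tags is wrong and shrinking the slice by hand
is still the way to go.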
>
> If any body may kindly suggest a resolution.

--
https://mail.python.org/mailman/listinfo/python-list