On 27/04/17 18:39, Siva Kumar S wrote:
Source Code:

clean_train_reviews=[]
for review in train["review"]:
    clean_train_reviews.append(review_to_wordlist(review, remove_stopwords=True))

trainDataVecs=getAvgFeatureVecs(clean_train_reviews, model, num_features)

print "Creating average feature vecs for test reviews"
clean_test_reviews=[]
for review in test["review"]:
    clean_test_reviews.append(review_to_wordlist(review,remove_stopwords=True))

testDataVecs=getAvgFeatureVecs(clean_test_reviews, model, num_features)

forest = RandomForestClassifier(n_estimators = 100)

forest = forest.fit(trainDataVecs, train["sentiment"])

result = forest.predict(testDataVecs)

output = pd.DataFrame(data={"id":test["id"], "sentiment":result})
output.to_csv("Word2Vec_AverageVectors.csv", index=False, quoting=3)

Error Message:

Traceback (most recent call last):
  File "/test_IMDB_W2V_RF.py", line 224, in <module>
    result = forest.predict(testDataVecs)
  File "/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 534, in predict
    proba = self.predict_proba(X)
  File "/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 573, in predict_proba
    X = self._validate_X_predict(X)
  File "/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 355, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 365, in _validate_X_predict
    X = check_array(X, dtype=DTYPE, accept_sparse="csr")
  File "/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 407, in check_array
    _assert_all_finite(array)
  File "/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Process finished with exit code 1


Description:
Can anyone help with this error message?

It means exactly what it says. One of the values in your testDataVecs (I assume) is NaN (not a number), infinite, or too large for a 32-bit IEEE float to represent. You may be using the sklearn package incorrectly; you'll have to read the (apparently quite extensive) documentation yourself, as I've never used it.
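
To track down which values are at fault, something along these lines might help. It is only a sketch, and it assumes testDataVecs is a 2-D NumPy array with one averaged vector per review, which the posted code suggests but does not show. The helper names find_non_finite_rows and replace_non_finite_rows are made up for illustration; they are not part of sklearn or of your script. A plausible cause (a guess, since getAvgFeatureVecs was not posted) is a review with no words in the model's vocabulary, so that the average divides by zero.

```python
import numpy as np


def find_non_finite_rows(vectors):
    """Return the indices of rows that contain NaN or +/-infinity.

    The data is cast to float32 first because the traceback shows the
    check being made against float32; a float64 value too large for
    float32 turns into infinity in that cast.
    """
    with np.errstate(over="ignore", invalid="ignore"):
        arr = np.asarray(vectors, dtype=np.float32)
    bad_mask = ~np.isfinite(arr).all(axis=1)
    return np.flatnonzero(bad_mask)


def replace_non_finite_rows(vectors):
    """Return a float32 copy with every offending row set to zeros.

    Zero-filling is one possible workaround, not necessarily the right
    one for your data; dropping those reviews is another option.
    """
    with np.errstate(over="ignore", invalid="ignore"):
        arr = np.array(vectors, dtype=np.float32)  # np.array makes a copy
    arr[find_non_finite_rows(arr)] = 0.0
    return arr


# Toy stand-in for testDataVecs (hypothetical values, not real data).
demo = np.array([[0.1, 0.2],
                 [np.nan, 0.3],
                 [0.4, np.inf]], dtype=np.float32)

bad_rows = find_non_finite_rows(demo)
cleaned = replace_non_finite_rows(demo)
```

In your script you would call find_non_finite_rows(testDataVecs) just before forest.predict(testDataVecs) and look at the corresponding entries of clean_test_reviews to see why those reviews produce bad vectors.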

--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list
