Hello.
This is the first time I am using RTextTools. I have to implement an SVM
classification on a collection of text documents. I am following this
tutorial:

http://journal.r-project.org/archive/2013-1/collingwood-jurka-boydstun-etal.pdf

Here is my code, step by step.

#First, I read my data and supplied an index file. The index file lists
all the text documents to be classified, each with its individual tag.
For example, if there is a file abc.txt belonging to the genre X, the index
file stores it as abc.txt,X and so on.

Code :
data <- read_data('C:/Users/dell/Dropbox/Bundeli/corpus/wob/sklearn/folder',
                  type = 'folder',
                  index = 'C:/Users/dell/Dropbox/Bundeli/corpus/wob/sklearn/index.txt')

#####Second, I create a document-term matrix.

doc_matrix <- create_matrix(data, language="english", removeNumbers=TRUE,
stemWords=TRUE, removeSparseTerms=.8)

#####Third, I create a container which houses the matrix and the labels,
split into training and test sets.

container <- create_container(doc_matrix, data$genre, trainSize=1:93,
testSize=94:116, virgin=FALSE)

#######Here, data$genre is meant to be the label vector, with each document's
genre label given in the same order as the index file.

######So far, there has been no error.

But now, when I try to train the SVM on the container using the following
code,

SVM <- train_model(container, "SVM")

 ##### It gives me this error.######

Error in svm.default(x = container@training_matrix, y =
container@training_codes,  :   x and y don't match.

######When I look at the structure of the "container", it shows the training
codes as empty, like this (full structure attached):######

Slot "training_codes":
factor(0)
Levels:

Slot "testing_codes":
factor(0)
Levels:

#####Can somebody please help? I have been desperately looking for an
answer. Could there be something wrong with the index file passed to
read_data, or is the problem with the data$genre variable? Those are the
parts that are new to me, so I may have gotten them wrong. I will be most
grateful.
#######
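
One check that may narrow this down: in base R, asking a data frame for a
column that does not exist returns NULL, and turning NULL into a factor gives
exactly the empty factor(0) shown above. The sketch below uses a made-up data
frame as a stand-in for the object returned by read_data. The column names
Text.Data and Labels are assumptions for illustration only; the real names
should be checked with names(data).

```r
# Hypothetical stand-in for the result of read_data(); the real column
# names may differ and should be inspected with names(data).
data <- data.frame(Text.Data = c("doc one", "doc two", "doc three"),
                   Labels = c("X", "Y", "X"),
                   stringsAsFactors = FALSE)

# List the columns that actually exist.
print(names(data))

# Asking for a column that is not there silently returns NULL.
labels <- data$genre
print(is.null(labels))

# Subsetting NULL still gives NULL, and as.factor(NULL) is an empty factor.
codes <- as.factor(labels[1:2])
print(codes)
print(length(codes))
```

If names(data) on the real object does not include genre, passing the column
that does hold the labels to create_container would be the next thing to try.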


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
