Re: [R] SVM. How to use categorical attributes?

Ulrich Bodenhofer Wed, 28 Mar 2012 04:39:31 -0700

Alex,

To avoid the memory issue, you can directly use a "bag of words" kernel
(which corresponds to using the linear kernel on the sparse bag of words
matrix Steve suggested). Here is a little toy example of how this is done
for two samples:

> x1 <- c("how", "to", "grow", "tree") 
> x2 <- c("where", "to", "go", "weekend", "cinema") 
> k12 <- length(intersect(x1, x2))
> k12
[1] 1
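The correspondence with the linear kernel can be checked on the same two samples. The following is a minimal base R sketch. It assumes a binary (presence/absence) bag of words, because intersect() drops duplicated words. With word counts instead of presence flags, the two numbers could differ. The names vocab, bow1 and bow2 are illustrative only.

```r
# The two toy samples from the example above
x1 <- c("how", "to", "grow", "tree")
x2 <- c("where", "to", "go", "weekend", "cinema")

# Vocabulary: all distinct words occurring in either sample
vocab <- sort(union(x1, x2))

# Binary bag of words vectors: 1 if the word occurs in the sample, else 0
bow1 <- as.numeric(vocab %in% x1)
bow2 <- as.numeric(vocab %in% x2)

# Linear kernel (dot product) on the explicit, mostly-zero vectors
k_linear <- sum(bow1 * bow2)

# Bag of words kernel computed directly, without building any vectors
k_set <- length(intersect(x1, x2))

c(k_linear, k_set)
```

Both values are 1. The second computation never materializes the vectors, and that is what saves memory when the vocabulary is large.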

If you run this for every pair of samples (additionally exploiting the
symmetry of the resulting matrix), you will get an L x L matrix of kernel
values (where L is the number of samples) without having to store the
large bag of words matrix. That's exactly one of the beauties of SVMs,
in my humble opinion.
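The pairwise computation can be sketched in base R as follows. The list docs and the function bow_kernel_matrix() are hypothetical names used for illustration. Only the upper triangle is computed, and each value is mirrored into the lower triangle, so each pair is evaluated once.

```r
# Hypothetical tokenized samples: one character vector per sample
docs <- list(
  c("how", "to", "grow", "tree"),
  c("where", "to", "go", "weekend", "cinema"),
  c("grow", "tree", "cinema")
)

# L x L matrix of bag of words kernel values (number of shared distinct words)
bow_kernel_matrix <- function(docs) {
  L <- length(docs)
  K <- matrix(0, nrow = L, ncol = L)
  for (i in seq_len(L)) {
    for (j in i:L) {
      # compute the upper triangle (including the diagonal) ...
      K[i, j] <- length(intersect(docs[[i]], docs[[j]]))
      # ... and mirror it, since the kernel matrix is symmetric
      K[j, i] <- K[i, j]
    }
  }
  K
}

K <- bow_kernel_matrix(docs)
K
```

Memory use depends only on the number of samples, not on the vocabulary size. A precomputed matrix like this can be handed to an SVM implementation that accepts precomputed kernels. For instance, kernlab's ksvm() works with a matrix converted via as.kernelMatrix(). Check that package's documentation for details, including how to predict new samples from their kernel values against the training samples.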

Just as a side note: the result above is 1 because the two bags of words
overlap in exactly one word, "to". It may be a good idea to remove such
unspecific words (stop words) first and, moreover, to apply word stemming,
as is standard in analyses like the one you are aiming at.
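The effect of both preprocessing steps on the kernel value can be illustrated with a base R sketch. Everything in it is a stated assumption. The stop-word list is a tiny hypothetical one. crude_stem() is a naive suffix stripper standing in for a real stemmer such as Porter's (available in R, for example, as wordStem() in the SnowballC package). The two samples are hypothetical variants of the example above.

```r
# Hypothetical, deliberately tiny stop-word list (real analyses use curated lists)
stop_words <- c("to", "how", "where", "the", "a", "of")

# Naive suffix stripping: a crude stand-in for a real stemmer
crude_stem <- function(words) sub("(ing|s)$", "", words)

# Lower-case, drop stop words, stem, and keep distinct words only
preprocess <- function(words) {
  words <- tolower(words)
  words <- words[!words %in% stop_words]
  unique(crude_stem(words))
}

# Hypothetical variants of the toy samples
x1 <- c("how", "to", "grow", "trees")
x2 <- c("where", "to", "go", "growing", "tree")

k_raw   <- length(intersect(x1, x2))                          # only "to" is shared
k_clean <- length(intersect(preprocess(x1), preprocess(x2)))  # "grow" and "tree"

c(k_raw, k_clean)
```

After preprocessing, the unspecific "to" no longer contributes, while "trees"/"tree" and "grow"/"growing" now match, so the kernel value reflects shared content words.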

Best regards,
Ulrich

--
View this message in context: 
http://r.789695.n4.nabble.com/SVM-How-to-use-categorical-attributes-tp4508460p4512034.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
