On 06/03/11 22:34, John Dennison wrote:
[...]
from data like
Customer-ID | Item-ID
cust1 | 2
cust1 | 3
cust1 | 5
cust2 | 5
cust2 | 3
cust3 | 2
...
#read in data to a sparse binary transaction matrix
txn <- read.transactions(file="tranaction_list.txt", rm.duplicates=TRUE,
                         format="single", sep="|", cols=c(1,2))
#transaction matrix to matrix
a<-as(txn, "matrix")
#matrix to data.frame
b<-as.data.frame(a)
I end up with a data.frame like:
X X.1 X.2 X.3 X.4 X.5 ...
cust1 0 1 1 0 1
cust2 0 0 1 0 1
cust3 0 1 0 0 0
...
However, as.data.frame(a) turns the matrix into a numeric data.frame, so
when I run the rpart algorithm it automatically returns a regression tree
rather than a classification tree.
I am not sure your approach with rpart is going to give you what you are
looking for, but on to your R question:
[...] I can't successfully transform the data.frame to a factor. I
tried:
b_factor<-as.factor(b)
Error in sort.list(y) :
'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
You need to convert each column individually, e.g. (after b_factor <- b)
b_factor$X.1 <- as.factor(b$X.1), or all columns at once:
str( as.data.frame(lapply(b, as.factor)) )
'data.frame': 4 obs. of 4 variables:
$ X.2 : Factor w/ 2 levels "0","1": 2 1 2 1
$ X.3 : Factor w/ 2 levels "0","1": 2 2 1 1
$ X.5 : Factor w/ 2 levels "0","1": 2 2 1 1
$ X.Item.ID: Factor w/ 2 levels "0","1": 1 1 1 2
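For anyone who wants to try the column-wise conversion without arules installed, here is a minimal base-R sketch. The toy matrix is hand-built to mirror the example data above (it is an assumption standing in for as(txn, "matrix")), and the column names are only illustrative.

```r
# Toy 0/1 matrix mirroring the example data
# (cust1: items 2, 3, 5; cust2: items 3, 5; cust3: item 2).
# Hand-built stand-in for as(txn, "matrix"), not real arules output.
a <- matrix(c(1, 1, 1,
              0, 1, 1,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("cust1", "cust2", "cust3"),
                            c("X.2", "X.3", "X.5")))
b <- as.data.frame(a)

# as.factor() expects an atomic vector, and a data.frame is a list of
# columns, so the conversion has to be applied column by column.
b_factor <- as.data.frame(lapply(b, as.factor))

# Going through lapply() loses the customer row names, so restore them.
rownames(b_factor) <- rownames(b)

str(b_factor)
```

An alternative that keeps the row names in one step is b[] <- lapply(b, as.factor). Once the response column is a factor, rpart should choose a classification tree by itself; passing method = "class" states that intent explicitly.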
Also have a look at as(txn, "data.frame") for a different format that
may (with some clean up) be easier to use.
as(txn, "data.frame")
transactionID items
1 cust1 { 2, 3, 5}
2 cust2 { 3, 5}
3 cust3 { 2}
4 Customer-ID { Item-ID}
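The "clean up" could look roughly like the sketch below. It is base R only and works on a hand-built copy of the output above, so the object d and the name item_list are assumptions for illustration; with arules loaded you would start from as(txn, "data.frame") instead.

```r
# Hand-built stand-in for the as(txn, "data.frame") output shown above.
d <- data.frame(
  transactionID = c("cust1", "cust2", "cust3", "Customer-ID"),
  items = c("{ 2, 3, 5}", "{ 3, 5}", "{ 2}", "{ Item-ID}"),
  stringsAsFactors = FALSE
)

# Drop the row that came from the file's header line being read as data.
d <- d[d$transactionID != "Customer-ID", ]

# Strip braces and spaces, then split each item string on commas.
item_list <- strsplit(gsub("[{} ]", "", d$items), ",")
names(item_list) <- d$transactionID

str(item_list)
```

The stray Customer-ID row (and the X.Item.ID column in the str() output earlier) appears because the header line of the file was read as a transaction. Recent arules versions document a skip argument for read.transactions that may avoid this at read time; check ?read.transactions for your installed version.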
Hope this helps a little.
Allan