On 06/03/11 22:34, John Dennison wrote:
[...]
from data like

Customer-ID | Item-ID
cust1       | 2
cust1       | 3
cust1       | 5
cust2       | 5
cust2       | 3
cust3       | 2
...

# read in data to a sparse binary transaction matrix (arules package)
library(arules)
txn <- read.transactions(file = "tranaction_list.txt", rm.duplicates = TRUE,
                         format = "single", sep = "|", cols = c(1, 2))

# transaction matrix to an ordinary matrix
a <- as(txn, "matrix")

# matrix to data.frame
b <- as.data.frame(a)

I end up with a data.frame like:

X       X.1 X.2  X.3 X.4 X.5 ...
cust1  0    1   1    0    1
cust2  0    0   1    0    1
cust3  0    1   0    0    0
...
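If arules is not essential, a similar 0/1 customer-by-item data.frame can be sketched in base R by cross-tabulating the long data with table(). This is a swapped-in technique, not what the arules code above does; the inline data, the column names customer/item and the item_ prefix are assumptions for illustration. Reading with header = TRUE also keeps the header line out of the data.

```r
# Hypothetical long-format input mirroring the Customer-ID | Item-ID example.
# header = TRUE consumes the header line; strip.white drops the padding around "|".
txn_long <- read.table(
  text = c("Customer-ID | Item-ID",
           "cust1 | 2", "cust1 | 3", "cust1 | 5",
           "cust2 | 5", "cust2 | 3",
           "cust3 | 2"),
  header = TRUE, sep = "|", strip.white = TRUE,
  col.names = c("customer", "item")
)

# Cross-tabulate customers against items, then turn counts into 0/1 flags
counts <- table(txn_long$customer, txn_long$item)
m <- unclass(counts)          # drop the "table" class so it behaves as a plain matrix
m <- (m > 0) * 1L             # any purchase -> 1, none -> 0
colnames(m) <- paste0("item_", colnames(m))  # hypothetical, syntactic column names

# Plain numeric data.frame with customers as row names
b <- as.data.frame(m)
print(b)
```

Like the arules route, this still gives numeric columns, so the factor conversion discussed below applies here too.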

However, as.data.frame(a) turns the matrix into a numeric data.frame, so
when I run the rpart algorithm it automatically returns a regression tree
rather than a classification tree.
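The regression tree comes from how rpart() picks its default method: a factor response gives method = "class", while a plain numeric response like this one falls back to "anova". A minimal sketch with made-up data, assuming X.5 is the column to predict (that choice of response is hypothetical):

```r
library(rpart)  # recommended package, shipped with standard R installations

# Hypothetical 0/1 data standing in for 'b'; the response X.5 is chosen arbitrarily
set.seed(1)
b <- data.frame(X.2 = rbinom(40, 1, 0.5),
                X.3 = rbinom(40, 1, 0.5),
                X.5 = rbinom(40, 1, 0.5))

# Numeric response: rpart() defaults to method = "anova", i.e. a regression tree
fit_reg <- rpart(X.5 ~ ., data = b)

# Option 1: ask for a classification tree explicitly
fit_cls <- rpart(X.5 ~ ., data = b, method = "class")

# Option 2: make the response a factor; rpart() then defaults to method = "class"
b$X.5 <- factor(b$X.5, levels = c(0, 1))
fit_fac <- rpart(X.5 ~ ., data = b)

# The method each fit actually used
c(fit_reg$method, fit_cls$method, fit_fac$method)
```

Only the response needs to be a factor for this; 0/1 predictors work either way.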

I am not sure your approach with rpart is going to give you what you are looking for, but on to your R question:

[...] I can't successfully transform the data.frame to a factor. I
tried:

b_factor<-as.factor(b)
Error in sort.list(y) :
   'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

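as.factor() works on a single vector, not on a whole data.frame (which is a list of columns), so you need to convert each column individually, e.g. b$X.1 <- as.factor(b$X.1), or convert all of them at once with lapply():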

 str( as.data.frame(lapply(b, as.factor)) )
'data.frame':    4 obs. of  4 variables:
 $ X.2      : Factor w/ 2 levels "0","1": 2 1 2 1
 $ X.3      : Factor w/ 2 levels "0","1": 2 2 1 1
 $ X.5      : Factor w/ 2 levels "0","1": 2 2 1 1
 $ X.Item.ID: Factor w/ 2 levels "0","1": 1 1 1 2
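A variant of the lapply() approach, assuming you want to keep the customer IDs stored in the row names: assigning into b[] replaces the columns in place, and fixing levels = c(0, 1) gives every column the same two levels even when an item never occurs. The data below is made up.

```r
# Hypothetical 0/1 data.frame shaped like 'b', customers as row names;
# X.4 is all zeros to show why fixing the levels matters
b <- data.frame(X.2 = c(1, 0, 1), X.3 = c(1, 1, 0),
                X.4 = c(0, 0, 0), X.5 = c(1, 1, 0),
                row.names = c("cust1", "cust2", "cust3"))

# as.data.frame(lapply(...)) builds a new data.frame: the row names are lost,
# and the all-zero column ends up with the single level "0"
alt <- as.data.frame(lapply(b, as.factor))

# b[] <- lapply(...) replaces the columns in place, keeping row and column names;
# levels = c(0, 1) gives every column the same two levels
b_factor <- b
b_factor[] <- lapply(b_factor, factor, levels = c(0, 1))
str(b_factor)
```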


Also have a look at as(txn, "data.frame") for a different format that may (with some clean up) be easier to use.

 as(txn, "data.frame")
     transactionID      items
1 cust1            { 2, 3, 5}
2  cust2              { 3, 5}
3   cust3                { 2}
4     Customer-ID  { Item-ID}
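Note that the last row above is the file's header line read in as a transaction, which is part of the clean-up needed. For comparison, a similar one-row-per-customer listing can be sketched in base R with split(); this is not what arules does internally, and the data.frame txn_long and its column names are assumptions:

```r
# Hypothetical long-format data (customer, item), header line already excluded
txn_long <- data.frame(
  customer = c("cust1", "cust1", "cust1", "cust2", "cust2", "cust3"),
  item     = c(2, 3, 5, 5, 3, 2),
  stringsAsFactors = FALSE
)

# One list element per customer, holding that customer's sorted, de-duplicated items
baskets <- lapply(split(txn_long$item, txn_long$customer),
                  function(x) sort(unique(x)))

# Flatten to a two-column data.frame resembling as(txn, "data.frame")
basket_df <- data.frame(
  transactionID = names(baskets),
  items = vapply(baskets,
                 function(x) paste0("{", paste(x, collapse = ","), "}"),
                 character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(basket_df)
```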


Hope this helps a little.

Allan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
