Hi,
Suppose your data look like this:
dat <- structure(list(Custom = c("Judi", "Judi", "Ben", "Tom", "Tom", 
"Bill", "Lindy", "Shary", "Judu", "Judu", "Billy", "Tommy", "Tommy", 
"Benjum", "Linda", "Shiry", "Shiry", "Shiry", "Judu", "Billy", 
"Tommy", "Lindy"), Gender = c("Female", "Female", "Male", "Male", 
"Male", "Male", "Female", "Female", "Female", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Female", "Female", 
"Female", "Male", "Male", "Female"), Product = c("A", "B", "A", 
"A", "B", "B", "A", "B", "A", "B", "A", "A", "B", "B", "A", "B", 
"A", "C", "D", "E", "D", "C"), Payment = c("Credit Card", "Credit Card", 
"Cash", "Cash", "Cash", "Credit Card", "Cash", "Credit Card", 
"Credit Card", "Credit Card", "Cash", "Cash", "Cash", "Credit Card", 
"Cash", "Credit Card", "Credit Card", "Credit Card", "Credit Card", 
"Cash", "Cash", "Cash")), .Names = c("Custom", "Gender", "Product", 
"Payment"), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22"))

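# Label each customer by the sorted set of products bought,
# e.g. "Purchase A,B and C" or "Purchase A only"
dat1 <- within(dat, Categ <- ave(Product, Custom, FUN = function(x)
  if (length(x) > 1)
    paste("Purchase", gsub("(.*)\\,(.*)$", "\\1 and \\2",
                           paste(sort(unique(x)), collapse = ",")))
  else paste("Purchase", x, "only")))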

library(reshape2)
res <- acast(dat1,Categ~Gender+Payment,length,value.var="Categ")

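library(stringr)
# Each customer contributes one row per product, so divide every row of counts
# by the number of products named in its label to count each customer once.
res1 <- res/str_count(gsub("Purchase|and|only|\\,"," ",rownames(res)),"\\w+")
res1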
#                    Female_Cash Female_Credit Card Male_Cash Male_Credit Card
# Purchase A and B             0                  1         1                0
# Purchase A and C             1                  0         0                0
# Purchase A and E             0                  0         1                0
# Purchase A,B and C           0                  1         0                0
# Purchase A,B and D           0                  1         1                0
# Purchase A only              1                  0         1                0
# Purchase B only              0                  1         0                2
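
If you would rather count each customer once directly instead of dividing
afterwards, one possible approach is to reduce the data to one row per customer
before tabulating. The following is only a sketch in base R: `toy` is a made-up
subset with the same columns as `dat`, and `one_per_customer` and `counts` are
names chosen for illustration. It assumes each customer uses a single payment
method; a customer who paid both ways would be counted once under each.

```r
# Made-up subset with the same columns as `dat` (for illustration only)
toy <- data.frame(
  Custom  = c("Judi", "Judi", "Ben", "Shiry", "Shiry", "Shiry"),
  Gender  = c("Female", "Female", "Male", "Female", "Female", "Female"),
  Product = c("A", "B", "A", "B", "A", "C"),
  Payment = c("Credit Card", "Credit Card", "Cash",
              "Credit Card", "Credit Card", "Credit Card"),
  stringsAsFactors = FALSE
)

# Collapse each customer's products into one label, e.g. "A,B,C"
toy$Categ <- ave(toy$Product, toy$Custom,
                 FUN = function(x) paste(sort(unique(x)), collapse = ","))

# Keep a single row per customer/gender/payment/label combination
one_per_customer <- unique(toy[, c("Custom", "Gender", "Payment", "Categ")])

# Cross-tabulate: every remaining row is one distinct customer
counts <- with(one_per_customer,
               table(Categ, Group = paste(Gender, Payment, sep = "_")))
counts
```

Because duplicates are dropped before counting, no division is needed, however
many products a customer bought.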
 
A.K.


Hello A.K.,

Thank you so much for your reply. That fixed the error message. I have one
more question.

With "res[2,] <- res[2,]/2", I think you divide the count of customers who
purchased both product A and B by 2. How would this work in R if there are
more than two products or more payment methods?

Is there another way to get a distinct count of customers directly, so that a
customer who purchased both A and B is counted once rather than twice?
Thank you so much for your time and help.

Best,
Tom 




On Wednesday, April 9, 2014 3:47 PM, arun <smartpink...@yahoo.com> wrote:


Hi,
Try:
datNew <- read.csv("customer_samples.csv",stringsAsFactors=FALSE)

#I could reproduce similar error message with:
dat[] <- lapply(dat,as.factor) 

dat1 <- within(dat, Categ <- ave(Product, Custom, FUN= function(x) 
if(length(x)>1) "A and B" else x)) 


#Warning messages:
#1: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
#2: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
#3: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
#4: In `[<-.factor`(`*tmp*`, i, value = "A and B") : invalid factor level, NA generated
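
The warnings arise because read.csv() (with the default settings in R versions
before 4.0.0) reads text columns as factors, and a factor cannot take a value
that is not one of its levels. Below is a minimal sketch of the problem and of
one way to repair a data frame that has already been read in. The names `f`
and `df` and the three rows of data are made up for illustration.

```r
# A factor only accepts values that are among its levels
f <- factor(c("A", "B", "A"))
f[1] <- "A and B"        # warning: invalid factor level, NA generated
f                        # the first element is now NA

# Repair for a data frame that already holds factors: convert to character
df <- data.frame(Custom  = factor(c("Judi", "Judi", "Ben")),
                 Product = factor(c("A", "B", "A")))
df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)

# ave() can now assign the new label without warnings
df$Categ <- ave(df$Product, df$Custom,
                FUN = function(x) if (length(x) > 1) "A and B" else x)
df$Categ
```

Reading the file with stringsAsFactors=FALSE, as above, avoids the conversion
step altogether.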

A.K.


Hello A.K.,

Thank you very much for your reply. I tried the following code but got some
warning messages:

------------------------- Code I tried --------------
dat <- read.csv ("customer samples.csv")
dat1 <- within(dat, Categ <- ave(Product, Custom, FUN= function(x)
if(length(x)>1) "A and B" else x))
library(reshape2)
res <- acast(dat1,Categ~Gender+Payment,length,value.var="Categ") #or dcast()
res[2,] <- res[2,]/2
res
---------------------------------

Warning messages I got:
1: In '[<-.factor' ('*tmp*', i, value = "A and B"): invalid factor level, NA generated
2: In '[<-.factor' ('*tmp*', i, value = "A and B"): invalid factor level, NA generated
3: In '[<-.factor' ('*tmp*', i, value = "A and B"): invalid factor level, NA generated
4: In '[<-.factor' ('*tmp*', i, value = "A and B"): invalid factor level, NA generated
-------------------------------------------------

Could you please help me out? Thanks a lot!



On Wednesday, April 9, 2014 12:18 PM, arun <smartpink...@yahoo.com> wrote:
Hi,
Try:

dat <- structure(list(Custom = c("Judi", "Judi", "Ben", "Tom", "Tom", 
"Bill", "Lindy", "Shary", "Judu", "Judu", "Billy", "Tommy", "Tommy", 
"Benjum", "Linda", "Shiry"), Gender = c("Female", "Female", "Male", 
"Male", "Male", "Male", "Female", "Female", "Female", "Female", 
"Male", "Male", "Male", "Male", "Female", "Female"), Product = c("A", 
"B", "A", "A", "B", "B", "A", "B", "A", "B", "A", "A", "B", "B", 
"A", "B"), Payment = c("Credit Card", "Credit Card", "Cash", 
"Cash", "Cash", "Credit Card", "Cash", "Credit Card", "Credit Card", 
"Credit Card", "Cash", "Cash", "Cash", "Credit Card", "Cash", 
"Credit Card")), .Names = c("Custom", "Gender", "Product", "Payment"
), class = "data.frame", row.names = c(NA, -16L))

 dat1 <- within(dat, Categ <- ave(Product, Custom, FUN= function(x) 
if(length(x)>1) "A and B" else x))

 library(reshape2)
 res <- acast(dat1,Categ~Gender+Payment,length,value.var="Categ") #or dcast()

res[2,] <- res[2,]/2 
res 
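
The division is needed because acast() with length counts rows, and a customer
who bought both products occupies two rows. Here is a small sketch of that
effect, using base R's table() in place of acast() on made-up data (`toy`,
`raw` and `fixed` are illustrative names). It also picks the row by name
rather than by position, which is less fragile if the row order changes.

```r
# Made-up data: Judi bought A and B, Ben bought A only
toy <- data.frame(
  Custom  = c("Judi", "Judi", "Ben"),
  Product = c("A", "B", "A"),
  Payment = c("Credit Card", "Credit Card", "Cash"),
  stringsAsFactors = FALSE
)
toy$Categ <- ave(toy$Product, toy$Custom,
                 FUN = function(x) if (length(x) > 1) "A and B" else x)

# Counting rows counts Judi once per product
raw <- table(toy$Categ, toy$Payment)
raw["A and B", "Credit Card"]    # 2

# Halve the "A and B" row, selected by name instead of by row number
fixed <- raw
fixed["A and B", ] <- fixed["A and B", ] / 2
fixed["A and B", "Credit Card"]  # 1
```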


A.K.


Hello experts,

I am a beginner in R and need your help with an R question. Any advice will be
greatly appreciated.

I have a sample data set like the one below: customers purchase either product
A or B or both, using either credit card or cash. I would like to summarize the
data as a crosstab in R that shows how many customers purchase product A only,
product B only, or both A and B, by payment method (credit card or cash). Is
that possible in R? Thank you very much for your time and help.

Customer_Sample.xlsx

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
