[R] Help deciding on data format for sales data (newbie)

jonathanbriggs Tue, 02 Jun 2009 09:06:34 -0700

Dear All

I am just beginning data mining and need some help working out the best way to
represent my data. I have searched here and online but not found any real help.
Imagine that I have a file of order (sales) data:


OrderNo  CustomerNo  ItemsInOrder
1        1           a,b,c
2        1           d
3        2           a,d

I can represent this as a data.frame, but then I need to parse the ItemsInOrder
column, which seems quite clumsy. Alternatively, I could use this sort of representation:

OrderNo  CustomerNo  a   b   c   d
1        1           1   1   1   NA
2        1           NA  NA  NA  1
3        2           1   NA  NA  1
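For illustration, here is a minimal Python sketch of moving between the two layouts. Python is used only because it runs standalone, and the names to_long and to_wide are hypothetical. The comma-separated basket column is split into one row per (order, item) pair, a "long" format that is a third option, and the wide table is rebuilt from it. Absent items are stored as 0 rather than NA, which is a choice of this sketch and not something the question specifies.

```python
# Toy data copied from the question: (OrderNo, CustomerNo, ItemsInOrder)
orders = [
    (1, 1, "a,b,c"),
    (2, 1, "d"),
    (3, 2, "a,d"),
]


def to_long(rows):
    """Split each comma-separated basket into one (OrderNo, CustomerNo, Item) row."""
    long_rows = []
    for order_no, customer_no, items in rows:
        for item in items.split(","):
            item = item.strip()
            if item:  # skip empty strings left by stray commas
                long_rows.append((order_no, customer_no, item))
    return long_rows


def to_wide(long_rows):
    """Rebuild the dense order-by-item table, using 0 where the question used NA."""
    items = sorted({item for _, _, item in long_rows})
    keys = sorted({(o, c) for o, c, _ in long_rows})
    present = {(o, item) for o, _, item in long_rows}
    header = ["OrderNo", "CustomerNo"] + items
    table = [[o, c] + [1 if (o, item) in present else 0 for item in items]
             for o, c in keys]
    return header, table


long_rows = to_long(orders)
header, table = to_wide(long_rows)
print(header)
for row in table:
    print(row)
```

The long format keeps one row per item actually bought, so its size follows the number of line items rather than the number of SKUs.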

Are these really the only two choices, and how well does the second representation
scale? (I have 50,000 SKUs.)
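The following rough sketch shows how the second layout scales. Only the 50,000 SKUs come from the question. The figures of 100,000 orders and 3 items per order are hypothetical, chosen for illustration. A dense table needs one cell per order per SKU, while a sparse layout stores only the items actually bought. The to_csr helper is a hypothetical illustration of compressed sparse row storage, the same compressed-sparse idea used by sparse matrix classes such as those in R's Matrix package.

```python
# Only N_SKUS comes from the question; the other figures are hypothetical.
N_SKUS = 50_000
N_ORDERS = 100_000      # hypothetical number of orders
AVG_ITEMS = 3           # hypothetical mean number of items per order
BYTES_PER_CELL = 4      # e.g. an R integer or logical cell

dense_cells = N_ORDERS * N_SKUS          # one cell per order per SKU
sparse_entries = N_ORDERS * AVG_ITEMS    # one entry per item actually bought

print(f"dense:  {dense_cells:,} cells, about {dense_cells * BYTES_PER_CELL / 1e9:.1f} GB")
print(f"sparse: {sparse_entries:,} stored entries "
      f"({sparse_entries / dense_cells:.4%} of the dense cells)")


def to_csr(baskets, item_index):
    """Compressed sparse row layout (hypothetical helper).

    indices[indptr[i]:indptr[i + 1]] holds the column numbers of the
    items in order i; nothing is stored for items that were not bought.
    """
    indptr, indices = [0], []
    for items in baskets:
        indices.extend(sorted(item_index[item] for item in set(items)))
        indptr.append(len(indices))
    return indptr, indices


# The three toy orders from the question, with SKUs a-d mapped to columns 0-3.
item_index = {"a": 0, "b": 1, "c": 2, "d": 3}
baskets = [["a", "b", "c"], ["d"], ["a", "d"]]
indptr, indices = to_csr(baskets, item_index)
print(indptr, indices)
```

Under these assumed figures, the dense table grows with orders times SKUs, while the sparse form grows only with the number of line items.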

Can anyone point me towards some worked examples that take data like this
and manipulate it? I am looking for association rules and clusters.
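Here is a brute-force sketch of what association-rule mining computes. It is not the Apriori algorithm itself, which prunes candidate itemsets. The hypothetical pair_rules function counts item pairs across baskets and reports two measures for one-item rules. Support is the share of orders containing both items. Confidence is the share of orders containing the left-hand item that also contain the right-hand item. In R, the arules package (read.transactions(), apriori()) is a common starting point for this kind of analysis.

```python
from collections import Counter
from itertools import combinations

# The three toy orders from the question, keyed by OrderNo.
baskets = {
    1: {"a", "b", "c"},
    2: {"d"},
    3: {"a", "d"},
}


def pair_rules(baskets, min_support=0.0, min_confidence=0.0):
    """Return (lhs, rhs, support, confidence) for every one-item rule lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter()   # number of orders containing each item
    pair_counts = Counter()   # number of orders containing each unordered pair
    for items in baskets.values():
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each co-occurring pair yields two candidate rules: x -> y and y -> x.
        for lhs, rhs in ((x, y), (y, x)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first, then alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for lhs, rhs, support, confidence in pair_rules(baskets, min_confidence=0.6):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Only pairs that actually co-occur are counted, so memory grows with basket sizes rather than with all possible pairs of 50,000 SKUs. Very long baskets still generate many pairs.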

Thanks

Jonathan
-- 
View this message in context: 
http://www.nabble.com/Help-deciding-on-data-format-for-sales-data-%28newbie%29-tp23835331p23835331.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
