Dear All,

I am beginning data mining and need some help working out the best way to represent my data. I have searched here and online and have not found any real help. Imagine that I have a file of order (sales) data:

OrderNo  CustomerNo  ItemsInOrder
1        1           a,b,c
2        1           d
3        2           a,d

I can represent this as a data.frame, but then I need to parse ItemsInOrder, which seems quite clumsy. Alternatively, I can try this sort of representation:

OrderNo  CustomerNo  a   b   c   d
1        1           1   1   1   NA
2        1           NA  NA  NA  1
3        2           1   NA  NA  1

Are these really the only two choices, and how well does the second representation scale? (I have 50,000 SKUs.)

Can anyone point me in the direction of some worked examples that take such data and manipulate it, looking for association rules and clusters?

Thanks,
Jonathan

--
View this message in context: http://www.nabble.com/Help-deciding-on-data-format-for-sales-data-%28newbie%29-tp23835331p23835331.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
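
A third layout that may be worth considering is a "long" format with one row per order line. The sketch below is only an illustration in base R, built on the three toy orders from the post; the object names (orders, baskets, long, incidence) are made up for the example. It parses ItemsInOrder with strsplit(), builds the long table, and then derives the wide 0/1 layout from it. The long table stores only the items actually bought, so its size grows with the number of order lines rather than with the number of SKUs.

```r
# Toy data from the post; the object names here are illustrative only.
orders <- data.frame(
  OrderNo      = c(1, 2, 3),
  CustomerNo   = c(1, 1, 2),
  ItemsInOrder = c("a,b,c", "d", "a,d"),
  stringsAsFactors = FALSE
)

# Split each comma-separated basket into a character vector of items.
baskets <- strsplit(orders$ItemsInOrder, ",", fixed = TRUE)
names(baskets) <- orders$OrderNo

# Long format: one row per (order, item) pair.
# idx repeats each order's row index once per item in its basket.
idx <- rep(seq_len(nrow(orders)), lengths(baskets))
long <- data.frame(
  OrderNo    = orders$OrderNo[idx],
  CustomerNo = orders$CustomerNo[idx],
  Item       = unlist(baskets, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Dense 0/1 incidence matrix (orders x items), equivalent to the wide layout
# in the post but with 0 instead of NA. Fine for a toy example; with 50,000
# SKUs a dense matrix is mostly zeros, so a sparse structure is preferable.
incidence <- unclass(table(long$OrderNo, long$Item))

print(long)
print(incidence)
```

For association rules, the arules package provides a sparse "transactions" class and an apriori() function. As far as I know, after library(arules) a named list of baskets like the one above can be coerced with as(baskets, "transactions"), but please check the package documentation rather than taking this sketch as the definitive workflow.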