Dear all,

I'm trying to use some technique to do pattern recognition over a large dataset. I really don't have any idea how to do that using R.

Here is a sample of the data:

id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,

The id column identifies the object. The columns 1, 2, 3, ... then give some information about the object as a sequence. I'd like to recognize the patterns in these sequences. For example:

- As you can see, "208" is the most common value in column 1. It appears 12 times in the 19 rows shown here, or roughly 60%.
- Usually, after a "208", I have a "69" in column 2: this holds for 11 of the 12 rows shown here whose first column is "208".
- In column 3 we can find a fork. Sometimes I have a "180" (line 1), sometimes a "352".

I'd like to identify these patterns by plotting 2 graphs:

- A dendrogram showing the chance of each possible combination occurring.
- A scatter (dispersion) plot identifying the possible clusters.

Does anybody have any idea how to do something like this?
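
To make the counts and the two plots more concrete, here is a rough base-R sketch of the kind of thing I have in mind. It uses only the sample rows above, and I am not at all sure it is the right approach. The object names (`csv`, `steps`, `dist_mat`, `groups` and so on) are just placeholders, and the mismatch distance (share of positions where two objects differ), the average linkage and the cut into 3 groups are my own assumptions, not something the data dictates. Note also that `hclust` gives a dendrogram of how similar the objects are, which is not quite the same as a tree of pattern probabilities.

```r
# Sample rows copied from this message; blank cells are read as NA.
csv <- "id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,"

d <- read.csv(text = csv, check.names = FALSE)
steps <- as.matrix(d[, -1])  # one row per object, one column per position

# Share of each value in position 1
p1 <- prop.table(table(steps[, 1]))
print(sort(p1, decreasing = TRUE))

# Share of each value in position 2, given the value in position 1
p2_given_1 <- prop.table(
  table(first = steps[, 1], second = steps[, 2]),
  margin = 1
)
print(round(p2_given_1["208", ], 2))

# Assumed distance: share of positions where two objects differ,
# ignoring positions that are missing (NA) in either object.
n <- nrow(steps)
dist_mat <- matrix(0, n, n, dimnames = list(d$id, d$id))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dist_mat[i, j] <- mean(steps[i, ] != steps[j, ], na.rm = TRUE)
  }
}

# Dendrogram of the objects (hierarchical clustering, average linkage)
hc <- hclust(as.dist(dist_mat), method = "average")
plot(hc, main = "Objects clustered by sequence mismatch")

# Scatter plot: 2-D classical scaling of the same distances,
# coloured by an arbitrary cut of the tree into 3 groups.
groups <- cutree(hc, k = 3)
xy <- cmdscale(as.dist(dist_mat), k = 2)
plot(xy, col = groups, pch = 19, xlab = "Dim 1", ylab = "Dim 2")
text(xy, labels = d$id, pos = 3, cex = 0.6)
```

The same `table`/`prop.table` step could be repeated for each pair of neighbouring columns to follow the forks further along the sequence, but I do not know whether that scales to the full dataset.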

Many thanks in advance,

--
*Pablo de Camargo Cerdeira*
pa...@fgv.br
pablo.cerde...@gmail.com
+55 (21) 3799-6065

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help