This is a good example of what I'm looking for: [image: dendrogram.jpg]
Best

On Thu, Jul 29, 2010 at 12:01 AM, Pablo Cerdeira <pablo.cerde...@gmail.com> wrote:
>
> Dear all,
>
> I'm trying to use some technique to do pattern recognition over a large
> dataset. I really have no idea how to do that using R.
>
> Here is a sample of the data:
>
> id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> 1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
> 1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
> 1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
> 1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
> 1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
> 1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
> 1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
> 1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
> 1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
> 1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
> 1482662,336,369,75,495,34,,,,,,,,,,
> 1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
> 1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
> 1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
> 1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
> 1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
> 1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
> 1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
> 1484738,336,495,34,475,391,,,,,,,,,,
>
> The id column identifies each object. Columns 1, 2, 3, ... then hold
> information about that object, in sequence.
>
> I'd like to recognize the patterns, e.g.:
>
> - As you can see, "208" is the most common value in column 1: it appears
>   in 12 of the 19 rows shown, or about 63%.
> - Usually a "208" is followed by a "69" in column 2: that happens in 11
>   of the 12 rows that start with "208" (about 92%).
> - In column 3 we find a fork.
>   Sometimes I have a "180" (as in the first row), sometimes a "352".
>
> I'd like to identify these patterns by plotting two graphs:
>
> - A dendrogram showing the probability of each pattern occurring, for
>   every possible combination.
> - A scatter (dispersion) plot identifying the possible clusters.
>
> Does anybody have any idea how to do something like this?
>
> Many thanks in advance,
>
> --
> *Pablo de Camargo Cerdeira*
> pa...@fgv.br
> pablo.cerde...@gmail.com
> +55 (21) 3799-6065

--
*Pablo de Camargo Cerdeira*
pa...@fgv.br
pablo.cerde...@gmail.com
+55 (21) 3799-6065

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
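A minimal sketch of the column-by-column frequencies described in the question, written in Python rather than R and assuming the values are categorical codes read position by position. It builds a prefix tree: each node counts how many rows start with that prefix, and each branch's share is the conditional probability of the next value given everything before it. Note that this is a tree of conditional frequencies, not a hierarchical-clustering dendrogram; the helpers `load_sequences`, `prefix_counts` and `print_tree` are hypothetical names used only for illustration.

```python
import csv
import io
from collections import Counter

# The 19 sample rows quoted in the post: an id, then up to 15 coded values.
DATA = """\
id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,
"""


def load_sequences(text):
    """Return {id: [value, ...]}, dropping the empty trailing cells."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0]: [v for v in row[1:] if v] for row in reader if row}


def prefix_counts(seqs, depth):
    """Count how many sequences start with each prefix of length 1..depth."""
    counts = Counter()
    for seq in seqs.values():
        for k in range(1, min(depth, len(seq)) + 1):
            counts[tuple(seq[:k])] += 1
    return counts


def print_tree(counts, total, prefix=(), min_count=2):
    """Print each branch with n and P(next value | prefix), indented by depth."""
    parent = counts[prefix] if prefix else total
    children = [p for p in counts if len(p) == len(prefix) + 1 and p[:-1] == prefix]
    for child in sorted(children, key=lambda p: -counts[p]):
        if counts[child] >= min_count:  # hide branches seen only rarely
            share = counts[child] / parent
            print(f"{'  ' * len(prefix)}{child[-1]}  n={counts[child]}  p={share:.2f}")
            print_tree(counts, total, child, min_count)


seqs = load_sequences(DATA)
counts = prefix_counts(seqs, depth=3)
print_tree(counts, total=len(seqs))
```

On the sample rows, `counts[("208",)]` is 12 out of 19, and "69" follows "208" in 11 of those 12 rows (row 1482422 has "471" instead). At column 3 that branch splits mainly into "180" (6 rows) and "352" (2 rows). Raising `depth` extends the tree further along the sequence, and `min_count` controls how rare a branch can be before it is hidden.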
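For the clustering part, one possible sketch (again in Python, with hypothetical helper names) is average-linkage agglomerative clustering. It assumes a simple positional distance: the share of the 15 positions where two rows hold different codes, where a missing trailing cell counts as a mismatch against a real value and as a match against another missing cell. The `merges` list records each join and its height, which is the information a dendrogram draws; stopping at four groups is an arbitrary choice for illustration.

```python
import csv
import io
from itertools import combinations

# Same 19 sample rows as in the question.
DATA = """\
id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,
"""


def load_sequences(text):
    """Return {id: [value, ...]}, dropping the empty trailing cells."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0]: [v for v in row[1:] if v] for row in reader if row}


def mismatch_distance(a, b, length=15):
    """Share of the first `length` positions where a and b differ.
    Missing cells are padded with None, so value-vs-missing is a mismatch
    and missing-vs-missing is a match (an assumption of this sketch)."""
    pad_a = a + [None] * (length - len(a))
    pad_b = b + [None] * (length - len(b))
    return sum(x != y for x, y in zip(pad_a[:length], pad_b[:length])) / length


def average_linkage(seqs, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    distance until n_clusters remain. Returns (clusters, merges)."""
    ids = list(seqs)
    dist = {}
    for a, b in combinations(ids, 2):
        dist[a, b] = dist[b, a] = mismatch_distance(seqs[a], seqs[b])
    clusters = [[i] for i in ids]
    merges = []  # (left, right, height) for each join, as a dendrogram draws them
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
            height = sum(pairs) / len(pairs)
            if best is None or height < best[0]:
                best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i still points at the merged cluster
    return clusters, merges


seqs = load_sequences(DATA)
clusters, merges = average_linkage(seqs, n_clusters=4)
for left, right, height in merges:
    print(f"merge at {height:.2f}: {left} + {right}")
for group in clusters:
    print(group)
```

A positional distance treats shifted sequences (for example the repeated "180" early in row 1482386) as quite different; a distance based on shared consecutive pairs would be one alternative. In R itself, `hclust(as.dist(m), method = "average")` on a precomputed distance matrix `m`, followed by `plot()` and `cutree()`, performs the same merging, drawing and cutting, and `cmdscale()` can map the same distances to 2-D points for a scatter plot.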