Dear all,

I'm trying to find a technique for pattern recognition over a large
dataset. I really don't have any idea how to do that using R.

Here is a sample of the data:

id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,

The column id identifies the object. The columns 1, 2, 3, ... that follow
hold a sequence of information about that object, in order.

I'd like to recognize the patterns. For example:

- As you can see, "208" is the most common value in column 1. It appears
in 12 of the 19 rows shown, or about 63%.
- Usually, after a "208" in column 1, I have a "69" in column 2: 11 of the
12 rows that start with "208" (the exception is id 1482422, with "471").
- In column 3 we find a fork. Sometimes I have a "180" (first row),
sometimes a "352".
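
For illustration, counts like these could be reproduced in base R roughly
as follows. This is only a sketch: the object names (csv, seqs, col1,
trans12, start) are made up here, and the X1, X2, ... column names are the
ones read.csv() generates from the numeric headers.

```r
# Sample rows copied from the message. read.csv() renames the numeric
# headers to X1..X15 and turns the empty trailing fields into NA.
csv <- "id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,"
seqs <- read.csv(text = csv)

# How often each value occurs in column 1, most frequent first
col1 <- sort(table(seqs$X1), decreasing = TRUE)
print(col1)
print(round(100 * prop.table(col1), 1))   # same counts as percentages

# Distribution of column 2 given column 1 (each row sums to 1)
trans12 <- prop.table(table(seqs$X1, seqs$X2), margin = 1)
print(round(100 * trans12["208", ], 1))

# The fork in column 3, among rows that start with 208, 69
start <- seqs$X1 == 208 & seqs$X2 == 69
print(table(seqs$X3[start]))
```

The same table()/prop.table() step can be repeated for each pair of
adjacent columns to see where the sequences branch.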

I'd like to identify these patterns by plotting two graphs:

- A dendrogram showing the probability of each possible combination
occurring.
- A scatter plot identifying the possible clusters.
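
One possible starting point for both plots, using only base R, is sketched
below. The assumptions are mine: sequences are compared position by
position (a Hamming-style distance), missing trailing values are treated
as one extra "end" state coded -1, and three clusters are requested
arbitrarily. Note that hclust() draws a similarity dendrogram, which
groups whole sequences that resemble each other; it is not a tree of
occurrence probabilities. All object names are illustrative.

```r
# A subset of the sample rows from the message, to keep the example short
csv <- "id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484738,336,495,34,475,391,,,,,,,,,,"
seqs <- read.csv(text = csv)

# One row per object, one column per position in the sequence
m <- as.matrix(seqs[, -1])
rownames(m) <- seqs$id
m[is.na(m)] <- -1L   # assumption: a missing value is its own "end" state

# Hamming-style distance: share of positions where two sequences differ
n <- nrow(m)
d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(m[i, ] != m[j, ])
  }
}

# Dendrogram: average-linkage hierarchical clustering on those distances
hc <- hclust(as.dist(d), method = "average")
plot(hc, main = "Sequences clustered by position-wise mismatch")

# Scatter plot: classical multidimensional scaling to two dimensions,
# coloured by an arbitrary cut of the tree into three groups
xy <- cmdscale(as.dist(d), k = 2)
groups <- cutree(hc, k = 3)
plot(xy, col = groups, pch = 19, xlab = "Dimension 1", ylab = "Dimension 2")
text(xy, labels = rownames(m), pos = 3, cex = 0.7)
```

A position-wise distance penalises sequences that contain the same run
shifted by one or two columns, so an edit-distance measure may suit this
kind of data better.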

Does anybody have any idea how to do something like this?

Many thanks in advance,


-- 
*Pablo de Camargo Cerdeira*
pa...@fgv.br
pablo.cerde...@gmail.com
+55 (21) 3799-6065

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.