[R] calculating the occurrences of distinct observations in the subsets of a dataframe

Bodnar Laszlo EB_HU Thu, 17 Mar 2011 03:14:36 -0700

Hello everybody,

I have a data frame in R similar to the one below. My real 'df' data frame is 
much bigger than this one, but I have simplified things as much as possible so 
as not to confuse anybody.


So here's the data frame.

id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <-data.frame(id,a,b,c,d,e)
df

Basically what I would like to do is to get the occurrences of numbers for each 
column (a,b,c,d,e) and for each id group (1,2,3) (for this latter grouping see 
my column 'id').

So, for column 'a' and for id number '1' (for the latter see column 'id') the 
code would be something like this:
as.numeric(table(df[1:10,2]))

The results are:
[1] 3 7

Just to briefly explain my results: in column 'a' (and regarding only those 
records which have number '1' in column 'id') we can say that:
number 1 occurred 3 times, and
number 3 occurred 7 times.
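
The same count can be obtained without hard-coding the row range 1:10, which matters once the number of rows per id changes. The following is only a minimal sketch: it rebuilds just the 'id' and 'a' columns of the sample data, and the name counts_a_id1 is made up for illustration.

```r
# Only the two columns needed for this example, copied from the sample data.
df <- data.frame(
  id = rep(1:3, each = 10),
  a  = c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
)

# Select rows by the value of 'id' instead of by row position.
counts_a_id1 <- table(df$a[df$id == 1])
as.numeric(counts_a_id1)
# [1] 3 7
```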

Again, just to show you another example. For column 'a' and for id number '2' 
(for the latter grouping see again column 'id'):
as.numeric(table(df[11:20,2]))

After running the code, the results are:
[1] 4 3 3

Let me explain a little again: in column 'a' (and regarding only those 
observations which have number '2' in column 'id') we can say that
number 1 occurred 4 times,
number 2 occurred 3 times, and
number 3 occurred 3 times.

Last example: for column 'e' and for id number '3' the code would be:
as.numeric(table(df[21:30,6]))

With the results:
[1] 1 4 5

...meaning that number '1' occurred once, number '2' occurred four times and 
number '3' occurred five times.

So this is what I would like to do: calculate the occurrences of numbers for 
each custom-defined subset (and then collect these values into a data frame). 
I know it is NOT a difficult task, but the PROBLEM is that I will have to 
change the input 'df' data frame on a regular basis, and hence both the 
overall number of rows and columns might CHANGE over time...
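
One possible way to collect all of these counts into a single data frame, regardless of how many rows or columns 'df' has, is to cross-tabulate 'id' against each value column in turn. This is a sketch under assumptions, not a definitive solution: it assumes the grouping column is always named 'id' and that every other column should be counted, and the names value_cols and counts are made up for illustration. Unlike the per-subset table() calls above, a cross-tabulation also reports values that never occur in a group, with a count of 0.

```r
# Sample data from the post, built in one call.
df <- data.frame(
  id = rep(1:3, each = 10),
  a  = c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3),
  b  = c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2),
  c  = c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2),
  d  = c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2),
  e  = c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
)

# Every column except 'id' is treated as a value column (an assumption).
value_cols <- setdiff(names(df), "id")

# For each value column, cross-tabulate id by value and flatten the table
# into rows of (column, id, value, Freq).
counts <- do.call(rbind, lapply(value_cols, function(col) {
  tab <- as.data.frame(table(id = df$id, value = df[[col]]),
                       stringsAsFactors = FALSE)
  data.frame(column = col, tab, stringsAsFactors = FALSE)
}))

# Counts for column 'e' within id 3.
counts[counts$column == "e" & counts$id == "3", ]
```

Because the column names and group ids are read from 'df' itself, nothing in the sketch needs to change when rows or columns are added.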

What I have done so far is to separate the 'df' data frame by columns, 
like this:
for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z])

So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c etc. But 
I'm really stuck now and I don't know how to move forward, you know, getting 
the occurrences for each column and each group of ids.
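
As a possible alternative to creating separate df.2, df.3, ... objects with assign(), the data frame could be split by 'id' and table() applied to every remaining column of each piece. This is only a sketch under the assumption that the grouping column is named 'id'; the names value_cols and per_group are made up for illustration.

```r
# Sample data from the post, built in one call.
df <- data.frame(
  id = rep(1:3, each = 10),
  a  = c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3),
  b  = c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2),
  c  = c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2),
  d  = c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2),
  e  = c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
)

value_cols <- setdiff(names(df), "id")

# split() returns one data frame per id; the inner lapply() runs table()
# on each column of that piece, giving a nested list indexed by id and column.
per_group <- lapply(split(df[value_cols], df$id),
                    function(g) lapply(g, table))

# Column 'a' within id 2, in the same form as the examples above.
as.numeric(per_group[["2"]][["a"]])
# [1] 4 3 3
```

The nested list keeps everything in one object, so no variables need to be created or looked up by pasted names.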

Do you have any ideas?
Best regards,
Laszlo
