On 02/07/10 16:21, Chris Beeley wrote:
Hello-

I have a dataset which basically looks like this:

Location  Sex  Date      Time  Verbal  Self harm  Violence_objects  Violence
A         1    1-4-2007  1800  3       0          1                 3
A         1    1-4-2007  1230  2       1          2                 4
D         2    2-4-2007  1100  0       4          0                 0
...

I've put a dput of the first section of the data at the end of this
email. [...]

What I want to do is:

A) sum each of the dependent variables for each of the dates (so e.g.
in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and
3+4=7 for each of the variables)

If 'data' is the data at the end of your email, then

 aggregate(cbind(verbal, self.harm, violence_objects, violence) ~ Date,
           data = data, FUN = sum)
      Date verbal self.harm violence_objects violence
1 01/04/07     25        15                3        9
2 02/04/07     24         6                8       13
3 03/04/07     17        13                0       10


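is one approach. Read help("aggregate") and don't forget the na.action= argument:
the formula method defaults to na.omit, which drops every row that has an NA in
any of the variables before the sums are taken.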
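
To see what na.action= changes, here is a minimal sketch on a made-up data frame. The name 'incidents' and the NA are hypothetical and are not taken from your dput. With the default, the whole row holding the NA is discarded; with na.action = na.pass plus na.rm = TRUE (which is passed on to sum), only the missing value itself is skipped.

```r
# Hypothetical toy data: three rows shaped like the example table,
# with one self.harm value deliberately set to NA.
incidents <- data.frame(
  Date      = c("01/04/07", "01/04/07", "02/04/07"),
  verbal    = c(3, 2, 0),
  self.harm = c(0, NA, 4)
)

# Default (na.action = na.omit): the row containing the NA is dropped
# entirely, so its verbal score of 2 is lost from the 01/04/07 total.
dropped <- aggregate(cbind(verbal, self.harm) ~ Date,
                     data = incidents, FUN = sum)

# Keep every row and let sum() skip the missing values instead.
kept <- aggregate(cbind(verbal, self.harm) ~ Date,
                  data = incidents, FUN = sum,
                  na.rm = TRUE, na.action = na.pass)

print(dropped)
print(kept)
```

Which of the two you want depends on whether an incident with one missing score should still count towards the other totals.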


B) do this sum, but only in each location this time (location is the
first variable)- so the sum for 1-4-2007 in location A, sum for
1-4-2007 in location B, and so on and so on. Because this is divided

The basic approach could be

 aggregate(cbind(verbal, self.harm, violence_objects, violence) ~ Date + Location,
           data = data, FUN = sum)
       Date Location verbal self.harm violence_objects violence
1  01/04/07        A      7         1                0        3
2  02/04/07        A      8         2                0        1
3  03/04/07        A      0         0                0        2
4  01/04/07        B      3         2                0        1
5  02/04/07        B      4         2                0        0
6  03/04/07        B      4         0                0        3
7  01/04/07        C      4         2                3        2
8  02/04/07        C      0         0                4        2
9  03/04/07        C      1         1                0        5
10 01/04/07        D      7         6                0        3
11 02/04/07        D      0         0                0        9
12 03/04/07        D      4        11                0        0
13 01/04/07        E      4         3                0        0
14 02/04/07        E      4         0                4        0
15 03/04/07        E      8         1                0        0
16 01/04/07        F      0         1                0        0
17 02/04/07        F      8         2                0        1



across locations, some dates will have no data going into them and
will return 0 sums. Crucially I still want these dates to appear- so
e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2
0 0, then 23-5-2008 0 0 0 0 again, and etc.

Why?

But variations on

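 # For example: drop the rows for one Date/Location combination to create a gap
 data2 <- data[!(as.numeric(data$Date) == 3 & data$Location == "B"), ]
 z <- with(data2, tapply(verbal, list(Date, Location), FUN = sum))
 z[is.na(z)] <- 0  # combinations with no rows come back as NA; report them as 0
 print(z)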
           A B C D E F
         0 0 0 0 0 0 0
01/04/07 0 7 3 4 7 4 0
02/04/07 0 8 0 0 0 4 8
03/04/07 0 0 4 1 4 8 0



will perhaps work for you.
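
If you also need days on which nothing was recorded anywhere to show up, one possible variation is to turn the dates into a factor whose levels cover every calendar day before calling tapply. The sketch below assumes the dates have been converted to class Date; the data frame 'd', the 'Day' column and the location levels are made up for illustration.

```r
# Hypothetical toy data: nothing at all was recorded on 2007-04-02.
d <- data.frame(
  Date     = as.Date(c("2007-04-01", "2007-04-01", "2007-04-03")),
  Location = c("A", "A", "D"),
  verbal   = c(3, 2, 4)
)

# One factor level per calendar day, whether or not it has any rows.
all_days <- seq(min(d$Date), max(d$Date), by = "day")
d$Day <- factor(format(d$Date), levels = format(all_days))

# Likewise list every location of interest, including empty ones.
d$Location <- factor(d$Location, levels = c("A", "B", "D"))

# tapply keeps unused factor levels and returns NA for empty cells.
z <- tapply(d$verbal, list(Date = d$Day, Location = d$Location), sum)
z[is.na(z)] <- 0

# Long format: one row per Date/Location pair, zeros included.
long <- as.data.frame(as.table(z), responseName = "verbal")
print(long)
```

The same factor trick works for each of the other three variables in turn.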

Hope this helps

Allan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
