Thanks for all your help, that has worked a treat. To answer your questions, I want to include the zero rows because I am going to analyse using mixed models (with dummies for day of week, location etc.) and I thought it was necessary to include a complete list of time variables, but now I'm wondering if it is necessary. As for the empty rows, the database is generated automatically by the incidents reporting system and is a bit of a mess, so I want to make sure that the code doesn't stumble over such things.
Thanks again all! On 2 Jul 2010, at 17:14, David Winsemius <dwinsem...@comcast.net> wrote: > > On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote: > >> Hi >> >> did you try aggregate? >> >>> aggregate(data[, 5:8],list(data$Date), sum, na.rm=T) >> Group.1 verbal self.harm violence_objects violence >> 1 0 0 0 0 >> 2 01/04/07 25 15 3 9 >> 3 02/04/07 24 6 8 13 >> 4 03/04/07 17 13 0 10 >>> aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T) > > That address his A) request: > > Here is the application of aggregate to his B) request (I think): > > # Not e that Date is not of class Date but is rather a factor that includes > "" as a level. > > > aggregate(series[, 5:8],list(series$Date, series$Location), sum, na.rm=T) > Group.1 Group.2 verbal self.harm violence_objects violence > 1 0 0 0 0 > 2 A 0 0 0 0 > 3 01/04/07 A 7 1 0 3 > 4 02/04/07 A 8 2 0 1 > 5 03/04/07 A 0 0 0 2 > 6 B 0 0 0 0 > 7 01/04/07 B 3 2 0 1 > 8 02/04/07 B 4 2 0 0 > 9 03/04/07 B 4 0 0 3 > 10 C 0 0 0 0 > 11 01/04/07 C 4 2 3 2 > 12 02/04/07 C 0 0 4 2 > 13 03/04/07 C 1 1 0 5 > 14 D 0 0 0 0 > 15 01/04/07 D 7 6 0 3 > 16 02/04/07 D 0 0 0 9 > 17 03/04/07 D 4 11 0 0 > 18 E 0 0 0 0 > 19 01/04/07 E 4 3 0 0 > 20 02/04/07 E 4 0 4 0 > 21 03/04/07 E 8 1 0 0 > 22 F 0 0 0 0 > 23 01/04/07 F 0 1 0 0 > 24 02/04/07 F 8 2 0 1 > > So perhaps an output with less extraneous input would be better: > > > with(series[series$Date != "", ], > aggregate(list(verbal=verbal, self.harm=self.harm, > viol_obj=violence_objects, violence=violence), > list(Date, Location), > sum, na.rm=T) > ) > > Group.1 Group.2 verbal self.harm viol_obj violence > 1 01/04/07 A 7 1 0 3 > 2 02/04/07 A 8 2 0 1 > 3 03/04/07 A 0 0 0 2 > 4 01/04/07 B 3 2 0 1 > 5 02/04/07 B 4 2 0 0 > 6 03/04/07 B 4 0 0 3 > 7 01/04/07 C 4 2 3 2 > 8 02/04/07 C 0 0 4 2 > 9 03/04/07 C 1 1 0 5 > 10 01/04/07 D 7 6 0 3 > 11 02/04/07 D 0 0 0 9 > 12 03/04/07 D 4 11 0 0 > 13 01/04/07 E 4 3 0 0 > 14 02/04/07 E 4 0 4 0 > 15 03/04/07 E 8 1 0 0 > 16 01/04/07 F 0 1 0 0 > 17 02/04/07 F 8 2 0 1 >> >> BTW, why do you have empty rows? >> >> Regards >> Petr >> >> >> >> >>> Hello- >>> >>> I have a dataset which basically looks like this: >>> >>> Location Sex Date Time Verbal Self harm >>> Violence_objects Violence >>> A 1 1-4-2007 1800 3 0 >>> 1 3 >>> A 1 1-4-2007 1230 2 1 >>> 2 4 >>> D 2 2-4-2007 1100 0 4 >>> 0 0 >>> ... >>> >>> I've put a dput of the first section of the data at the end of this >>> email. Basically I have these data for several days across all of the >>> dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on >>> until 31-12-2009. The last four variables which you can see at the end >>> of the email are my dependent variables, they are different types of >>> violent and self harming behaviour shown by patients in a psychiatric >>> hospital. >>> >>> What I want to do is: >>> >>> A) sum each of the dependent variables for each of the dates (so e.g. >>> in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and >>> 3+4=7 for each of the variables) >>> >>> B) do this sum, but only in each location this time (location is the >>> first variable)- so the sum for 1-4-2007 in location A, sum for >>> 1-4-2007 in location B, and so on and so on. Because this is divided >>> across locations, some dates will have no data going into them and >>> will return 0 sums. Crucially I still want these dates to appear- so >>> e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2 >>> 0 0, then 23-5-2008 0 0 0 0 again, and etc. >>> >>> I've had several abortive attempts and done some Googling but have got >>> nowhere. I'd greatly appreciate any advice. >>> >>> Many thanks, >>> Chris Beeley >>> (Institute of Mental Health, UK) >>> >>> >>> structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L, >>> 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L, >>> 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L, >>> 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L, >>> 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L, >>> 4L, 4L, 4L), .Label = c("", "A", "B", "C", "D", "E", "F"), class = >> "factor"), >>> Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L, >>> 2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L, >>> 1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L, >>> 2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, >>> 1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L >>> ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, >>> 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, >>> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, >>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L, >>> 4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L, >>> 4L, 4L, 4L, 4L), .Label = c("", "01/04/07", "02/04/07", "03/04/07" >>> ), class = "factor"), Time = structure(c(1L, 28L, 1L, 1L, >>> 1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L, >>> 34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L, >>> 10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L, >>> 36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L, >>> 29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L, >>> 7L, 12L, 1L, 15L), .Label = c("", " ", "02:24:00", "03:44:00", >>> "04:30:00", "07:00:00", "08:35:00", "09:20:00", "09:30:00", >>> "10:00:00", "10:15:00", "10:45:00", "11:00:00", "11:20:00", >>> "11:30:00", "11:35:00", "11:50:00", "12:00:00", "12:25:00", >>> "12:30:00", "12:45:00", "15:00:00", "15:15:00", "15:30:00", >>> "15:35:00", "17:15:00", "17:50:00", "18:00:00", "19:00:00", >>> "19:30:00", "19:50:00", "20:00:00", "20:30:00", "20:55:00", >>> "22:15:00", "22:30:00", "22:35:00", "22:40:00", "23:10:00" >>> ), class = "factor"), verbal = c(NA, 3L, NA, NA, NA, 3L, >>> 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L, >>> NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, >>> NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L, >>> 0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L, >>> 0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA, >>> NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L, >>> 2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA, >>> 0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, >>> NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA, >>> 1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA, >>> 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L, >>> NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, >>> 0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, >>> 4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA, >>> 0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA, >>> 0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L, >>> NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, >>> 3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, >>> 0L, 3L, 0L, NA, 0L, 0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA, >>> 0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names = >> c("Location", >>> "Sex", "Date", "Time", "verbal", "self.harm", "violence_objects", >>> "violence"), class = "data.frame", row.names = c(NA, -73L)) >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.