Sorry, a typo in my reply below. See at "#######".

On 12-Jan-2015 11:12:43 Ted Harding wrote:
> On 12-Jan-2015 10:32:41 Erik B Svensson wrote:
>> Hello
>> I've got a problem I don't know how to solve. I have got a dataset that
>> contains age intervals (age groups) of people and the number of persons in
>> each age group each year (y1994-y1996). The number of persons varies each
>> year. I only have access to the age intervals, not the age of each person,
>> which would make things easier.
>>
>> I want to know the median age interval (not the median number) for each
>> year. Let's say that in y1994 23 corresponds to the median age interval
>> "45-54", I want "45-54" as a result. How is that done?
>>
>> This is the sample dataset:
>>
>> agegrp <-
>> c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74",
>> "75-84","84-")
>> y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
>> y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
>> y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)
>>
>> I look forward to your response
>>
>> Best regards,
>> Erik Svensson
>
> In principle, this is straightforward. But in practice you may
> need to be careful about how to deal with borderline cases -- and
> about what you mean by "median age interval".
> The underlying idea is based on:
>
> cumsum(y1994)/sum(y1994)
> # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336
> # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000
>
> Thus age intervals 1-7 ("<1" - "45-54") contain less than 50%
> (0.49344978...), though "45-54" almost gets there. However,
> age groups 1-8 ("<1" - 55-64" contain more than 50%. Hence
> the median age is within "49-64".
#######
Should be:
age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
the median age is within "55-64".
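
Since the question asked for the median interval in each year, here is a minimal sketch that applies the same cumulative-proportion idea to all three years at once. It is an illustration only, not part of the original reply: median_group() is a hypothetical helper name, and it picks the first group whose cumulative proportion reaches 0.5, a slight variation on the quoted max(which(...))+1 form. It assumes the counts are non-negative and not all zero.

```r
# Data as posted in the original question.
agegrp <- c("<1", "1-4", "5-14", "15-24", "25-34", "35-44", "45-54",
            "55-64", "65-74", "75-84", "84-")
counts <- list(
  y1994 = c(0, 5, 7, 9, 25, 44, 23, 32, 40, 36, 8),
  y1995 = c(2, 4, 1, 7, 20, 39, 32, 18, 21, 23, 5),
  y1996 = c(1, 3, 1, 4, 22, 37, 41, 24, 24, 26, 8)
)

# median_group() is a hypothetical helper (not a base R function).
# It returns the label of the first group whose cumulative
# proportion of the total count is at least 0.5.
median_group <- function(n, labels) {
  cum_prop <- cumsum(n) / sum(n)
  labels[which(cum_prop >= 0.5)[1]]
}

medians <- sapply(counts, median_group, labels = agegrp)
print(medians)
#   y1994   y1995   y1996
# "55-64" "45-54" "45-54"
```

For y1994 this agrees with the corrected result above. One reason to prefer the ">= 0.5" form is that it still returns group 1 when the first group alone holds at least half of the total, a case in which which(... < 0.5) would be empty.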
> Implementing the above as a procedure:
>
> agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)]
> # [1] "55-64"
>
> Note that the "obvious solution":
>
> agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))]
> # [1] "45-54"
>
> gives an incorrect answer, since with these data it returns a group
> whose maximum age is below the median. This is because the "<=" is
> satisfied by "<" also.
>
> Hoping this helps!
> Ted.
>
> -------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
> Date: 12-Jan-2015 Time: 11:12:39
> This message was sent by XFMail
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 12-Jan-2015 Time: 11:21:11
This message was sent by XFMail