Hello again John,

I was going to suggest that you just use qbinom to generate the expected range of extinctions. For example, for the family with 80 spp the central 95% expectation is:

    qbinom(c(0.025, 0.975), 80, 0.0748)

which gives 2 - 11 spp. If you wanted to look across a large number of families you'd need to deal with multiple comparison error, but as a quick first look it might be helpful.

However, I've just got a copy of the paper and it seems that the authors are calculating something different to a simple binomial expectation: they are differentiating between high-risk (red listed) and low-risk species within a family. They state that this equation (expressed here in R-ese)...

    choose(N, R) * p^R * b^(N - R)

...gives the probability of an entire family becoming extinct, where N is the number of spp in the family, R is the number of those that are red listed, p is the extinction probability for red-list spp (presumably over some period, but I haven't read the paper properly yet) and b is the extinction probability for the other spp. Then, in their simulations they hold b constant but play around with a range of values for p.

So this sounds a bit different to what you originally posted as your objective (?)

Michael

On 15 October 2010 22:49, Michael Bedward <michael.bedw...@gmail.com> wrote:
> Hi John,
>
> The word "species" attracted my attention :)
>
> Like Dennis, I'm not sure I understand your idea properly. In
> particular, I don't see what you need the simulation for.
>
> If family F has Fn species, your random expectation is that p * Fn of
> them will be at risk (p = 0.0748). The variance on that expectation
> will be p * (1-p) * Fn.
>
> If you do your simulation that's the result you'll get. Perhaps to
> initially identify families with disproportionate observed extinction
> rates all you need is the dbinom function?
>
> Michael
>
> On 15 October 2010 22:29, John Haart <anothe...@me.com> wrote:
>> Hi Dennis and list,
>>
>> Thanks for this, and sorry for not providing enough information.
>>
>> First let me put the study into a bit more context:
>>
>> I know the number of species at risk in each family. What I am asking is
>> "Is risk random according to family, or do certain families have a
>> disproportionate number of at-risk species?"
>>
>> My idea was to randomly allocate risk to the families based on the criterion
>> binomial(nspecies, 0.0748) and then compare this to the "true data"
>> and see if there was a significant difference.
>>
>> So in answer to your questions (assuming my method is correct!):
>>
>>> Is this over all families, or within a particular family? If the former, why
>>> does a distinction of family matter?
>>
>> Within a particular family. This is because I am looking to see if risk in
>> the "observed" data set is random with respect to family, so this will
>> provide the baseline to compare against.
>>
>>> I guess you've stated the p, but what's the n? The number of species in each
>>> family?
>>
>> This varies widely: for instance, I have some families that are monotypic
>> (with 1 species) and other families with 100+ species.
>>
>>> Assuming you have multiple families, do you want separate simulations per
>>> family, or do you want to do some sort of weighting (perhaps proportional to
>>> size) over all families?
>>
>> I am assuming I want some sort of weighting. This is because I want to
>> calculate the number of species expected to be at risk in EACH family under
>> the random binomial distribution (assuming every species has a 7.48% chance
>> of being at risk).
>>
>> Thanks
>>
>> John
>>
>> On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
>>
>> Hi:
>>
>> I don't believe you've provided quite enough information just yet...
>>
>> On Fri, Oct 15, 2010 at 2:22 AM, John Haart <anothe...@me.com> wrote:
>>
>>> Dear List,
>>>
>>> I am doing some simulation in R and need basic help!
>>>
>>> I have a list of animal families for which I know the number of species in
>>> each family.
>>>
>>> I am working under the assumption that a species has a 7.48% chance of
>>> being at risk.
>>>
>>
>> Is this over all families, or within a particular family? If the former, why
>> does a distinction of family matter?
>>
>>>
>>> I want to simulate the number of species expected to be at risk under a
>>> random binomial distribution with 10,000 randomizations.
>>>
>>
>> I guess you've stated the p, but what's the n? The number of species in each
>> family? If you're simulating on a family-by-family basis, then it would seem
>> that a binomial(nspecies, 0.0748) distribution would be the reference.
>> Assuming you have multiple families, do you want separate simulations per
>> family, or do you want to do some sort of weighting (perhaps proportional to
>> size) over all families? The latter is doable, but it would require a
>> two-stage simulation: one stage to randomly select a family and another to
>> randomly select a species.
>>
>> Dennis
>>
>>>
>>> I am relatively new to this field and would greatly appreciate an
>>> "idiot-proof" response, i.e. how should the data be entered into R? I was
>>> thinking of using read.table, header = T, where the table has F = Family
>>> Name, and SP = Number of species in that family?
>>>
>>> John
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
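John's original request (enter the family table with read.table, then run 10,000 binomial randomizations per family and compare them with the observed counts) can be sketched in R as below. This is a minimal sketch under stated assumptions, not the method of Vamosi & Wilson: the file name families.txt and the observed-count column RISK are hypothetical, the three toy rows are made-up stand-ins for real data, and the one-sided empirical p-value with a Holm adjustment is just one reasonable way to handle the multiple-comparison issue mentioned above.

```r
# Minimal sketch, assuming a whitespace-delimited file with a header row and
# columns F (family name), SP (number of species) and RISK (observed number of
# at-risk species). The file name and the RISK column are hypothetical.
# fam <- read.table("families.txt", header = TRUE, stringsAsFactors = FALSE)

# Toy stand-in so the sketch runs without a file (made-up numbers).
fam <- data.frame(F    = c("FamA", "FamB", "FamC"),
                  SP   = c(1, 80, 150),
                  RISK = c(0, 14, 9),
                  stringsAsFactors = FALSE)

p    <- 0.0748   # assumed chance that any one species is at risk
nsim <- 10000    # number of randomizations
set.seed(1)      # reproducible random draws

# One row per family, one column per randomization: each entry is the number
# of species at risk when every species is independently at risk with prob. p.
sims <- t(sapply(fam$SP, function(n) rbinom(nsim, size = n, prob = p)))

# Summaries of the simulated null distribution for each family
fam$sim_mean <- rowMeans(sims)
fam$sim_lo   <- apply(sims, 1, quantile, probs = 0.025)
fam$sim_hi   <- apply(sims, 1, quantile, probs = 0.975)

# One-sided empirical p-value: share of randomizations with at least as many
# at-risk species as observed (the +1 terms keep it strictly above zero).
fam$p_emp <- (rowSums(sims >= fam$RISK) + 1) / (nsim + 1)

# Adjust for testing many families at once
fam$p_adj <- p.adjust(fam$p_emp, method = "holm")

print(fam)
```

Swap the toy data frame for the commented read.table() line once the real file exists. header = TRUE is safer than header = T, because T is an ordinary variable that can be reassigned.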
>>>
>>
>
> On 15 October 2010 23:34, Michael Bedward <michael.bedw...@gmail.com> wrote:
> Hi John,
>
> I haven't read that particular paper, but in answer to your question...
>
>> So if I do this for all the families it will be the same as doing the
>> simulation experiment outlined in the method above?
>
> Yes :)
>
> Michael
>
> On 15 October 2010 23:18, John Haart <anothe...@me.com> wrote:
>> Hi Michael,
>>
>> Thanks for this. The reason I am following this approach is that it
>> appeared in a paper I was reading, and I thought it was an interesting angle
>> to take.
>>
>> The paper is
>>
>> Vamosi & Wilson, 2008. Nonrandom extinction leads to elevated loss of
>> angiosperm evolutionary history. Ecology Letters (2008) 11: 1047–1053.
>>
>> and the specific method I am following states:
>>
>>> We calculated the number of species expected to be at risk in each family
>>> under a random binomial distribution in 10 000 randomizations [generated
>>> using R version 2.6.0 (R Development Team 2007)] assuming every species has
>>> a 7.48% chance of being at risk.
>>
>> I guess the reason I am doing the simulation is that I am not hugely
>> statistically minded, and the paper was asking the same question I am
>> interested in answering :).
>>
>> So, following your approach:
>>
>>> If family F has Fn species, your random expectation is that p * Fn of
>>> them will be at risk (p = 0.0748). The variance on that expectation
>>> will be p * (1-p) * Fn.
>>
>> Family F = Bromeliaceae, with Fn = 80, p = 0.0748
>> random expectation = p * Fn = 0.0748 * 80 = 5.984
>> variance = p * (1-p) * Fn = (0.0748 * 0.9252) * 80 = 5.5363968
>>
>> So the random expectation is that the Bromeliaceae will have 6 species at
>> risk, if risk is assigned randomly?
>>
>> So if I do this for all the families, it will be the same as doing the
>> simulation experiment outlined in the method above?
>>
>> Thanks
>>
>> John
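John's Bromeliaceae arithmetic, the qbinom() interval and the claim that a 10,000-draw simulation gives the same answer can all be checked in a few lines of R. A minimal sketch; k_obs is a hypothetical observed count used only to show the dbinom-based tail probability Michael mentions, not data from this thread.

```r
p  <- 0.0748   # chance that any one species is at risk
Fn <- 80       # species in Bromeliaceae, as in the example above

expected <- p * Fn             # 0.0748 * 80 = 5.984
variance <- p * (1 - p) * Fn   # 0.0748 * 0.9252 * 80 = 5.5363968

# Central 95% range of the binomial null distribution
qbinom(c(0.025, 0.975), size = Fn, prob = p)   # 2 11

# Probability of at least k_obs at-risk species under the null;
# k_obs is a hypothetical observed count, for illustration only.
k_obs <- 12
sum(dbinom(k_obs:Fn, size = Fn, prob = p))                   # via dbinom
pbinom(k_obs - 1, size = Fn, prob = p, lower.tail = FALSE)   # same value

# 10,000 randomizations land close to the analytic mean and variance
set.seed(42)
sim <- rbinom(10000, size = Fn, prob = p)
c(mean = mean(sim), var = var(sim))
```

The analytic quantities make the simulation unnecessary for a single family; the simulated mean and variance only confirm them up to Monte Carlo noise.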
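The two-rate expression Michael transcribes at the top of this reply, choose(N, R) * p^R * b^(N - R), can be evaluated over a range of p with b held fixed, mirroring the simulations he describes. A minimal sketch: the function name family_prob and the values of N, R, b and the p grid are hypothetical, and the expression is used exactly as transcribed in this thread rather than checked against the paper.

```r
# Expression as transcribed in this thread (not checked against the paper):
#   N = species in the family, R = red-listed species,
#   p = extinction probability for red-listed species,
#   b = extinction probability for the other species.
family_prob <- function(N, R, p, b) {
  choose(N, R) * p^R * b^(N - R)
}

# Hypothetical family and rates: hold b fixed, vary p
N <- 10
R <- 3
b <- 0.05
p_grid <- seq(0.1, 0.9, by = 0.2)

# family_prob() is vectorized, so the whole grid is evaluated in one call
data.frame(p = p_grid, prob = family_prob(N, R, p_grid, b))
```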