Thanks a lot for your answer Achim, this helped a lot. I have done a lot of 
reading, following your recommendations and I think I have a better idea of 
what I should use. My dataset contains binary data on survival of the calf 
depending on the mass of the mother. We know that the probability of survival 
of the calf should vary according to the mass of the mother: 3 groups of mass 
expected, lower survival of the calves for small and large females, best 
survival of calf for intermediate-sized females. I want to identify at which 
masses those changes in survival occur.

I think the code I need to use in order to test what I want is something of 
that type:
gmass<-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass)

The first question I have is what is the difference between testing the gmass 
model shown up compared to the gmass2 model below?
gmass2<-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass)

The second question, which is related I think to the first one is whether it 
makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a 
single parameter in the model (Mass), and this parameter is also the parameter 
we use as the order.by= parameter 
plot(gmass, functional=meanL2BB, aggregate=FALSE)
I think the whole point around questions 1 and 2  is that I don't understand 
the interpretation of the intercept in the gmass model???

Third question: how to choose the proper functional? I have seen that you 
discuss that in your CSDA(2006 & 2003) papers and, in the 2006 paper you say: 
"in situations where there is a shift in the parameters and then a second shift 
back, it is advantageous to aggregate over time using the range and ....". 
Which means, if I understand well that rangeBB would be adapted to the kind of 
test I want to perform. However, since I want to determine the timing of the 
peaks, I need my functional to produce a "time series plot", for example like 
meanL2BB does. Do you think I can use meanL2BB functional in my case or should 
I compute an "home-made functional" which would use the range of efp but with 
comp applied first and time after (is this possible???).

Fourth question: is it OK to only make a visual estimation of the "breakpoints" 
 from the peaks seen on the graph after plotting the efp or should I use the 
breakpoints() function to properly date the breakpoints??? I'm not sure this 
breakpoints() function can be applied to binary data?

Fifth question: I have noticed that the p-values I obtain after performing the 
sctest(gmass, functional=meanL2BB) for example are a bit different depending on 
if I introduce family=binomial as an argument in my gefp() call. Should I use 
this argument or is it used by default when you specify fit=glm??? 

Last question, you said in your previous message that I could look at the  
maxstat_test() from package "coin" for an interesting nonparametric alternative 
but I think this package does not allow estimation of more than one 
breakpoint???

Thanks heaps if you can help again with those issues,
Best,
Geraldine





 

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Achim Zeileis
Sent: 30. mai 2012 08:23
To: R-help@r-project.org
Subject: Re: [R] strucchange Fstats() example

On Tue, 29 May 2012, Mabille, Geraldine wrote:

<snip>

> In the second example, the authors state the presence of "at least" 
> two breakpoints. When plotting the F-statistics using the following 
> code, we see indeed two peaks in the F-statistics, that coincides with 
> the dates given by the authors: c.a 1973 and 1983 but when trying to 
> add those breakpoints to the time series, only one is taken into 
> account

The breakpoints() method for "Fstats" objects can just extract a single 
breakpoint. The reason is that maximizing the F statistics is equivalent to 
minimizing the residual sum of squares of a model with a single breakpoint. If 
you want to estimate more than a single breakpoint, you need to minimize the 
corresponding segmented sums of squares. This can be done with the formula 
method of breakpoints(), see ?breakpoints.

More specificially: In your example with breakpoints(fs, breaks = 2), the 
breaks argument is simply ignored. The method just does not have a breaks 
argument and it goes through ...

> We see that even though the F-statistics seem to show the existence of 
> 2 breakpoints, only one is detected by the breakpoints() function. 
> Does anyone know how this is possible? I'm totally new to strucchange 
> so it might well be something obvious I'm missing here!

Please have a closer look at the package's documentation and the corresponding 
papers. See citation("strucchange") for the most important references and the 
corresponding manual pages for more details. For the breakpoints issue you 
should probably start reading the CSDA paper.

> OTHER SIDE QUESTION: can strucchange be used if the y variable is binary???

Testing for breakpoints can be done with the function gefp(). See its manual 
pages for references and details. The manual page just has a Poisson GLM 
example but the corresponding papers (in Stat Neerl and CSDA) also have binary 
response examples.

If you have a binary response and just want to test whether the proportion of 
successes changes across "time" (or some other variable of interest), then
maxstat_test() from package "coin" might be an interesting nonparametric 
alternative.

hth,
Z

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to