On Fri, 1 Jun 2012, Mabille, Geraldine wrote:

Hi, and thanks again for your answer. I have just one last question, regarding the
choice of the functional, if you have time to help with that again.

I have tried running sctest() using the functionals you recommended:
sctest(gmass2, functional = meanL2BB)
sctest(gmass2, functional = rangeBB)
sctest(gmass2, functional = supLM(0.1))

and I have a question regarding the meanL2BB test. The p-value obtained for this test is p = 0.18, and the resulting plot is attached to this message. What I don't understand is why the overall meanL2BB test is not significant when the graph shows at least two peaks above the red line?

This is discussed at the end of Section 5.1 (middle of page 3001) in the 2006 CSDA paper. For the meanL2BB functional the test statistic is the _mean_ of the process, not the _maximum_. Hence, the mean (visualized by the dashed horizontal line) needs to exceed the critical value (red horizontal line) to obtain a significant result.
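
To see the two aggregation rules side by side, here is a minimal sketch (reusing the gmass2 object above): maxBB rejects when the peaks cross the boundary, whereas meanL2BB rejects only when the dashed mean line exceeds the red critical value.

library("strucchange")
plot(gmass2, functional = maxBB)      ## max-type: are the peaks above the boundary?
plot(gmass2, functional = meanL2BB)   ## mean-type: is the dashed mean line above the critical value?
sctest(gmass2, functional = maxBB)
sctest(gmass2, functional = meanL2BB)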

Best,
Z

The other two tests (rangeBB and supLM) are both significant at p = 0.03.

Thanks again for your precious help,
Geraldine

-----Original Message-----
From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at]
Sent: 1. juni 2012 08:56
To: Mabille, Geraldine
Cc: R-help@r-project.org
Subject: RE: [R] strucchange Fstats() example

On Thu, 31 May 2012, Mabille, Geraldine wrote:

Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations, and I think I have a better idea of what I should use. My dataset contains binary data on calf survival as a function of the mother's mass. We know that the probability of calf survival should vary with the mother's mass: three mass groups are expected, with lower calf survival for small and large females and best calf survival for intermediate-sized females. I want to identify at which masses those changes in survival occur.

I think the code I need to use in order to test what I want is something like this:
gmass <- gefp(Success ~ Mass, family = binomial, fit = glm, order.by = ~ Mass)

The first question I have is: what is the difference between testing the gmass model shown above and the gmass2 model below?
gmass2 <- gefp(Success ~ 1, family = binomial, fit = glm, order.by = ~ Mass)

gmass2 will be appropriate if under the null hypothesis there is a constant 
probability of survival and under the alternative there are different groups 
each of which has a constant probability of survival.

gmass will be appropriate if under the null hypothesis the logit of the 
probability of survival increases/decreases linearly with mass - and under the 
alternative there are different groups each of which has a probability that 
increases/decreases with mass.

From your description above I suspect that gmass2 is what you are looking for.
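
For concreteness, a minimal sketch of the gmass2 route, assuming the variables live in a data.frame d with columns Success (0/1) and Mass:

library("strucchange")
gmass2 <- gefp(Success ~ 1, fit = glm, family = binomial, order.by = ~ Mass, data = d)
sctest(gmass2, functional = meanL2BB)   ## test for a change in the survival probability
plot(gmass2, functional = meanL2BB)     ## fluctuation process plotted against Mass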

The second question, which I think is related to the first, is whether it makes
sense to plot the gmass model with aggregate=FALSE, knowing that we have a
single parameter in the model (Mass),

In the gmass2 model, there is a single parameter (probability of survival) 
which may or may not change across mass.

In the gmass model, there are two parameters (intercept and slope) which may or 
may not vary across mass.
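
A minimal sketch illustrating this (reusing the two objects defined above): with aggregate = FALSE each estimated coefficient gets its own panel, so gmass shows two processes while gmass2 shows a single one.

plot(gmass,  functional = maxBB, aggregate = FALSE)   ## two panels: (Intercept) and Mass
plot(gmass2, functional = maxBB, aggregate = FALSE)   ## one panel: (Intercept) only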

and this parameter is also the parameter we use in order.by=:
plot(gmass, functional=meanL2BB, aggregate=FALSE). I think the whole point of
questions 1 and 2 is that I don't understand the interpretation of the
intercept in the gmass model?

Third question: how to choose the proper functional?

Always difficult because there are no strong optimality results. If you test 
just a single parameter (i.e., gmass2), then I would expect that most 
functionals should lead to similar performance. I would try

plot(gmass2, functional = maxBB, aggregate = FALSE)
plot(gmass2, functional = meanL2BB)
plot(gmass2, functional = supLM(0.1))

where the latter two would probably have somewhat higher power. But the rangeBB 
might also work rather well here...
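
The corresponding significance tests can be obtained with sctest(); a small sketch with the same functionals plus rangeBB:

sctest(gmass2, functional = maxBB)
sctest(gmass2, functional = meanL2BB)
sctest(gmass2, functional = supLM(0.1))
sctest(gmass2, functional = rangeBB)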

I have seen that you discuss this in your CSDA (2006 & 2003) papers and, in
the 2006 paper, you say: "in situations where there is a shift in the
parameters and then a second shift back, it is advantageous to aggregate over
time using the range and ....". Which means, if I understand correctly, that
rangeBB would be suited to the kind of test I want to perform.
However, since I want to determine the timing of the peaks, I need my
functional to produce a "time series plot", for example like meanL2BB does.
Do you think I can use the meanL2BB functional in my case, or should I
construct a "home-made" functional which would use the range of the efp but
with comp applied first and time afterwards (is this possible?).

When you test only a single parameter, then you don't need to aggregate across 
the components anyway because there is just a single one.

Fourth question: is it OK to make only a visual estimate of the "breakpoints"
from the peaks seen on the graph after plotting the efp, or should I use the
breakpoints() function to date the breakpoints properly? I'm not sure whether
the breakpoints() function can be applied to binary data?

In the "fxregime" package there is an unexported gbreakpoints() function that 
can be used. For example, if your data is in a data.frame d:

gbp <- fxregime:::gbreakpoints(Success ~ 1, data = d,
  order.by = d$Mass, fit = glm, family = binomial, h = 20, ic = "BIC")
plot(gbp)         ## inspect the fit for different numbers of breakpoints
breakpoints(gbp)  ## extract the estimated breakpoints

Note that the code is not optimized for glm and hence can be rather slow - 
extremely slow if you have many observations (more than a few hundred).
How many observations do you have?

Fifth question: I have noticed that the p-values I obtain after performing,
for example, sctest(gmass, functional=meanL2BB) are a bit different depending
on whether I include family=binomial as an argument in my gefp() call. Should
I use this argument, or is it used by default when fit=glm is specified?

If you want to fit a binomial model, yes. Otherwise a linear regression is used, because glm() defaults to the gaussian family. (The 'family' argument is simply passed on to glm.)
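
A minimal sketch of the difference, again assuming a data.frame d:

gefp(Success ~ 1, fit = glm, order.by = ~ Mass, data = d)                     ## gaussian, i.e. linear regression
gefp(Success ~ 1, fit = glm, family = binomial, order.by = ~ Mass, data = d)  ## binomial, i.e. logistic regression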

Last question: you said in your previous message that I could look at
maxstat_test() from package "coin" as an interesting nonparametric
alternative, but I think this package does not allow estimation of more than
one breakpoint?

True, it just uses a single breakpoint. You can apply it recursively, though, if you use 
the ctree() function from the "party" package and set the control parameters 
accordingly.
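
For example, a minimal sketch (the maxdepth setting is purely illustrative; it caps the tree at two levels of splits, i.e. up to three mass groups):

library("party")
ct <- ctree(factor(Success) ~ Mass, data = d,
  controls = ctree_control(maxdepth = 2))
plot(ct)   ## the split points in Mass correspond to the estimated breakpoints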

Best,
Z


