See my replies inline below.
On Fri, 1 Jul 2016, Mark Shanks wrote:
Hi,
Imagine the two problems:
1) You have an event that occurs repeatedly over time. You want to
identify periods when the event occurs more frequently than the base
rate of occurrence. Ideally, you don't want to have to specify the
period (e.g., break into months), so the analysis can be sensitive to
scenarios such as many events happening only between, e.g., June 10 and
June 15 - even though the overall number of events for the month may not
be much greater than usual. Similarly, there may be a cluster of events
that occur from March 28 to April 3. Ideally, you want to pull out the
base rate of occurrence and highlight only the periods when the
frequency is less or greater than the base rate.
A good place to start is:
Siegmund, D. O., N. R. Zhang, and B. Yakir. "False discovery rate
for scanning statistics." Biometrika 98.4 (2011): 979-985.
and
Aldous, David. Probability approximations via the Poisson clumping
heuristic. Vol. 77. Springer Science & Business Media, 2013.
---
A nice illustration of how scan statistics can be used is:
Aberdein, Jody, and David Spiegelhalter. "Have London's roads
become more dangerous for cyclists?." Significance 10.6 (2013):
46-48.
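To make the idea concrete, here is a minimal sketch of a fixed-width
scan statistic with a Monte Carlo p-value. It is written in Python only
as a language-neutral illustration; it does not reproduce the method of
any paper or package mentioned in this thread. The function names, the
window width of 6 days, and the event dates are all made up for the
example, and the null model (event times uniform over the year, given
the number of events) is an assumption.

```python
import bisect
import random

def scan_statistic(times, width):
    """Largest number of events in any window [t, t + width) that starts at an event."""
    times = sorted(times)
    best_count, best_start = 0, None
    for i, start in enumerate(times):
        # Index of the first event at or beyond the window's right edge.
        j = bisect.bisect_left(times, start + width)
        if j - i > best_count:
            best_count, best_start = j - i, start
    return best_count, best_start

def monte_carlo_p_value(times, width, span, n_sim=999, seed=1):
    """Share of uniform-in-time resamples whose scan statistic is >= the observed one."""
    rng = random.Random(seed)
    observed, _ = scan_statistic(times, width)
    hits = 0
    for _ in range(n_sim):
        # Assumed null: same number of events, scattered uniformly over [0, span).
        fake = [rng.uniform(0, span) for _ in times]
        if scan_statistic(fake, width)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Made-up data: day of year for 12 events, 6 of them on days 161-166.
events = [12, 47, 80, 133, 161, 162, 163, 164, 165, 166, 250, 340]
count, start = scan_statistic(events, width=6)
p = monte_carlo_p_value(events, width=6, span=365)
print(count, start, p)
```

The window width is still fixed here. One way to avoid committing to a
single width is to repeat the scan over several widths and apply the
same simulation to the most extreme result across all of them, so the
p-value accounts for the extra searching.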
2) Events again occur repeatedly over time in an inconsistent way.
However, this time, the event has positive or negative outcomes - such
as a spot check of conformity to regulations. You again want to know
whether there is a group of negative outcomes close together in time.
This analysis should take into account the positive outcomes as well
though. E.g., if from June 10 to June 15 you get 5 negative outcomes and
no positive outcomes it should be flagged. On the other hand, if from
June 10 to June 15 you get 5 negative outcomes interspersed between many
positive outcomes it should be ignored.
I'm guessing that there is some statistical approach designed to look at
these types of issues. What is it called?
`Scan statistic' is a good search term. `Poisson clumping', too.
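For the second problem, a rough sketch of the same idea for
positive/negative outcomes follows. Each time window is scored by how
surprising its number of negatives is given how many checks fell in it,
and the outcomes are then shuffled over the fixed check dates to get a
p-value. It is written in Python as a language-neutral illustration and
is not what geneRxCluster or any other package does. The function names,
the 6-day width, and the data are hypothetical, and using the overall
negative rate as the null rate is an assumption.

```python
import bisect
import random
from math import comb

def binom_tail(k, m, p):
    """P(X >= k) for X ~ Binomial(m, p)."""
    return sum(comb(m, x) * p**x * (1 - p)**(m - x) for x in range(k, m + 1))

def most_unusual_window(times, negative, width):
    """Smallest binomial tail probability over windows [t, t + width) starting at a check.

    times must be sorted; negative[i] is True if check i had a negative outcome.
    """
    p0 = sum(negative) / len(negative)  # overall negative rate, used as the null rate
    best_tail, best_start = 1.0, None
    for i, start in enumerate(times):
        j = bisect.bisect_left(times, start + width)
        m = j - i                 # checks in the window
        k = sum(negative[i:j])    # negatives in the window
        tail = binom_tail(k, m, p0)
        if tail < best_tail:
            best_tail, best_start = tail, start
    return best_tail, best_start

def permutation_p_value(times, negative, width, n_sim=999, seed=1):
    """Share of outcome shuffles with a window at least as unusual as the observed one."""
    rng = random.Random(seed)
    observed, _ = most_unusual_window(times, negative, width)
    shuffled = list(negative)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # keep the check dates, permute the outcomes
        if most_unusual_window(times, shuffled, width)[0] <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Made-up data: a check every 9 days, plus 5 extra checks on days 161-165.
cluster = [161, 162, 163, 164, 165]
times = sorted(list(range(7, 360, 9)) + cluster)
failed_days = set(cluster) | {43, 250}
negative = [t in failed_days for t in times]

tail, start = most_unusual_window(times, negative, width=6)
p = permutation_p_value(times, negative, width=6)
print(start, tail, p)

# Same 5 negatives, but among 25 checks in the window (null rate held at 7/45):
print(binom_tail(5, 25, 7 / 45))
```

Because the score depends on both the negatives and the total number of
checks in the window, 5 negatives out of 5 checks gives a very small
tail probability, while 5 negatives among 25 checks does not.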
What package in R implements it? I basically just need to know where to
start.
There are some R packages.
CRAN has packages SNscan and graphscan, which sound like they
might interest you.
My BioConductor package geneRxCluster:
http://bioconductor.org/packages/release/bioc/html/geneRxCluster.html
seeks clusters in a binary sequence as described in detail at
http://bioinformatics.oxfordjournals.org/content/30/11/1493
HTH,
Chuck
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.