On Sun, Nov 4, 2012 at 11:56 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

> 0) Did you or anyone else ever analyze the bigram data in the paper
> using Fisher's test stats?
>

That bigram data isn't particularly interesting; any text will show similar
effects.

Others have tested Fisher's exact test, but only a few cases turned up
where it bought any mileage over G^2.  The computational cost of Fisher's
test makes it much less interesting for the text, genomic, classification
and recommendation applications of G^2.
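To make the cost comparison concrete: G^2 for a 2x2 table is a fixed handful of logarithms, however large the counts are. Below is a minimal Python sketch of the standard G^2 = 2 * sum(k_ij * ln(k_ij / E_ij)) form. The function name `g2_2x2` is made up for illustration and is not part of any library.

```python
import math


def g2_2x2(table):
    """G^2 (log-likelihood ratio) statistic for a 2x2 table of counts.

    Hypothetical helper: G^2 = 2 * sum(k_ij * ln(k_ij / E_ij)), where
    E_ij = r_i * c_j / N is the count expected under independence.
    Cells with k_ij == 0 contribute 0 (the usual 0 * ln 0 = 0 convention).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, k in enumerate(row):
            if k > 0:
                # observed count vs. count expected under independence
                expected = row_sums[i] * col_sums[j] / n
                g2 += k * math.log(k / expected)
    return 2.0 * g2
```

For example, `g2_2x2([[10, 0], [0, 10]])` is 40 ln 2 (about 27.73), and a perfectly independent table such as `[[5, 5], [5, 5]]` gives 0.0.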

> 1) Is the bigram data from [1] available anywhere?
>

I don't think so.  Any small technical text should exhibit similar
characteristics.
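For anyone reproducing this from their own text, each candidate bigram reduces to a 2x2 table of adjacent-pair counts. Here is a minimal sketch of one common layout, assuming the text has already been split into tokens (`bigram_table` is a hypothetical name):

```python
def bigram_table(tokens, a, b):
    """Hypothetical helper: 2x2 contingency table for the bigram (a, b).

    k11 = count of "a b", k12 = "a x" with x != b,
    k21 = "x b" with x != a, k22 = all other adjacent pairs.
    """
    k11 = k12 = k21 = k22 = 0
    # walk every adjacent pair of tokens exactly once
    for first, second in zip(tokens, tokens[1:]):
        if first == a and second == b:
            k11 += 1
        elif first == a:
            k12 += 1
        elif second == b:
            k21 += 1
        else:
            k22 += 1
    return [[k11, k12], [k21, k22]]
```

On the toy text "the cat sat on the mat the cat", the pair ("the", "cat") yields `[[2, 1], [0, 4]]`.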

You can find more examples in my longer work on the subject:

http://arxiv.org/abs/1207.1847

Most of these examples are based on publicly available data.


> 2) Do you think a direct implementation of Fisher's test for 2x2
> designs and a monte carlo impl for r x c would be useful?  I have
> this in C from years ago and could translate it fairly easily.
>

I have no clue whether people want this.  G^2 is pretty well entrenched in
text analysis and recommendation, and there have been hundreds of citations
to my original paper, many of which replicated the value of the test.  As
such, I wouldn't expect Fisher's test to add much value in those applications.

Other areas may well be a different story.  A fully featured implementation
of Fisher's exact test is pretty complex, however, since you have to take
such different tacks at different data scales and with differently shaped
tables.
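For scale, the exact 2x2 case is the easy part. Below is a minimal two-sided sketch under the fixed-margins (hypergeometric) model, using only the Python standard library. `fisher_exact_2x2` is a hypothetical name, and the "sum all tables no more likely than the observed one" rule is one common two-sided convention, not the only one. The sketch enumerates every table consistent with the margins, so its cost grows with the counts, and it does nothing for r x c tables.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table (hypothetical helper).

    With row and column margins fixed, the top-left cell follows a
    hypergeometric distribution.  The p-value sums the probabilities of all
    tables no more likely than the observed one.  Probabilities are kept as
    exact integer numerators over comb(n, c1) so ties compare exactly.
    """
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d
    c1 = a + c
    n = r1 + r2

    def weight(x):
        # number of ways to get top-left cell == x with these margins
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, c1)
```

On Fisher's tea-tasting table `[[3, 1], [1, 3]]` this gives 34/70, about 0.486. `math.comb` requires Python 3.8 or later.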
