On Sun, Nov 4, 2012 at 11:56 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
> 0) Did you or anyone else ever analyze the bigram data in the paper
> using Fisher's test stats?

That bigram data isn't particularly interesting; any text will show similar effects. Others have tested Fisher's exact test, but only a few cases turned up where there was any mileage. The cost of Fisher's test makes it much less interesting for the text, genomic, classification and recommendation applications of G^2.

> 1) Is the bigram data from [1] available anywhere?

I don't think so. Any small technical text should exhibit similar characteristics. You can find more examples in my longer work on the subject:

http://arxiv.org/abs/1207.1847

Most of these examples are based on publicly available data.

> 1) Do you think a direct implementation of Fisher's test for 2x2
> designs and a monte carlo impl for r x c would be useful? I have
> this in C from years ago and could translate it fairly easily.

I have no clue if people want this. G^2 is pretty well entrenched in text analysis and recommendations and there have been hundreds of citations to my original paper, many of which replicated the value of the test. As such, I wouldn't expect a lot of value in those applications. Other areas may well be a different story.

A fully featured implementation of Fisher's exact test is pretty complex, however, since you have to take such different tacks at different data scales and with differently shaped tables.
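For a concrete sense of the two statistics on a 2x2 table, here is a minimal Python sketch. It is only an illustration under stated assumptions, not code from the paper or from any library: the function names (`g2`, `g2_p_value`, `fisher_exact_2x2`) and the example counts are made up, and the Fisher routine uses the simplest two-sided definition (sum the probabilities of all tables with the same margins that are no more likely than the observed one). It does none of the scale-dependent work a full implementation would need.

```python
import math

def g2(a, b, c, d):
    """G^2 (log-likelihood ratio) statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    total = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, k in enumerate(row):
            if k > 0:  # a zero cell contributes nothing (k * log k -> 0)
                expected = rows[i] * cols[j] / n
                total += k * math.log(k / expected)
    return 2.0 * total

def g2_p_value(stat):
    """Asymptotic p-value: chi-square survival function with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value by direct enumeration (illustrative only)."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so ties with p_obs are counted despite rounding
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical counts: 3 1 / 1 3
stat = g2(3, 1, 1, 3)
print(stat, g2_p_value(stat), fisher_exact_2x2(3, 1, 1, 3))
```

The cost difference shows up in the loops: `g2` touches each cell once no matter how large the counts are, while the enumeration in `fisher_exact_2x2` grows with the smaller margin and works with very large binomial coefficients, which is one reason real implementations switch strategies as the counts and table shapes change.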