Danny, it sounds like you already have a fairly clear idea of what a 'nugget' distribution should look like. Maybe you could also deliberately produce some experimental data with such distributions, harvest the corresponding patterns from the microarray, and then apply a method such as the one described at http://www.cs.uwaterloo.ca/~shai/TALKS/NIPS07_prob_wkshp.pdf

This is only an uneducated guess, though.
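If arrays with known nugget rows are hard to come by, a cheap in-silico stand-in is to simulate rows with the two shapes you describe (about 80% of values around the mean plus a 20% one-sided tail, or a roughly bimodal row), mix them in with plain normal rows, and use the result as a labelled benchmark for any screening statistic. Below is a minimal Python sketch of that idea. The row counts, the tail offset and the mode separation are made-up parameters, not anything taken from your data, so tune them to what you expect on the array:

```python
import random

# Hypothetical shape parameters; tune these to what you expect on the array.
N_COLS = 20          # smallest dataset size mentioned in the thread
TAIL_FRACTION = 0.2  # share of values in the one-sided tail
TAIL_SHIFT = 3.0     # how far (in SDs) the tail starts above the bulk
MODE_GAP = 4.0       # distance (in SDs) between the two modes


def normal_row(rng, n=N_COLS):
    """Plain Gaussian row: the 'uninteresting' majority."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def tailed_row(rng, n=N_COLS):
    """About 80% of values around the mean, about 20% in an upper tail."""
    n_tail = round(n * TAIL_FRACTION)
    bulk = [rng.gauss(0.0, 1.0) for _ in range(n - n_tail)]
    # abs() keeps the tail strictly on one side (uni-directional).
    tail = [TAIL_SHIFT + abs(rng.gauss(0.0, 1.0)) for _ in range(n_tail)]
    return bulk + tail


def bimodal_row(rng, n=N_COLS):
    """Approximately bimodal: an even mix of two Gaussians MODE_GAP apart."""
    half = n // 2
    low = [rng.gauss(-MODE_GAP / 2, 1.0) for _ in range(half)]
    high = [rng.gauss(MODE_GAP / 2, 1.0) for _ in range(n - half)]
    return low + high


def simulate(n_normal=900, n_tailed=50, n_bimodal=50, seed=1):
    """Return a shuffled list of (label, row) pairs for benchmarking."""
    rng = random.Random(seed)
    rows = [("normal", normal_row(rng)) for _ in range(n_normal)]
    rows += [("tailed", tailed_row(rng)) for _ in range(n_tailed)]
    rows += [("bimodal", bimodal_row(rng)) for _ in range(n_bimodal)]
    rng.shuffle(rows)
    return rows


if __name__ == "__main__":
    data = simulate()
    n_nuggets = sum(1 for label, _ in data if label != "normal")
    print(len(data), "rows;", n_nuggets, "planted nuggets")
```

Whatever statistic you then try (Shapiro-Wilk p-values, the mean/median ratio, ...), you can check how strongly its extreme tail is enriched for the planted 'tailed' and 'bimodal' rows before trusting it on real data.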
Best
Hugo

On Saturday 05 February 2011 00:21:01 DB1984 wrote:
> Greg, Dennis - thanks for your input, I really appreciate the feedback, as it
> is not easy to source.
>
> In terms of the data, I've described it as 20 columns, which is the smallest
> dataset, but this can run to 320 columns, so in some cases there is likely
> to be enough power to detect non-normality. That said, a better solution
> would be useful.
>
> As a first approximation, I looked at the mean/median ratio to indicate
> simple skew in the data - which suggested that most of the data was normally
> distributed. I took the 'nuggets' to be those with a mean/median ratio in
> the top or bottom 1% of the data. This was a small group - overall the data
> appears relatively normally distributed within rows.
>
> The aim is really to find those nuggets with significantly non-normal
> distributions. My hope was to be able to take the tails of the p-values for
> Shapiro-Wilk, or some similar test, and find these enriched with nuggets.
> This may not be an appropriately robust approach - but is there a better
> option?
>
> One idea was to sort the data in each row, and perform a linear regression.
> For normal distributions I am expecting the intercept to be close to the
> mean. Using the (intercept - mean) and p-values for the fit of the regression
> was another way to filter out the nuggets in the dataset.
>
> If it helps, the nuggets I am expecting are either 80% grouped around the
> mean with 20% forming a uni-directional tail, or an approximately bimodal
> distribution.
>
> As I'd imagine is obvious - I don't have an ideal solution to finding these
> nuggets, and so coming up with the R code to do so is harder still. If
> anybody has insight into this sort of problem, and can point me in the
> direction of further reading, that would be helpful. If there is a
> ready-made solution, even better!
>
> As I said, thanks for your time with this...
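If the regressor is taken to be the expected normal quantiles, the sorted-row regression described in the quoted message is a normal Q-Q fit: the slope estimates the row SD, and the squared correlation of the fit is the Shapiro-Francia statistic W', which drops below 1 as the row departs from normality. Below is a minimal stdlib-only Python sketch of that per-row screen next to the mean/median ratio. It assumes Blom plotting positions (i - 3/8)/(n + 1/4) for the normal scores; the function names and the default 1% cut-off in screen() are illustrative choices, and W' is only used for ranking here, not converted into a p-value:

```python
from statistics import NormalDist, fmean, median


def normal_scores(n):
    """Approximate expected normal order statistics (Blom positions)."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def qq_fit(row):
    """Least-squares fit of the sorted row on normal scores.

    Returns (intercept, slope, w_prime). For a normal row the slope is
    close to the row SD and w_prime (the squared correlation, i.e. the
    Shapiro-Francia statistic) is close to 1.
    """
    y = sorted(row)
    x = normal_scores(len(y))
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if syy == 0:
        # Constant row: the fit is degenerate and W' is undefined.
        return my, 0.0, float("nan")
    slope = sxy / sxx
    intercept = my - slope * mx
    w_prime = sxy * sxy / (sxx * syy)
    return intercept, slope, w_prime


def mean_median_ratio(row):
    """Crude skew screen from the thread; needs a non-zero median."""
    med = median(row)
    if med == 0:
        raise ValueError("median is zero; the ratio is undefined")
    return fmean(row) / med


def screen(rows, keep=0.01):
    """Indices of the `keep` fraction of rows with the lowest W'.

    Assumes no constant rows (their W' would be NaN).
    """
    order = sorted(range(len(rows)), key=lambda i: qq_fit(rows[i])[2])
    k = max(1, int(len(rows) * keep))
    return order[:k]


if __name__ == "__main__":
    normal_like = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.1, 0.3, 0.6, 0.9, 1.3]
    bimodal = [-3.1, -2.9, -2.8, -2.6, -2.4, 2.4, 2.6, 2.8, 2.9, 3.1, 3.3]
    for name, row in [("normal-ish", normal_like), ("bimodal", bimodal)]:
        b0, b1, w = qq_fit(row)
        print(f"{name}: intercept={b0:.3f} mean={fmean(row):.3f} "
              f"slope={b1:.3f} W'={w:.3f}")
```

One consequence is worth knowing. These normal scores are symmetric and average to zero, and a least-squares line passes through the point of means, so the fitted intercept equals the row mean for every row, whatever its shape. The (intercept - mean) difference on its own therefore cannot separate nuggets; the goodness of fit carries the signal. For calibrated p-values in R, shapiro.test() in the stats package and sf.test() in the nortest package test this kind of departure from normality.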
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.