No. If you want to work on that, feel free; it should be pretty easy to add
that option. However, be aware that LLR is used in the downsampling step, so
you don’t get all elements of llr(A’A). To keep the calculation at O(n),
downsampling is based on the number of non-zero elements in a row of both A
and A’A, keeping the highest LLR-scoring elements. These are params that you
can control in the current implementation.
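
To make the scoring and the per-row cut concrete, here is a rough sketch in plain Python (not the Mahout code). It assumes the usual 2x2 contingency table for a pair of items: k11 = both occur, k12 = first without second, k21 = second without first, k22 = neither. The names llr, root_llr and keep_top_k are made up for illustration.

```python
import heapq
import math


def x_log_x(x):
    # Convention: 0 * log(0) is taken to be 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio (G^2) for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    # Signed square root: negative when the pair co-occurs less often
    # than the second row would suggest (cross-multiplied to avoid 0/0).
    r = math.sqrt(llr(k11, k12, k21, k22))
    return -r if k11 * (k21 + k22) < k21 * (k11 + k12) else r


def keep_top_k(row_scores, k):
    # Keep the k highest-scoring (item, score) pairs of one row.
    return heapq.nlargest(k, row_scores.items(), key=lambda kv: kv[1])
```

Since only the k best scores per row survive, anything computed afterwards sees a truncated llr(A’A), not the full matrix.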

For some types of analysis where you would like A’A downsampled based on a
purely probabilistic metric, like confidence in non-correlation, it might be
nice to have a threshold-based downsampler where the threshold is some
fraction of all elements or some confidence value rather than a fixed value of
LLR. A fixed LLR cutoff is trivial to add but not very useful. A
confidence-based cutoff requires that we find a way to
calculate the distribution parameters of LLR in A’A so a confidence threshold 
can be derived. I haven’t put a lot of thought into this, but IIRC LLR is
chi-square distributed with 1 degree of freedom (going from old brain cells
here) and root LLR is normally distributed. If there is some clever way to
find the threshold without calculating all of rllr(A’A), which would be
O(n^2), then the confidence threshold downsampling could be kept O(n), and
this would be a very useful contribution.
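
As a starting point only, here is what the conversion from a confidence value to a cutoff could look like if you simply assume root LLR follows a standard normal under independence. That is the textbook asymptotic assumption, not distribution parameters fitted to the actual A’A, so it does not solve the problem above. The function names are hypothetical; NormalDist is from the Python standard library (3.8+).

```python
from statistics import NormalDist


def root_llr_cutoff(confidence):
    # One-sided cutoff on signed root LLR: keep scores above this value.
    # ASSUMPTION: root LLR ~ N(0, 1) when the two items are independent.
    return NormalDist().inv_cdf(confidence)


def llr_cutoff(confidence):
    # Matching cutoff on unrooted LLR: square of the two-sided z value,
    # under the same standard-normal assumption.
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    return z * z
```

With confidence = 0.95 this gives a root LLR cutoff of about 1.64 and an LLR cutoff of about 3.84. Whether those numbers mean much for real interaction data is exactly the open question.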


On Dec 14, 2015, at 8:04 PM, Nikaash Puri <[email protected]> wrote:

Hi,

Just wondering whether there is support to use root Log Likelihood Ratio
using some sort of flag in the cooccurrencesIDSs function
in org.apache.mahout.math.cf.SimilarityAnalysis. Else, I can create an
issue and work on it to add said support.

Thank you,
Nikaash Puri
