Hi,

2012/11/8 Gilles Sadowski <gil...@harfang.homelinux.org>:
> On Thu, Nov 08, 2012 at 09:39:00AM +0100, Thomas Neidhart wrote:
>> Hi Patrick,
>>
>> On 11/07/2012 04:37 PM, Patrick Meyer wrote:
>> > I agree that it would be nice to have a constructor that allows you to
>> > specify the ranking algorithm only.
>> >
>> > As far as NaN and the Spearman correlation, maybe we should add a default
>> > strategy of NaNStrategy.FAIL so that an exception would occur if any NaN is
>> > encountered. R uses this treatment of missing data and forces users to
>> > choose how to handle it. If we implemented something like listwise or
>> > pairwise deletion it could be used in other classes too. As such, treatment
>> > of missing data should be part of a larger discussion and handled in a more
>> > comprehensive and systematic way.
>>
>> I think this additional option makes sense, but I forward this
>> discussion to the dev mailing list where it is better suited.
>
> I'm wary of having CM handle "missing" data.
> For one thing we'd have to define a "convention" to represent missing data.
> There is no good way to do that in Java. Using NaN for this purpose in a
> low-level library is not a good idea IMHO.
>
I agree with Gilles, here. If I remember correctly, R has a special
value NA, or something similar, which differs from NaN.

>
> Then, any convention might not be
> suitable for some user applications, which would lead such an application's
> developer to filter the data anyway in order to change his representation to
> CM's representation. Rather than calling two redundant filtering codes, I'd
> rather assume that CM gets a clean input on which its algorithm can operate.
> As usual, the input is subjected to precondition checks, and exceptions are
> thrown if the data is not clean enough.
>
> In summary: data validation (in the sense of discarding input) should be
> done _before_ calling CM routines.
>
+1.
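To make the point concrete, here is a minimal sketch of what such
application-side filtering could look like. It assumes the application
happens to encode its missing values as NaN. The names PairwiseNaNFilter
and filter are made up for illustration; nothing like this exists in CM.

```java
/**
 * Hypothetical application-side helper (not part of Commons Math).
 * Drops every pair (x[i], y[i]) in which either value is NaN, so that
 * both arrays stay aligned and keep the same length.
 */
final class PairwiseNaNFilter {

    private PairwiseNaNFilter() { }

    /** Returns {filteredX, filteredY}; the inputs are left untouched. */
    static double[][] filter(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException(
                "array lengths differ: " + x.length + " != " + y.length);
        }

        // First pass: count the pairs in which both values are present.
        int complete = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                complete++;
            }
        }

        // Second pass: copy only the complete pairs, preserving order.
        double[] fx = new double[complete];
        double[] fy = new double[complete];
        int k = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                fx[k] = x[i];
                fy[k] = y[i];
                k++;
            }
        }
        return new double[][] { fx, fy };
    }
}
```

The two returned arrays have equal length and contain no NaN, so they
could be handed to the correlation routine as clean input. An
application that marks missing data differently would only have to
change the test inside the loops.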
Sébastien

>
> Regards,
> Gilles
>
>> Thomas
>>
>> > -----Original Message-----
>> > From: Thomas Neidhart [mailto:thomas.neidh...@gmail.com]
>> > Sent: Wednesday, November 07, 2012 8:09 AM
>> > To: u...@commons.apache.org
>> > Subject: Re: [math] correlation analysis with NaNs
>> >
>> > On 11/07/2012 01:38 PM, Patrick Meyer wrote:
>> >> You are getting values like 2.5 because of the default ties strategy.
>> >> If you do not want to use that method, create an instance of
>> >> RankingAlgorithm with a different ties strategy and pass it to the
>> >> constructor for the SpearmansCorrelation. This approach also gives you
>> >> control over the method for dealing with NaNs. Something like,
>> >>
>> >> //create data matrix (rows = observations, columns = variables)
>> >> double[] column1 = new double[]{Double.NaN, 1, 2};
>> >> double[] column2 = new double[]{10, 2, 10};
>> >> Array2DRowRealMatrix mydata =
>> >>     new Array2DRowRealMatrix(column1.length, 2);
>> >> for(int i=0;i<column1.length;i++){
>> >>     mydata.setEntry(i, 0, column1[i]);
>> >>     mydata.setEntry(i, 1, column2[i]);
>> >> }
>> >>
>> >> //compute correlation
>> >> NaturalRanking ranking = new NaturalRanking(NaNStrategy.FIXED,
>> >>     TiesStrategy.RANDOM);
>> >> SpearmansCorrelation spearman =
>> >>     new SpearmansCorrelation(mydata, ranking);
>> >>
>> >> Try that.
>> >
>> > Hi,
>> >
>> > this will not really help imho.
>> >
>> > As far as I can see, there are at least two problems with the current use
>> > of the RankingAlgorithm in the SpearmansCorrelation class:
>> >
>> > * there is no way to select the ranking algorithm in the constructor
>> >   without passing the values at the same time
>> > * the NaNStrategy.REMOVED does not work symmetrically, i.e. it removes
>> >   the NaN only from the input array where it occurs but not in the
>> >   corresponding array, thus rendering it useless as it will result in
>> >   exceptions (array lengths differ)
>> >
>> > Would you be able to create an issue for this on the issue tracker and
>> > provide the test case?
>> >
>> > Thanks,
>> >
>> > Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>