Le 30/06/2014 19:15, Phil Steitz a écrit : > > >> On Jun 30, 2014, at 9:59 AM, Gilles <gil...@harfang.homelinux.org> >> wrote: >> >> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote: >>>> On Jun 29, 2014, at 3:41 PM, Gilles >>>> <gil...@harfang.homelinux.org> wrote: >>>> >>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote: >>>>>>> On 6/29/14, 2:30 PM, Gilles wrote: >>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: >>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun, >>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz >>>>>>>> <phil.ste...@gmail.com> wrote: >>>>>>>> >>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le >>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit : >>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe >>>>>>>>>>> <l...@spaceroots.org> >>>>>>>>> wrote: >>>>>>>>>>>> Hi Venkat, >>>>>>>>>>>> >>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit >>>>>>>>>>>> : >>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc >>>>>>>>>>>>> Maisonobe <l...@spaceroots.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>> >>>>>>>>>>>>>> While looking further in Percentile class >>>>>>>>>>>>>> for MATH-1120, I have found another problem >>>>>>>>>>>>>> in the current implementation. >>>>>>>>>>>>>> NaNStrategy.FIXED >>>>>>>>> should >>>>>>>>>>>>>> leave the NaNs in place, but at the end of >>>>>>>>>>>>>> the KthSelector.select method, a call to >>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of >>>>>>>>>>>>>> the small sub-array. What is really >>>>>>>>>>>>>> implemented here is therefore closer to >>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. >>>>>>>>>>>>>> This always occur in the test cases because >>>>>>>>>>>>>> they use very short arrays, and we >>>>>>>>>>>>>> directly >>>>>>>>> switch to >>>>>>>>>>>>>> this part of the select method. >>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If >>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you >>>>>>>>>>>>> need something to indicate beyond this for >>>>>>>>>>>>> NaN. >>>>>>>>>>>> Well, we can just keep the NaN themselves and >>>>>>>>>>>> handled them appropriately, hoping not to trash >>>>>>>>>>>> performances too much. >>>>>>>>>>> Agreed. >>>>>>>>>>> >>>>>>>>>>>>> What is the test input you see an issue and >>>>>>>>>>>>> what is the actual error you are seeing. >>>>>>>>>>>>> Please share the test case. >>>>>>>>>>>> Just look at >>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The >>>>>>>>>>>> first check in the test corresponds to a >>>>>>>>>>>> Percentile configuration at 50% percentile, and >>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number >>>>>>>>>>>> of entries, so the 50% percentile is exactly >>>>>>>>>>>> one of the entries: the one at index 5 in the >>>>>>>>>>>> final array. >>>>>>>>>>>> >>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, >>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the >>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be >>>>>>>>>>>> modified at all in the selection algorithm and >>>>>>>>>>>> the result for 50% should be the first NaN of >>>>>>>>>>>> the array, at index 5. In fact, due to the >>>>>>>>>>>> Arrays.sort, we *do* reorder the array into { >>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the >>>>>>>>>>>> result is 5. >>>>>>>>>>>> >>>>>>>>>>>> Agreed. just verified by putting back the >>>>>>>>>>>> earlier insertionSort >>>>>>>>> function. >>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile >>>>>>>>>>>> above 67%, we get as a result >>>>>>>>>>>> Double.POSITIVE_INFINITY instead of >>>>>>>>>>>> Double.NaN. >>>>>>>>>>>> >>>>>>>>>>>> If we agree to leave FIXED as unchanged >>>>>>>>>>>> behaviour with your >>>>>>>>> insertionSort >>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be >>>>>>>>>>> allowed for transformation >>>>>>>>> of >>>>>>>>>>> NaN to +/-Inf >>>>>>>>>> I'm fine with it, as long as we clearly state it in >>>>>>>>>> the NaNStrategy documentation, saying somethig >>>>>>>>>> like: >>>>>>>>>> >>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are >>>>>>>>>> effecively *replaced* by +/- infinity, hence the >>>>>>>>>> results will be computed as if infinities were >>>>>>>>>> there in the first place. >>>>>>>>> -1 - this is mixing concerns. NaNStrategy exists for >>>>>>>>> one specific purpose - to turn extend the ordering on >>>>>>>>> doubles a total ordering. Strategies prescribe only >>>>>>>>> how NaNs are to be treated in the ordering. Side >>>>>>>>> effects on the input array don't make sense in this >>>>>>>>> context. The use case for which this class was >>>>>>>>> created was rank transformations. Returning >>>>>>>>> infinities in rank transformations would blow up >>>>>>>>> computations in these classes. If a floating point >>>>>>>>> computations need to transform NaNs, the specific >>>>>>>>> stats / other classes that do this transformation >>>>>>>>> should perform and document the behavior. >>>>>>>>> >>>>>>>>> Phil >>>>>>>> OK. Agreed realized it later. Hence i have not >>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have >>>>>>>> added a separate transformer to transform NaNs but it >>>>>>>> uses NanStrategy. Please let me know on this as this >>>>>>>> trasnformation itself can be used in different places. >>>>>>> >>>>>>> Honestly, I don't see the value of this. I would be more >>>>>>> favorable to an addition to MathArrays supporting NaN (or >>>>>>> infinity) filtering / transformation. Several years back >>>>>>> we made the decision to just let NaNs propagate in >>>>>>> computations, which basically means that if you feed NaNs >>>>>>> to [math] you are going to get NaNs out in results. If we >>>>>>> do get into the business of filtering NaNs (or infinities >>>>>>> for that matter), I think it is probably best to do that >>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs >>>>>>> with NaN or infinity-handling handling descriptions. >>>>>>> That would be a huge task and would likely end up >>>>>>> bleeding implementation detail into the APIs. As I said >>>>>>> above, NaNStrategy has to exist for rank transformations >>>>>>> to be well-defined. NaN-handling in floating point >>>>>>> computations is defined and as long as we fully document >>>>>>> what our algorithms do, I think it is fair enough to "let >>>>>>> the NaNs fall where they may" - which basically means >>>>>>> users need to do the filtering / transformation >>>>>>> themselves. >>>>>> >>>>>> So, do I understand correctly that it's OK to add the >>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in >>>>>> "MathArrays"? Or does this thread conclude that we don't >>>>>> have a use for this within Commons Math? >>>>> >>>>> You raise a good point here. Personally, I don't see an >>>>> internal use and if we are sticking with MathArrays including >>>>> only things we have internal use for, I would say no, don't >>>>> include them. >>>> >>>> A third way to look at it would be that since some algorithms >>>> require being passed "clean" data (e.g. without NaNs), we could >>>> provide some simple means to do so. A user could then write, >>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 }; >>>> >>>> Percentile p = new Percentile(); double q = >>>> p.evaluate(MathArrays.replace(data, Double.NaN, >>>> Double.POSITIVE_INFINITY), 0.1); --- >>>> >>>> Then, if we provide that utility, is it still necessary to >>>> complicate the Percentile code with a NaNStrategy parameter? >>> >>> I may be missing something, but it seems to me that percentile >>> does need a NaNStrategy in the presence if NaNs like the other >>> order statistics do (basically to define their position In the >>> ordering). It doesn't logically need more than that. I would >>> be ok defaulting the strategy or even returning NaN in the >>> presence of NaNs (or whenever interpolation involves a NaN). All >>> of this can be documented. >> >> I am _surely_ missing something because I don't figure out how NaNs >> can be useful when computing those statistics... :-} > > Good point. Basically the different strategies allow you to > effectively ignore or work around them so stats can be well-defined. > Without NaNStrategy, order statistics including NaNs are undefined.
The problem with Percentile is that the elements are both reordered *and* used when interpolating between two values. Tor the rank methods, only reordering was needed so the various NaNStrategies could be implemented by replacing at will with +/- infinity and a classical sort would do the ordering for us. Here, we do interpolate on the values, so if we first replace NaN with +infinity to put them at the end, we may interpolate between a finite value and +inf and have +inf as a result, whereas if the NaN "value" was preserved and some magic used to sort the NaN, we would end up interpolating with a NaN and get a NaN as a result. Setting up the magic for MINIMAL and MAXIMAL seems achievable withing the sort, as we already have a specific sort (in fact not a sort but a selection algorithm, which is the core of the class). Setting up the magic for FIXED seems more difficult. So I guess implementing NaNStrategies may be done in different ways depending on how they will be used by the calling algorithm. I don't yet know if we can set up a general method to be used for all statistics. Looking only at Percentile, replacement with +/- infinity already seems to not be an appropriate solution. best regards, Luc > > > Phil > >> >>> I still don't see a within-math use for recoding methods; though >>> I do see the value from a user perspective. The problem is if we >>> open this door, we risk MathArrays becoming a mini commons lang. >> >> If Commons Lang already provides a feature, then we indeed don't >> need to offer the same within CM (if there is no internal use). Can >> someone confirm? >> >> >> Otherwise we could define a new inner interface in MathArrays, >> similar to "MathArrays.Function": --- public class MathArrays { >> >> // ... >> >> public interface Transformer { /** Transform the whole array */ >> double[] transform(double[] input); /** Transform the array in the >> interval [from, to) and return the whole array */ double[] >> transform(double[] input, int from, int to); /** Transform the >> array in the interval [from, to) and return the transformed >> sub-array */ double[] transformAndSplice(double[] input, int from, >> int to); } } --- >> >> Then we can provide "replace" and "remove" transformers through >> factory methods (example/untested code): --- public class >> MathArrays { >> >> // ... >> >> public static Transformer createReplaceTransformer(final double >> old, final double replace) { return new Transformer() { public >> double[] transform(double[] input) { final int len = input.length; >> final double[] output = new double[len]; for (int i = 0; i < len; >> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? >> replace : input[i]; } return output; } >> >> // etc. }; } } --- >> >> >> Gilles >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org