Le 26/09/2010 18:21, Ted Dunning a écrit : > I was about to write this suggestion down. > > Instead, I will simply concur. > > +1 > > The safe version can do the checksum or make a copy. Either is safe and the > costs are similar (O(n) on every call and no memory or O(1) on every call > after the first, O(n) memory).
OK. Then I will have a copy of the data in the class and for each evaluate method with a double[] values parameter there will also be a method without this parameter that will run on the already provided array. The implementations with the value parameter will have an implementation roughly similar to: public double evaluate(final double[] values, final double p) { checkCache(values); return evaluate(p); } private void checkCache(final double[] values) { final int hash = Arrays.hashcode(values); if ((values != originalValues) || (hash != originalHash)) { clearCache(); originalValues = values; originalHash = hash; copiedValues = values.clone(); } } Starting to work on it ... Luc > > I would tend toward the copy option. > > On Sun, Sep 26, 2010 at 8:39 AM, Mikkel Meyer Andersen <m...@mikl.dk> wrote: > >> Hi, >> >> Why not simply make 3) the default, but also supply a >> *IMSureWhatIMDoing-method only implementing 2) (it is so cheap that we >> might as well do it instead of not doing any check at all). This means >> that users who read the documentation can gain something when they >> explicitly ask for it, while users who don't read the docs are still >> safe. >> >> Cheers, Mikkel. >> >> 2010/9/26 Luc Maisonobe <luc.maison...@free.fr>: >>> Le 23/09/2010 20:57, Luc Maisonobe a écrit : >>>> Hi, >>>> >>>> Resuming this thread, as I will be soon able to work on this (I hope). >>>> I propose to start by solving >>>> <https://issues.apache.org/jira/browse/MATH-417> using a simple >>>> partition-based selection algorithm and preserving pivot information. >>>> This would help both the single call use-case (we don't sort everything) >>>> and the multiple call case (we don't redo on subsequent calls work >>>> already done in the first calls. >>> >>> The current API of the Percentile method is derived from >>> AbstractUnivariateStatistic and provides several evaluate methods that >>> have the data array as a parameter. >>> >>> This is compatible with the one-call use case without any semantic >>> change, but does not fit the multiple-calls use case seamlessly. >>> >>> If we want to cache some work already done on the array between calls >>> with different values of p (the percentile value), we have to check if >>> the array is still the same as the cached one on which we have done some >>> partitioning and saved some pivots. This can be done by storing a >>> reference to the original array in addition to a copy of its content >>> (which we reorganize) and with an "if (values == originalValues)" >>> statement. This would however not protect against changes in the array >>> elements themselves done by the user between calls. A better protection >>> could be to compute a hash code of the array content and compare it to >>> the original one at each call. However, this would add a linear cost. >>> The user who encountered the performances problems added an addValues >>> method to set up the array once and simply ignored the values passed as >>> an argument to the evaluate methods. This breaks the semantics of the >> API. >>> >>> So I see four choices about partitioning caching and would like to have >>> people express which one they prefer. A clearCache or similar public >>> method would be made available to users to ensure clean computation if >>> the caching mechanism gets on their way. >>> >>> 1) do nothing to check the array is the same between calls and blindly >>> assumes it IS the same. Users would really need to call clearCache >>> when they provide a new array >>> pros: very simple >>> cons: error-prone for the user as it relies only on reading and >>> understanding a documentation that would change with new version >>> >>> 2) check only array reference equality (with ==) to check the array >>> is the same as the previous call or not. Users would need to call >>> clearCache only if they use the same array but have changed its >>> content. >>> pros: trade-off between efficiency and reliability, >>> handles most cases properly >>> cons: may be wrong in corner cases >>> >>> 3) check array content using an hash code. Users would generally don't >>> need to call clearCache at all so it would not be provided >>> pros: works in all cases >>> cons: add linear cost for array checking >>> >>> 4) remove the double[] values parameter from the API and use a separate >>> addValues method, hence the class will reset its cache only when >>> addValues is called >>> pros: works in all cases >>> cons: Percentile could not implement UnivariateStatistic anymore >>> >>> My preference is choice 2. >>> >>> What do you think ? >>> >>> Luc >>> >>>> >>>> Partition-based algorithms are O(n) in expected time but not in worst >>>> case time. If end users ask for it later, we could improve again by >>>> adding the Musser's introselect that guarantees O(n) behavior even in >>>> worst case. >>>> >>>> Another possibility would be to have an additional method that would >>>> compute percentiles for several thresholds at once, but this would >>>> require returning an array of results and I think it would be difficult >>>> to mix with other statistics (mainly the DescriptiveStatistics.apply >>>> method, which is used by the users I talked to). >>>> >>>> Would this meet everyone concerns ? >>>> Luc >>>> >>>> >>>> Le 18/09/2010 21:50, Dimitri Pourbaix a écrit : >>>>> Ted, >>>>> >>>>>> O(n), not n >>>>>> >>>>>> Expected case is n + n/2 + n/4 ... < 2n >>>>> >>>>> Yes, that I am already more incline to buy. >>>>> >>>>> Dim. >>>>> >> ---------------------------------------------------------------------------- >>>>> >>>>> Dimitri Pourbaix * >>>>> Institut d'Astronomie et d'Astrophysique * Don't worry, be happy >>>>> CP 226, office 2.N4.211, building NO * and CARPE DIEM. >>>>> Universite Libre de Bruxelles * >>>>> Boulevard du Triomphe * Tel : +32-2-650.35.71 >>>>> B-1050 Bruxelles * Fax : +32-2-650.42.26 >>>>> http://sb9.astro.ulb.ac.be/~pourbaix * mailto: >> pourb...@astro.ulb.ac.be >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>> >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org