Le 26/09/2010 21:05, Phil Steitz a écrit : > On 9/26/10 2:30 PM, Luc Maisonobe wrote: >> Le 26/09/2010 19:27, Mikkel Meyer Andersen a écrit : >>> 2010/9/26 Gilles Sadowski<gil...@harfang.homelinux.org>: >>>>> [...] >>>>> >>>>> 1) do nothing to check the array is the same between calls and >>>>> blindly >>>>> assumes it IS the same. Users would really need to call >>>>> clearCache >>>>> when they provide a new array >>>>> pros: very simple >>>>> cons: error-prone for the user as it relies only on reading and >>>>> understanding a documentation that would change with new version >>>>> >>>>> 2) check only array reference equality (with ==) to check the array >>>>> is the same as the previous call or not. Users would need to call >>>>> clearCache only if they use the same array but have changed its >>>>> content. >>>>> pros: trade-off between efficiency and reliability, >>>>> handles most cases properly >>>>> cons: may be wrong in corner cases >>>>> >>>>> 3) check array content using an hash code. Users would generally >>>>> don't >>>>> need to call clearCache at all so it would not be provided >>>>> pros: works in all cases >>>>> cons: add linear cost for array checking >>>>> >>>>> 4) remove the double[] values parameter from the API and use a >>>>> separate >>>>> addValues method, hence the class will reset its cache only when >>>>> addValues is called >>>>> pros: works in all cases >>>>> cons: Percentile could not implement UnivariateStatistic anymore >>>>> >>>>> My preference is choice 2. >>>>> >>>>> What do you think ? >>>> >>>> IIUC, the interface method ("evaluate") was designed around the >>>> assumption >>>> that successive calls are independent: the array argument is not >>>> encapsulated for further processing (e.g. caching). >>>> IMO, the equality check and the "clearCache" method look like >>>> workarounds, >>>> not a clean solution. >>>> As an example, what if a user calls "evaluate" several times >>>> alternating on >>>> two different arrays: >>>> >>>> ---CUT--- >>>> double[] a = new double[] {1, 2, 3}; >>>> double[] b = new double[] {1, 2, 3, 4, 5}; >>>> Percentile p = new Percentile(); >>>> >>>> dourble r; >>>> r = p.evaluate(a, 50); >>>> r = p.evaluate(b, 50); >>>> r = p.evaluate(a, 50); >>>> r = p.evaluate(b, 50); >>>> >>>> // etc. >>>> ---CUT--- >>>> >>>> Doesn't this kind of use-case nullify the expected optimization? >>>> >>>> If the array is going to be reused, the user call should reflect >>>> that fact; >>>> e.g. by passing the array in a constructor: >>>> >>>> ---CUT--- >>>> double[] a = new double[] {1, 2, 3}; >>>> double[] b = new double[] {1, 2, 3, 4, 5}; >>>> Percentile pA = new Percentile(a); >>>> Percentile pB = new Percentile(b); >>>> >>>> double r; >>>> r = pA.evaluate(50); >>>> r = pB.evaluate(50); >>>> r = pA.evaluate(50); >>>> r = pB.evaluate(50); >>>> ---CUT--- >>>> >>>> That way, later calls can benefit from whatever preprocessing was >>>> done in >>>> previous calls. >>>> The instance will always control all the information needed (e.g. >>>> after a >>>> call to an "addValues" method) for the processing without the need >>>> to rely >>>> on the user for calling "clearCache" whenever necessary. >>>> >>>> >>>> Gilles >>> >>> +1 >>> I think that is a really good idea and I agree on the points made. >> >> This proposal is point 4. It breaks the UnivariateStatistic API and it >> breaks what the user found interesting in this API, i.e. have a general >> statistics framework where one statistic can be replaced by another one. >> >> If you read again one of my earlier messages from today, we will combine >> this method (i.e. evaluate without values) and the >> UnivariateStatistics API. >> >> Perhaps we could add these new methods (i.e. addValues and evaluate >> without values) to UnivariateStatistics, but this can only be done on >> 3.0. >> > > We could add to AbstractUnivariateStatistics in 2.x and update the > interface in 3.0, if we can work out and agree on the setData() idea in > my last post. While only some statistics can benefit internally from > caching, I think the setData idea is consistent with > UnivariateStatistics, as this is the interface for stats that require > the full set of data to be stored in memory.
This is fine for me. Luc > > Phil >> Luc >> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org