On Wed, 27 Feb 2008 17:07:37 -0800, Paul Rubin wrote:

> Steven D'Aprano <[EMAIL PROTECTED]> writes:
>> Oh come on. With a function named "mean" that calculates the sum of a
>> list of numbers and then divides by the number of items, what else
>> could it be?
>
> You have a bunch of marbles you want to put into bins. The division
> tells you how many marbles to put into each bin. That would be an
> integer since you cannot cut up individual marbles.

(Actually you can. As a small child, one of my most precious possessions
was a marble which had cracked into two halves.)

No, that doesn't follow, because you don't get the result you want if
the number of marbles is entered as Decimals or floats. Maybe the data
came from a marble-counting device that always returns floats.

You're expecting the function to magically know what you want to do with
the result and return the right kind of answer, which is the wrong way
to go about it. For example, there are situations where your data is
given in integers, but the number you want is a float.

    # number of 20kg bags of flour per order
    >>> data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
    >>> weights = [20*n for n in data]
    >>> mean(weights)
    195.55555555555554

If I was using a library that arbitrarily decided to round the mean
weight per order to 195kg, I'd report that as a bug. Maybe I want the
next highest integer, not the lowest. Maybe I do care about that extra
5/9 of a kilo. It simply isn't acceptable for the function to try to
guess what I'm going to do with the result.

>> You can always imagine corner cases where some programmer, somewhere,
>> has some bizarre need for a mean() function that truncates when given a
>> list of integers but not when given a list of floats. Making that the
>> default makes life easy for the 0.1% corner cases and life harder for
>> the 99.9% of regular cases, which is far from the Python philosophy.
>
> I think it's more important that a program never give a wrong answer,
> than save a few keystrokes. So, that polymorphic mean function is a bit
> scary. It might be best to throw an error if the args are all integers.
> There is no definitely correct way to handle it so it's better to
> require explicit directions.

Of course there's a correct way to handle it. You write a function that
returns the mathematical mean.
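
For concreteness, here is a minimal sketch of such a function. The
mean() under discussion is never shown in this thread, so the body
below is an assumption rather than anyone's actual code, and it relies
on Python 3 semantics, where / is true division even for ints.

```python
import math

def mean(data):
    """Return the arithmetic mean of data, without rounding.

    Illustrative sketch only: assumes Python 3, where / is true
    division, so a list of ints still yields a float.
    """
    data = list(data)
    if not data:
        raise ValueError("mean() requires at least one value")
    return sum(data) / len(data)

# The flour example: integer input, non-integer mean.
weights = [20 * n for n in [5, 7, 20, 2, 7, 6, 1, 37, 3]]
m = mean(weights)

# The caller, not mean(), decides how to restrict the result.
rounded_down = math.floor(m)  # 195
rounded_up = math.ceil(m)     # 196
```

Because the rounding happens at the call site, the same mean() serves
the caller who wants 195, the one who wants 196, and the one who wants
the extra fraction of a kilo.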

And then, if you need special processing of that mean, (say) truncating
if the numbers are all ints, or on Tuesdays, you do so afterwards:

    x = mean(data)
    if all(isinstance(n, int) for n in data) or today() == Tuesday:
        x = int(x)

I suppose that if your application is always going to truncate the mean
you might be justified in writing an optimized function that does that.
But don't call it "truncated_mean", because that has a specific meaning
to statisticians that is not the same as what you're talking about.

Paul, I'm pretty sure you've publicly defended duck typing before. Now
you're all scared of some imagined type non-safety that results from
numeric coercions. I can't imagine why you think that this should be
allowed:

    class Float(float): pass
    x = Float(1.0)
    mean([x, 2.0, 3.0, 5.0])

but this gives you the heebie-jeebies:

    mean([1, 2.0, 3.0, 5.0])

As a general principle, I'd agree that arbitrarily coercing any old type
into any other type is a bad idea. But in the specific case of numeric
coercions, 99% of the time the Right Way is to treat all numbers
identically, and then restrict the result if you want a restricted
result, so the language should make that the easy case, and leave the 1%
to the developer to write special code:

    def pmean(data):  # Paul Rubin's mean
        """Returns the arithmetic mean of data, unless data is all ints,
        in which case returns the mean rounded down to the nearest
        integer (floor division)."""
        s = sum(data)
        if isinstance(s, int):
            return s//len(data)
        else:
            return s/len(data)

-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list