On Mon, 07 Jan 2013 15:20:57 +0000, Oscar Benjamin wrote:

> There are sometimes good reasons to get a line of best fit by eye. In
> particular if your data contains clusters that are hard to separate,
> sometimes it's useful to just pick out roughly where you think a line
> through a subset of the data is.

Cherry picking subsets of your data as well as line fitting by eye? Two
wrongs do not make a right.

If you're going to just invent a line based on where you think it should
be, what do you need the data for? Just declare "this is the line I wish
to believe in" and save yourself the time and energy of collecting the
data in the first place. Your conclusion will be no less valid.

How do you distinguish "data contains clusters that are hard to
separate" from "data doesn't fit a line at all"?

Even if the data actually is linear, on what basis could we distinguish
between the line you fit by eye (say) y = 2.5x + 3.7, and the line I fit
by eye (say) y = 3.1x + 4.1? A line asserted on the basis of purely
subjective judgement can equally be denied on the basis of subjective
judgement.

Anyone can fool themselves into placing a line through a subset of
non-linear data. Or, sadly more often, *deliberately* cherry pick fake
clusters in order to fool others. Here is a real world example of what
happens when people pick out the data clusters that they like based on
visual inspection:

http://www.skepticalscience.com/images/TempEscalator.gif

And not linear by any means, but related to the cherry picking theme:

http://www.skepticalscience.com/pics/1_ArcticEscalator2012.gif

To put it another way, when we fit patterns to data by eye, we can
easily fool ourselves into seeing patterns that aren't there, or into
missing the patterns which are there. At best, line fitting by eye is
prone to honest errors; at worst, it is open to the most deliberate
abuse. We have eyes and brains that evolved to spot the ripe fruit in
trees, not to spot linear trends in noisy data, and fitting by eye is
not safe or appropriate.

-- 
Steven
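
To make the question about the two competing by-eye lines concrete, here
is a minimal sketch of one objective criterion that could settle it:
the sum of squared residuals, which ordinary least squares minimises.
The data points are invented purely for illustration, and the helper
names (sum_squared_residuals, least_squares_line) are hypothetical, not
taken from any library. It shows one possible tie-breaker under those
assumptions, not the only defensible one.

```python
# Hypothetical data, invented for illustration only: points scattered
# roughly around a straight line. Nothing here is a real data set.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [4.1, 6.5, 9.8, 12.0, 15.3, 17.4, 20.9, 23.2]


def sum_squared_residuals(slope, intercept, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The two by-eye lines from the text, plus the fitted line.
candidates = [
    ("by eye, 2.5x + 3.7", (2.5, 3.7)),
    ("by eye, 3.1x + 4.1", (3.1, 4.1)),
    ("least squares", least_squares_line(xs, ys)),
]

for name, (slope, intercept) in candidates:
    ssr = sum_squared_residuals(slope, intercept, xs, ys)
    print("{:20s} slope={:.3f} intercept={:.3f} SSR={:.3f}".format(
        name, slope, intercept, ssr))
```

Whatever the data, the least-squares line can never score a larger sum
of squared residuals than either by-eye line, so two people running
this on the same points get the same answer. It does not tell you
whether a straight line is an appropriate model in the first place;
that still needs a separate check, such as looking at the residuals.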