The real question is: why form groups when you already have the two continuous numerical variables you want? That is, what is the benefit of grouping? I can think of none. I personally think this is a historical practice that started when computers were unavailable and grouping reduced the mathematics to a doable level. Today, the statistics work without grouping.
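For what it's worth, here is a minimal sketch of how the two approaches could be compared side by side. It is only an illustration: the data are simulated, the values of a and b, the noise level, and the number of bins are arbitrary assumptions, and the function names fit_loglog and fit_binned are made up for this example. The binned fit sums the raw densities in each bin and then takes the log, which is one possible reading of the procedure described in the quoted message.

```python
import numpy as np

def fit_loglog(M, D):
    """Ungrouped OLS fit of log10(D) = log10(a) + b*log10(M).
    Returns (log10_a, b)."""
    b, log_a = np.polyfit(np.log10(M), np.log10(D), 1)
    return log_a, b

def fit_binned(M, D, n_bins=8):
    """Binned fit: equal-width bins on log10(M), densities summed per bin,
    then OLS of log10(bin sum) on the bin centers.
    Assumption: sums are taken on raw densities before the log."""
    x = np.log10(M)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin index for each species; the maximum value goes into the last bin.
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=D, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = sums > 0  # drop empty bins, whose log is undefined
    b, log_a = np.polyfit(centers[keep], np.log10(sums[keep]), 1)
    return log_a, b

# Simulated data (all values hypothetical): ~100 species, lognormal body
# size, D = a * M^b with multiplicative lognormal noise.
rng = np.random.default_rng(0)
M = rng.lognormal(mean=0.0, sigma=2.0, size=100)
D = 10.0 * M ** -0.75 * rng.lognormal(mean=0.0, sigma=0.5, size=100)

print("ungrouped (log10 a, b):", fit_loglog(M, D))
print("binned    (log10 a, b):", fit_binned(M, D))
```

Note that the binned fit has only as many points as there are non-empty bins, and each bin sum depends on how many species happen to fall in that bin, so the two slopes need not agree.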
Jim

On Fri, Mar 26, 2010 at 09:30, Francisco de Castro <[email protected]> wrote:
> Hi all,
>
> I have a question for the list regarding grouping (binning) of the
> independent variable in a linear regression. This is routinely done
> (at least in limnology) in studies involving so-called biomass
> size-spectra. I'm aware of other (better) methods to fit non-linear
> models. However, I need to compare my results with older literature
> where this method is used widely, and I'd like to know first if the
> method has a problem or if it is outright wrong.
>
> My independent variable is mean body size of the individuals of a
> species (M) and the dependent is either biomass (B, g/m2) or
> population density (D, indiv/m2) of the species. Body size is
> lognormally distributed, and the number of species in the sample is
> ~100. The model to fit is: D = aM^b. First, data are log-transformed in
> order to apply linear least-squares regression. So the model becomes
> log(D) = log(a) + b log(M). The appropriateness of this transformation
> and possible bias in the estimation of parameters have been discussed
> before (Zar, Smith, others) so my question is not about that. After
> log-transforming, sizes are grouped into evenly spaced categories, and
> the densities/biomasses for all sizes within a size group are summed
> up. So, the independent variable becomes the center of each
> log-size-bin, and the dependent becomes the sum of all log-densities
> for each size-bin. Obviously, the number of data points gets reduced
> from the original N to the number of size groups/bins used. After
> grouping, the log-log model is fitted by least-squares regression.
>
> So my questions are:
> Is this binning of a log-transformed variable statistically
> appropriate for this problem?
> Wouldn't it be better to use the size and density for each
> species directly, without any grouping?
>
> Thanks in advance for any suggestion or literature.
> Cheers
>
> Francisco de Castro
> Potsdam University
>
