> I posted row sizes (min/max/mean) of our largest data set in my original
> message, but had zero responses on the mailing list. The folks in IRC told
> me to wait it out, see if it rebalanced on its own (it didn't), or to run a
> repair on each node one at a time (didn't help), and that it wasn't a big
> concern until we had "dozens of GBs" worth of data.

OK. It may not be a practical concern right now, but an unexplained
imbalance is not good. First off, is this the very latest 0.6 release
or one of the 0.7 RCs, or is this an older 0.6? I don't remember
offhand whether any bugs fixed in the 0.6 series would explain this
particular behavior, but asking whether you are on the latest version
is probably a good start.

Also, you mentioned originally that "Our row min/max/mean values are
mostly the same". I'm not entirely sure what you are referring to;
the important points I wanted to ask about are:

(1) Do you have "many" keys (say, thousands or more) so that there
should be no statistically significant imbalance between the nodes in
terms of the *number* of rows?

(2) How sure are you about the distribution of row sizes; is it
possible you have a small number of very large rows that are screwing
up the statistics?
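
To illustrate why I ask both questions, here is a minimal Python sketch.
Everything in it is an assumption for illustration: the row sizes are
made up, and node_for_key is a hypothetical hash-modulo placement, not
Cassandra's actual partitioner or token assignment. It only shows that
with many keys the row *counts* per node come out even, while a handful
of very large rows can still skew the *bytes* per node and pull the
mean far away from the median.

```python
import hashlib
from statistics import mean, median


def node_for_key(key: str, num_nodes: int) -> int:
    """Hypothetical placement: hash the key, take it modulo the node count.

    This is NOT how Cassandra assigns tokens; it is just a stand-in for
    "keys are spread pseudo-randomly over the nodes".
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def load_per_node(row_sizes: dict, num_nodes: int):
    """Return (row count per node, bytes per node) for the given rows."""
    rows = [0] * num_nodes
    nbytes = [0] * num_nodes
    for key, size in row_sizes.items():
        node = node_for_key(key, num_nodes)
        rows[node] += 1
        nbytes[node] += size
    return rows, nbytes


# Made-up data: 10,000 rows of 1 KB each, plus 3 rows of 50 MB each.
row_sizes = {f"key{i}": 1024 for i in range(10_000)}
for i in range(3):
    row_sizes[f"bigkey{i}"] = 50 * 1024 * 1024

rows, nbytes = load_per_node(row_sizes, num_nodes=4)
sizes = list(row_sizes.values())

print("rows per node: ", rows)
print("bytes per node:", nbytes)
print("min/max/mean/median row size:",
      min(sizes), max(sizes), round(mean(sizes)), median(sizes))
```

If the real numbers look like this (median close to the minimum, mean
well above it), a few large rows are a likely explanation for the
imbalance.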

-- 
/ Peter Schuller
