I should have done more research before asking the question. I mean real research, too :)
I captured cfstats before the repair, after the repair, and after the scrub. On a hunch I also captured stats before and after a repair with no scrub. Instead of scrubbing, I left the cluster alone for as long as a scrub normally takes (which can be hours on our dataset). It turns out that, in all probability, it's just a waiting game. At the end of the equivalent time period the bloom filter stats were nearly identical, and so was the query performance. I guess I just needed to wait longer for the streamed files to "settle" or some such.

Thanks
Charles

On Fri, Sep 28, 2012 at 7:20 AM, Charles Brophy <cbro...@zulily.com> wrote:

> Odd indeed.
>
> 1) It is observable after the compactions are through and the system has
> "settled"
> 2) We're using SizeTiered strategy
> 3) CentOS 6 & Oracle JVM 1.6.31
>
> I'll do a repair and get some before/after stats to answer your remaining
> questions.
>
> Thanks Aaron
>
> On Wed, Sep 26, 2012 at 2:51 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Sounds very odd.
>>
>> Is read performance degrading _after_ repair and the compactions that
>> normally result from it have completed?
>> What Compaction Strategy?
>> What OS and JVM?
>>
>> What are the bloom filter false positive stats from cfstats?
>>
>> Do you have some read latency numbers from cfstats?
>> Also, could you take a look at cfhistograms?
>>
>> Cheers
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 26/09/2012, at 3:05 AM, Charles Brophy <cbro...@zulily.com> wrote:
>>
>> Hey guys,
>>
>> I've begun to notice that read operations take a performance nose-dive
>> after a standard (full) repair of a fairly large column family: ~11 million
>> records. Interestingly, I've then noticed that read performance returns to
>> normal after a full scrub of the column family. Is it possible that the
>> repair operation is not correctly establishing the bloom filter afterwards?
>> One interesting note about the scrub operation is that it will "rebuild
>> sstables with correct bloom filters", which is what led me to this
>> conclusion. Does this make sense?
>>
>> I'm using 1.1.3 and Oracle JDK 1.6.31.
>> The column family is a standard type, and I've noticed this exact behavior
>> regardless of the key/column/value serializers in use.
>>
>> Charles
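The before/after comparison described at the top of this thread (cfstats captured before the repair, after the repair, and after a scrub or an equivalent wait) can be sketched as a small Python helper that diffs two saved `nodetool cfstats` dumps. This is a hypothetical sketch, not part of Cassandra. It assumes that each stat sits on its own `Label: value` line. The label strings you pass in, such as the bloom filter false positive and read latency lines, are assumptions that you should copy from the cfstats output of your own version. Because cfstats repeats the same labels for every column family, save only the block for the column family you are comparing.

```python
from typing import Dict, Iterable


def parse_stats(text: str) -> Dict[str, float]:
    """Collect numeric 'Label: value' pairs from a saved stats dump.

    Assumption: each stat is on its own line as 'Label: value'.
    Non-numeric values (e.g. a column family name) are skipped.
    If a label repeats, the last occurrence wins, so save one
    column family's block per file.
    """
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Label: value' line
        tokens = value.split()
        if not tokens:
            continue
        try:
            # Use only the first token so trailing units such as 'ms.' are ignored.
            stats[label.strip()] = float(tokens[0])
        except ValueError:
            continue  # non-numeric value
    return stats


def diff_stats(before: str, after: str, labels: Iterable[str]) -> Dict[str, float]:
    """Return (after - before) for each requested label found in both dumps."""
    b, a = parse_stats(before), parse_stats(after)
    return {name: a[name] - b[name] for name in labels if name in a and name in b}
```

To use it, save the relevant cfstats block to one file before the repair and to another after the wait (or the scrub). Then call `diff_stats` on the two texts, passing the labels you want to compare. Labels that are missing from either dump are skipped silently rather than raising an error.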