On Wed, Feb 23, 2011 at 4:04 PM, buddhasystem <potek...@bnl.gov> wrote:
> Well I know the cache is there for a reason, I just can't explain the factor
> of 4 when I run my queries on a hot vs cold cache. My queries are actually a
> chain of one on an inverted index, which produces a tuple of keys to be used
> in the "main" query. The inverted index query should be downright trivial.
>
> I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing
> something? Why such a large factor?

(simplified for discussion purposes, not necessarily exhaustive description
of.. )

Path in the cold key cache case :

a) check all bloom filters, 1 per sstable in the CF, which is in memory
b) read the index file (not in memory) and traverse index for every sstable
   which returns positive in a)
c) read the actual data file once for every sstable

Path in the hot key cache case :

a) read list of filenames and offsets from key cache
b) read the actual data file

You will notice that the former involves a lot more seeking than the
latter, especially if you have "many" sstables. This seeking almost
certainly is the cause of your observed difference. If you graph I/O
throughput in the two different cases, you will almost certainly see
yourself doing more (slow) I/O in the cold cache case.

Memory spent on key cache is usually relatively well spent, for this reason.

=Rob
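
To make the difference between the two read paths described above concrete, here is a rough toy model in Python that only counts disk seeks. It is not Cassandra code: the `SSTable` class, its fields, and the assumption of one seek per index lookup and one per data read are all hypothetical simplifications for illustration.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Hypothetical stand-in for one sstable, as seen by a single key lookup."""
    name: str
    bloom_positive: bool   # in-memory bloom filter says the key may be present
    has_key: bool          # the key really is in this sstable
    index_seeks: int = 1   # assumed disk seeks needed to find the key in the index file


def cold_read_seeks(sstables):
    """Cold key cache: bloom filter check, then index file, then data file."""
    seeks = 0
    for table in sstables:
        if not table.bloom_positive:
            continue                   # step a): bloom filter is in memory, no seek
        seeks += table.index_seeks     # step b): index file is read from disk
        if table.has_key:
            seeks += 1                 # step c): data file read
    return seeks


def hot_read_seeks(sstables):
    """Hot key cache: file and offset are already known, so only data files are read."""
    return sum(1 for table in sstables if table.has_key)


if __name__ == "__main__":
    tables = [
        SSTable("a", bloom_positive=True, has_key=True),
        SSTable("b", bloom_positive=True, has_key=False),   # bloom false positive
        SSTable("c", bloom_positive=False, has_key=False),
    ]
    print("cold:", cold_read_seeks(tables), "hot:", hot_read_seeks(tables))
```

In this model the gap grows with the number of sstables that pass the bloom filter and with the assumed `index_seeks` per lookup, which matches the point that the cold path does more seeking. The actual ratio on a real cluster depends on the disks, the OS page cache, and the sstable count.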