"Jim C. Nasby" <[EMAIL PROTECTED]> writes:

> Before we start debating merits of proposals based on random reads, can
> someone confirm that the sampling code actually does read randomly? I
> looked at it yesterday; there is a comment that states that blocks to be
> scanned are passed to the analyze function in physical order, and AFAICT
> the function that chooses blocks does so based strictly on applying a
> probability function to block numbers as it increments a counter. It
> seems that any reading is actually sequential and not random, which
> makes all the random_page_cost hand-waving null and void.
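
For readers who have not looked at that code, the selection scheme described in the quoted text (walk the block numbers with a counter, apply a probability test to each one) can be sketched roughly as below. This is only an illustration of sequential selection sampling under my own assumptions, not the PostgreSQL source: the `block_sampler` struct and the `sampler_init` and `sampler_next` names are hypothetical, and `rand()` stands in for whatever random source the real code uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sampler state; the names are illustrative only. */
typedef struct {
    long nblocks;    /* total number of blocks in the file */
    long remaining;  /* blocks still to be selected */
    long next;       /* next block number to consider */
} block_sampler;

void sampler_init(block_sampler *s, long nblocks, long sample_size)
{
    assert(s != NULL);
    s->nblocks = nblocks;
    s->remaining = sample_size < nblocks ? sample_size : nblocks;
    s->next = 0;
}

/*
 * Return the next selected block number, or -1 when the sample is complete.
 * Each block is considered once, in increasing order, and is selected with
 * probability (blocks still wanted) / (blocks not yet considered).
 */
long sampler_next(block_sampler *s)
{
    assert(s != NULL);
    while (s->remaining > 0 && s->next < s->nblocks) {
        long left = s->nblocks - s->next;   /* blocks not yet considered */
        long blk = s->next++;
        double u = (double) rand() / ((double) RAND_MAX + 1.0);  /* [0, 1) */

        if (u * (double) left < (double) s->remaining) {
            s->remaining--;
            return blk;
        }
    }
    return -1;
}
```

Because the counter only moves forward, the selected block numbers always come out in ascending physical order, which is the property being questioned: the access pattern is a forward pass with gaps rather than a series of random seeks.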

Hm. I'm curious just how much that actually behaves like a sequential scan. I think I'll do some experiments.

  Reading 1% (1267 read, 126733 skipped): 7748264us
  Reading 2% (2609 read, 125391 skipped): 12672025us
  Reading 5% (6502 read, 121498 skipped): 19005678us
  Reading 5% (6246 read, 121754 skipped): 18509770us
  Reading 10% (12975 read, 115025 skipped): 19305446us
  Reading 20% (25716 read, 102284 skipped): 18147151us
  Reading 50% (63656 read, 64344 skipped): 18089229us
  Reading 100% (128000 read, 0 skipped): 18173003us

These numbers don't make much sense to me. It seems like reading 5% is about as slow as reading the whole file, which is even worse than I expected. I thought I was being a bit pessimistic to think reading 5% would be as slow as reading 20% of the table.

Anyone see anything wrong with my methodology?

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCKSIZE 8192

int main(int argc, char *argv[])
{
    char *fn;
    int fd;
    int perc;
    struct stat statbuf;
    struct timeval tv1, tv2;
    off_t size, offset;
    char buf[BLOCKSIZE];        /* one block of bytes (was an array of pointers) */
    int b_read = 0, b_skipped = 0;

    if (argc != 3) {
        fprintf(stderr, "usage: %s file percent\n", argv[0]);
        exit(1);
    }
    fn = argv[1];
    perc = atoi(argv[2]);

    fd = open(fn, O_RDONLY);
    if (fd < 0 || fstat(fd, &statbuf) < 0) {
        perror(fn);
        exit(1);
    }
    size = statbuf.st_size;
    size = size / BLOCKSIZE * BLOCKSIZE;    /* round down to whole blocks */

    gettimeofday(&tv1, NULL);
    srandom(getpid() ^ tv1.tv_sec ^ tv1.tv_usec);

    /* Walk the file in physical order; read each block with probability perc%. */
    for (offset = 0; offset < size; offset += BLOCKSIZE) {
        if (random() % 100 < perc) {
            lseek(fd, offset, SEEK_SET);
            if (read(fd, buf, BLOCKSIZE) < 0) {
                perror("read");
                exit(1);
            }
            b_read++;
        } else {
            b_skipped++;
        }
    }
    gettimeofday(&tv2, NULL);

    fprintf(stderr, "Reading %d%% (%d read, %d skipped): %ldus\n",
            perc, b_read, b_skipped,
            (long) (tv2.tv_sec - tv1.tv_sec) * 1000000 +
            (long) (tv2.tv_usec - tv1.tv_usec));
    exit(0);
}
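
One way to put the timings above on a common scale is to divide each elapsed time by the number of blocks actually read. The helper below is just that arithmetic; `us_per_block` is a name made up for this note and is not part of the test program. Fed the figures from the table, it gives roughly 6.1 ms per block for the 1% run against roughly 0.14 ms per block for the 100% run.

```c
#include <assert.h>

/*
 * Average elapsed microseconds per block actually read.
 * Hypothetical helper for interpreting the timing table; both arguments
 * come straight from one "Reading N%" output line.
 */
double us_per_block(long elapsed_us, long blocks_read)
{
    assert(blocks_read > 0);
    return (double) elapsed_us / (double) blocks_read;
}
```

For example, `us_per_block(7748264, 1267)` corresponds to the 1% line and `us_per_block(18173003, 128000)` to the 100% line.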
-- greg