On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>
>> It is not obvious to me that the parabola is wrong.  I've certainly seen
>> cases where reading every 2nd or 3rd block (either stochastically, or
>> modulus) actually does take longer than reading every block, because it
>> defeats read-ahead.  But it depends on a lot on your kernel version and
>> your kernel settings and your file system and probably other things as well.
>>
>
> Well, that's an interesting point, too.  Maybe we need another graph that
> also shows the actual runtime of a bitmap scan and a sequential scan.
>

I did some low-level IO benchmarking, and reading every 3rd block actually
takes about 13 times longer than reading every block, using CentOS 6.9 with
ext4 and the setting:

blockdev --setra 8192 /dev/sdb1

This is on some virtualized storage whose details I don't know, but it
behaves as if it were RAID/JBOD with around 6 independent spindles.

If I pick the 1/3 of the blocks to read stochastically rather than by
modulus, it is only 2 times slower than reading all of them.  I assume that
is because having occasional reads which are adjacent to each other does
make read-ahead kick in, while evenly spaced never-adjacent reads do not.
This is probably a better model of how bitmap table scans actually work, as
there is no general reason to think they would be evenly spaced and
non-adjacent.  So this result is in reasonable agreement with how the
current cost estimation works: the parabola peaks at about twice the cost
of the sequential scan.

I used a file of about 10GB, because I happened to have one lying around.

## read every block ($_%3>5 is never true)
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>5; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
1295683 8192
   4317 0

real    0m38.329s

## read every 3rd block
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if $_%3>0; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
 431894 8192
   1439 0

real    8m54.698s

## read a random 1/3 of the blocks
time perl -wle 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if rand()>0.3333; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c
 431710 8192
   1434 0

real    1m23.130s

Dropping the caches is a reasonable way to do this type of benchmark,
because it simulates what would happen if your data set were much larger
than RAM, without needing to actually use a data set much larger than RAM.

It would be interesting to see what other people get for similar low level
tests, as well as actual bitmap scans.

Cheers,

Jeff
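
P.S. In case someone wants to try the same low-level test without already
having a suitable file, here is a sketch of a self-contained run.  The
scratch file name "rand", the zero-filled contents, and the /dev/sdb1
device are just placeholders taken from the commands above; adjust them for
your own system.

# create a ~10GB scratch file of 1300000 8kB blocks (contents don't matter
# for read timing, only that the blocks are actually allocated)
dd if=/dev/zero of=rand bs=8192 count=1300000

# same read-ahead setting as above; substitute your own device
sudo blockdev --setra 8192 /dev/sdb1

# cold-cache run reading a random 1/3 of the blocks; change the "next if"
# test to $_%3>0 for the every-3rd-block variant, or $_%3>5 to read them all
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x=""; next if rand()>0.3333; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c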