On Mon, May 6, 2019 at 8:15 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> Nope, that's not how it works. It's the array of batches that gets
> sliced, not the batches themselves.
>
> It does slightly increase the amount of data we need to shuffle between
> the temp files, because we can't write the data directly to batches in
> "future" slices. But that amplification is capped to ~2.2x (compared to
> the ~1.4x in master) - I've shared some measurements in [1].
>
> [1]
> https://www.postgresql.org/message-id/20190428141901.5dsbge2ka3rxmpk6%40development
>

Cool, I misunderstood. I looked at the code again today, and at the
email thread where you measured "amplification".

In terms of how many times you write each tuple, is it accurate to say
that a tuple can now be spilled three times (in the worst case),
whereas before it could be spilled only twice?

1 - when building the inner-side hashtable, the tuple is spilled to a
    "slice" file
2 - (assuming the number of batches was increased) during execution,
    when a tuple belonging to a later slice's spill file is found, it
    is re-spilled to that slice's spill file
3 - during execution, when reading from its slice file, it is
    re-spilled (again) to its batch's spill file

Is it correct that the maximum number of BufFile structs you will have
is equal to the number of slices plus the number of batches in a slice,
because that is the maximum number of open BufFiles you would have at a
time?

By the way, applying the v4 patch on master in an assert build, I am
tripping some asserts -- starting with

    Assert(!file->readOnly);

in BufFileWrite.

One thing I was a little confused by was the nbatch_inmemory member of
the hashtable. The comment in ExecChooseHashTableSize says that it is
determining the number of batches we can fit in memory. I thought that
the problem was the amount of space taken up by the BufFile data
structure itself -- which is related to the number of open BufFiles you
need at a time.
This comment in ExecChooseHashTableSize makes it sound like you are
talking about fitting more than one batch of tuples into memory at a
time. I was under the impression that you could only fit one batch of
tuples in memory at a time.

So, I was stepping through the code with work_mem set to the lower
bound, and in ExecHashIncreaseNumBatches I got confused:
hashtable->nbatch_inmemory was 2 for me, and thus nbatch_tmp was 2, so
I didn't meet this condition

    if (nbatch_tmp > hashtable->nbatch_inmemory)

since I had just set nbatch_tmp using hashtable->nbatch_inmemory. So I
didn't increase the number of slices, which is what I was expecting.
What happens when hashtable->nbatch_inmemory is equal to nbatch_tmp?

--
Melanie Plageman