Hi Thomas,

Some more data points:

create table t_heap as select generate_series(1, 100000000) i;

Query: select count(*) from t_heap;
shared_buffers = 32MB (kept small so that I don't have to clear shared
buffers or the OS page cache between runs)
OS: FreeBSD 12.1 with UFS, on GCP
4 vCPUs, 4GB RAM, Intel Skylake
22GB Google Persistent Disk
Time is measured with \timing on.
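
For reference, each measurement was roughly the following psql session (a
sketch; the set line was varied per row in the tables below):

-- shared_buffers = 32MB is a server-level setting, per the setup above
set max_parallel_workers_per_gather = 0;  -- varied: 0, 1, 2, 6
\timing on
select count(*) from t_heap;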

Without your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           33.88s
                              1           57.62s
                              2           62.01s
                              6          222.94s

With your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           29.04s
                              1           29.17s
                              2           28.78s
                              6          291.27s

I checked with EXPLAIN ANALYZE that the number of workers planned matched
max_parallel_workers_per_gather for each run.
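
That is, roughly the following; the Workers Planned/Launched lines are the
standard fields EXPLAIN ANALYZE shows under the Gather node:

explain (analyze) select count(*) from t_heap;
-- the Gather node reports "Workers Planned: N" and "Workers Launched: N";
-- I compared the planned count against the setting for each run.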

Apart from the last result (max_parallel_workers_per_gather=6), all
the other results seem favorable.
Could the last result be down to the fact that the number of workers
planned exceeded the number of vCPUs?

I also wanted to evaluate Zedstore with your patch, using the same setup
as above (the Zedstore table creation is sketched after the results below).
No discernible difference, though; maybe I'm missing something:

Without your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           25.86s
                              1           15.70s
                              2           12.60s
                              6           12.41s


With your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           26.96s
                              1           15.73s
                              2           12.46s
                              6           12.10s
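
For completeness, the Zedstore table was created along these lines (a sketch
from memory; the table name here is illustrative):

create table t_zedstore using zedstore as
  select generate_series(1, 100000000) i;
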
--
Soumyadeep


On Thu, May 21, 2020 at 3:28 PM Thomas Munro <thomas.mu...@gmail.com> wrote:

> On Fri, May 22, 2020 at 10:00 AM David Rowley <dgrowle...@gmail.com>
> wrote:
> > On Thu, 21 May 2020 at 17:06, David Rowley <dgrowle...@gmail.com> wrote:
> > > For the patch. I know you just put it together quickly, but I don't
> > > think you can do that ramp up the way you have. It looks like there's
> > > a risk of torn reads and torn writes and I'm unsure how much that
> > > could affect the test results here.
> >
> > Oops. On closer inspection, I see that memory is per worker, not
> > global to the scan.
>
> Right, I think it's safe.  I think you were probably right that
> ramp-up isn't actually useful though, it's only the end of the scan
> that requires special treatment so we don't get unfair allocation as
> the work runs out, due to coarse grain.  I suppose that even if you
> have a scheme that falls back to fine grained allocation for the final
> N pages, it's still possible that a highly distracted process (most
> likely the leader given its double duties) can finish up sitting on a
> large range of pages and eventually have to process them all at the
> end after the other workers have already knocked off and gone for a
> pint.
>
