On Thu, Feb 27, 2020 at 01:01:08PM -0800, Jeff Davis wrote:
> Attached is a patch that exports a new function from logtape.c:
> 
>    extern LogicalTapeSet *LogicalTapeSetExtend(
>             LogicalTapeSet *lts, int nAdditional);
> 
> The purpose is to allow adding new tapes dynamically for the upcoming
> Hash Aggregation work[0]. HashAgg doesn't know in advance how many
> tapes it will need, though only a limited number are actually active at
> a time.
> 
> This was proposed and originally written by Adam Lee[1] (extract only
> the changes to logtape.c/h from his posted patch). Strangely, I'm
> seeing ~5% regression with his version when doing:
> 
>     -- t20m_1_int4 has 20 million random integers
>     select * from t20m_1_int4 order by i offset 100000000;
> 
> Which seems to be due to using a pointer rather than a flexible array
> member (I'm using gcc[2]). My version (attached) still uses a flexible
> array member, which doesn't see the regression; but I repalloc the
> whole struct so the caller needs to save the new pointer to the tape
> set.
> 
> That doesn't entirely make sense to me, and I'm wondering if someone
> else can repro that result and/or make a suggestion, because I don't
> have a good explanation. I'm fine with my version of the patch, but it
> would be nice to know why there's such a big difference using a pointer
> versus a flexible array member.
> 
> Regards,
>       Jeff Davis

I noticed another difference, I was using palloc0(), which could be one of the
reason, but not sure.

Tested your hashagg-20200226.patch on my laptop(Apple clang version 11.0.0),
the average time is 25.9s:

```
create table t20m_1_int4(i int);
copy t20m_1_int4 from program 'shuf -i 1-4294967295 -n 20000000';
analyze t20m_1_int4;
```
```
explain analyze select * from t20m_1_int4 order by i offset 100000000;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3310741.20..3310741.20 rows=1 width=4) (actual 
time=25666.471..25666.471 rows=0 loops=1)
   ->  Sort  (cost=3260740.96..3310741.20 rows=20000096 width=4) (actual 
time=20663.065..24715.269 rows=20000000 loops=1)
         Sort Key: i
         Sort Method: external merge  Disk: 274056kB
         ->  Seq Scan on t20m_1_int4  (cost=0.00..288496.96 rows=20000096 
width=4) (actual time=0.075..2749.385 rows=20000000 loops=1)
 Planning Time: 0.109 ms
 Execution Time: 25911.542 ms
(7 rows)
```

But if use the palloc0() or do the MemSet() like:

```
lts = (LogicalTapeSet *) palloc0(offsetof(LogicalTapeSet, tapes) +
                                                                ntapes * 
sizeof(LogicalTape));
...
MemSet(lts->tapes, 0, lts->nTapes * sizeof(LogicalTape)); <--- this line 
doesn't matter as I observed,
                                                               which makes a 
little sense the compiler
                                                               might know it's 
already zero.
```

The average time goes up to 26.6s:

```
explain analyze select * from t20m_1_int4 order by i offset 100000000;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3310741.20..3310741.20 rows=1 width=4) (actual 
time=26419.712..26419.712 rows=0 loops=1)
   ->  Sort  (cost=3260740.96..3310741.20 rows=20000096 width=4) (actual 
time=21430.044..25468.661 rows=20000000 loops=1)
         Sort Key: i
         Sort Method: external merge  Disk: 274056kB
         ->  Seq Scan on t20m_1_int4  (cost=0.00..288496.96 rows=20000096 
width=4) (actual time=0.060..2839.452 rows=20000000 loops=1)
 Planning Time: 0.111 ms
 Execution Time: 26652.255 ms
(7 rows)
```

-- 
Adam Lee


Reply via email to