RE: [PoC] Partition path cache

2024-10-28 Thread Bykov Ivan
Hello

> This sounds like an interesting idea. I like it because it omits the need for
> a "global statistics" effort for partitioned tables, since it just uses the
> first partition it knows. Of course, it has the drawback that the "first"
> partition can't represent the other partitions.

This method uses global statistics for all partitions.
The cache uses the standard path-building functions (it calculates selectivity
for each path), but it avoids calling all of them for the second and later
partitions in a group.

The concept is similar to the GEQO method used for joins.
We skip creating some path variants if building all paths would take too long.

> One of the arguments against this patch might be: "What if other partitions
> have pretty different statistics from the first partition?" If I were you, I
> might check all the statistics used at this stage and try to find a similar
> algorithm to prove that the best path would be similar too. This could
> happen once, when the statistics are gathered. However, this might not be
> easy.

Yes, maybe we can group partitions not only by their available index lists
but also by ranges of some statistical properties.

--
Best Regards
Ivan Bykov



Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2024-11-26 Thread Bykov Ivan
Hi, all!

It seems I have found a bug in the Query ID calculation.

Problem
===
In some cases, we could get the same ID for two non-identical query trees.
For two structurally similar query subnodes, we may have query
trees that look like this:


QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;

QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;


When the query jumble walker calculates the query ID, it traverses the
Query members from top to bottom and generates the same IDs for these
two queries because it does not change the hash value when visiting
an empty node (= NULL).

Here is an example (the collection of jstate->jumble is omitted):

/* QueryA_ID = <initial hash> */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* QueryA_ID unchanged */

/* QueryB_ID = <initial hash> */
QueryB->subNodeOne = NULL;
/* QueryB_ID unchanged */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) */


There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).

Here are examples of such collisions (all queries were run on REL_17_0):

SET compute_query_id = on;

/* DISTINCT / ORDER BY */
EXPLAIN (VERBOSE) SELECT DISTINCT "oid" FROM pg_class;
/* Query Identifier: 751948508603549510 */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class ORDER BY "oid";
/* Query Identifier: 751948508603549510 */

/* LIMIT / OFFSET */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class LIMIT 1;
/* Query Identifier: 5185884322440896420 */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class OFFSET 1;
/* Query Identifier: 5185884322440896420 */


Solution One


The simplest way to fix the problem is to place the scalar field used in
the query ID calculation between similar subnodes. A patch for this
solution is attached below (0001-Query-ID-Calculation-Fix-Variant-A.patch).

Here is an example:

/* QueryA_ID = <initial hash> */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->scalar = Value Y;
/* QueryA_ID = hash_any_extended(&(Value Y), sizeof(Value Y), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* QueryA_ID unchanged */

/* QueryB_ID = <initial hash> */
QueryB->subNodeOne = NULL;
/* QueryB_ID unchanged */
QueryB->scalar = Value Y;
/* QueryB_ID = hash_any_extended(&(Value Y), sizeof(Value Y), QueryB_ID) */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) */


Solution Two

Alternatively, we can change the hash sum when we encounter an empty node.
This approach may impact performance but will protect us from such errors
in the future. A patch for this solution is attached below
(0001-Query-ID-Calculation-Fix-Variant-B.patch).

Here is an example:

/* QueryA_ID = <initial hash> */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* QueryA_ID = hash_any_extended(&('\0'), sizeof(char), QueryA_ID) */

/* QueryB_ID = <initial hash> */
QueryB->subNodeOne = NULL;
/* QueryB_ID = hash_any_extended(&('\0'), sizeof(char), QueryB_ID) */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) */



RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-07 Thread Bykov Ivan
Hello!

Last time, I forgot to attach the patches.
The problem still persists in the 17.3 release.

Solution One

The simplest way to fix the problem is to place the scalar field used in the
query ID calculation between similar subnodes.
A patch for this solution is attached below
(0001-Query-ID-Calculation-Fix-Variant-A.patch).

Solution Two

Alternatively, we can change the hash sum when we encounter an empty node.
This approach may impact performance but will protect us from such errors in 
the future.
A patch for this solution is attached below 
(0001-Query-ID-Calculation-Fix-Variant-B.patch).


==
SELECT version();
 version
-
PostgreSQL 17.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 
12.2.0, 64-bit

SET compute_query_id = on;

/* LIMIT / OFFSET */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class LIMIT 1;

 QUERY PLAN

Limit  (cost=0.00..0.04 rows=1 width=4)
   Output: oid
   ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.15 rows=415 width=4)
 Output: oid
Query Identifier: 5185884322440896420

EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class OFFSET 1;
 QUERY PLAN

Limit  (cost=0.04..18.15 rows=414 width=4)
   Output: oid
   ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.15 rows=415 width=4)
 Output: oid
Query Identifier: 5185884322440896420

/* DISTINCT / ORDER BY */
EXPLAIN (VERBOSE) SELECT DISTINCT "oid" FROM pg_class;

 QUERY PLAN

Unique  (cost=0.27..23.54 rows=415 width=4)
   Output: oid
   ->  Index Only Scan using pg_class_oid_index on pg_catalog.pg_class  
(cost=0.27..22.50 rows=415 width=4)
 Output: oid
Query Identifier: 751948508603549510

EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class ORDER BY "oid";

  QUERY PLAN
--
Index Only Scan using pg_class_oid_index on pg_catalog.pg_class  
(cost=0.27..22.50 rows=415 width=4)
   Output: oid
Query Identifier: 751948508603549510



0001-Query-ID-Calculation-Fix-Variant-A.patch
Description: 0001-Query-ID-Calculation-Fix-Variant-A.patch


0001-Query-ID-Calculation-Fix-Variant-B.patch
Description: 0001-Query-ID-Calculation-Fix-Variant-B.patch


RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-06 Thread Bykov Ivan
Here is
0001-Query-ID-Calculation-Fix-Variant-A.patch
and
0001-Query-ID-Calculation-Fix-Variant-B.patch




0001-Query-ID-Calculation-Fix-Variant-B.patch
Description: 0001-Query-ID-Calculation-Fix-Variant-B.patch


0001-Query-ID-Calculation-Fix-Variant-A.patch
Description: 0001-Query-ID-Calculation-Fix-Variant-A.patch


RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-11 Thread Bykov Ivan
Hello!

>> Variant B is not acceptable IMO as it adds a whole bunch of 
>> null-terminators unnecessarily. For example, in a simple "select 1", 
>> (expr == NULL) is true 19 times, so that is an extra 19 bytes.

> Variant B is not acceptable here.

Could we improve Variant B?

I was thinking about adding a count of NULL pointers encountered by the query
jumble walker since the last scalar or non-null field visited.
This way, we can traverse the query as follows:

QueryA->subNodeOne = &(Value X);
/* JumbleState->Count = 0;
   QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* JumbleState->Count = 1; */
QueryA->subNodeThree = NULL;
/* JumbleState->Count = 2; */
QueryA->subNodeFour = NULL;
/* JumbleState->Count = 3; */
QueryA->scalar = Value Z;
/* QueryA_ID = hash_any_extended(&(JumbleState->Count),
                                 sizeof(JumbleState->Count), QueryA_ID);
   JumbleState->Count = 0;
   QueryA_ID = hash_any_extended(&(Value Z), sizeof(Value Z), QueryA_ID) */


Variant A improvement
-

I have attached the updated Variant A patch with the following changes:
- Implemented a pg_stat_statements regression test (see similar test results
  on REL_17_3 in Query-ID-collision-at_pg_stat_statements-1ea6e890b22.txt).
- Added extra description about the Query ID collision
  in src/include/nodes/parsenodes.h.



-- End of contrib/pg_stat_statements/expected/select.out for REL_17_3

CREATE TABLE pgss_c AS
  SELECT id FROM generate_series(1,2) AS id;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 t 
---
 t
(1 row)

SELECT DISTINCT id FROM pgss_c;
 id 

  2
  1
(2 rows)

SELECT id FROM pgss_c ORDER BY id;
 id 

  1
  2
(2 rows)

SELECT id FROM pgss_c LIMIT 1;
 id 

  1
(1 row)

SELECT id FROM pgss_c OFFSET 1;
 id 

  2
(1 row)

SELECT query FROM pg_stat_statements
  WHERE query LIKE '%pgss_c%'
  ORDER BY query COLLATE "C";
 query  

 SELECT DISTINCT id FROM pgss_c
 SELECT id FROM pgss_c LIMIT $1
(2 rows)

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 t 
---
 t
(1 row)

SELECT id FROM pgss_c ORDER BY id;
 id 

  1
  2
(2 rows)

SELECT DISTINCT id FROM pgss_c;
 id 

  2
  1
(2 rows)

SELECT id FROM pgss_c OFFSET 1;
 id 

  2
(1 row)

SELECT id FROM pgss_c LIMIT 1;
 id 

  1
(1 row)

SELECT query FROM pg_stat_statements
  WHERE query LIKE '%pgss_c%'
  ORDER BY query COLLATE "C";
   query   
---
 SELECT id FROM pgss_c OFFSET $1
 SELECT id FROM pgss_c ORDER BY id
(2 rows)

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 t 
---
 t
(1 row)

DROP TABLE pgss_c;


v2-0001-Query-ID-Calculation-Fix-Variant-A.patch
Description: v2-0001-Query-ID-Calculation-Fix-Variant-A.patch


RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-11 Thread Bykov Ivan
Here is bug description from
https://www.postgresql.org/message-id/flat/ca447b72d15745b9a877fad7e258407a%40localhost.localdomain

Problem
===
In some cases, we could get the same ID for two non-identical query trees.
For two structurally similar query subnodes, we may have query
trees that look like this:

QueryA->subNodeOne = Value X;
QueryA->subNodeTwo = NULL;
QueryB->subNodeOne = NULL;
QueryB->subNodeTwo = Value X;

When the query jumble walker calculates the query ID, it traverses the
Query members from top to bottom and generates the same IDs for these
two queries because it does not change the hash value when visiting
an empty node (= NULL).
Here is an example (the collection of jstate->jumble is omitted):

/* QueryA_ID = <initial hash> */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* QueryA_ID unchanged */
/* QueryB_ID = <initial hash> */
QueryB->subNodeOne = NULL;
/* QueryB_ID unchanged */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) */

There are two pairs of subnodes that can trigger this error:
- distinctClause and sortClause (both contain a list of SortGroupClauses);
- limitOffset and limitCount (both contain an int8 expression).
Here are examples of such collisions (all queries were run on REL_17_0):

SET compute_query_id = on;
/* DISTINCT / ORDER BY */
EXPLAIN (VERBOSE) SELECT DISTINCT "oid" FROM pg_class;
/* Query Identifier: 751948508603549510 */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class ORDER BY "oid";
/* Query Identifier: 751948508603549510 */
/* LIMIT / OFFSET */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class LIMIT 1;
/* Query Identifier: 5185884322440896420 */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class OFFSET 1;
/* Query Identifier: 5185884322440896420 */

Solution One

The simplest way to fix the problem is to place the scalar field used in
the query ID calculation between similar subnodes. A patch for this
solution is attached below (0001-Query-ID-Calculation-Fix-Variant-A.patch).
Here is an example:

/* QueryA_ID = <initial hash> */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->scalar = Value Y;
/* QueryA_ID = hash_any_extended(&(Value Y), sizeof(Value Y), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* QueryA_ID unchanged */
/* QueryB_ID = <initial hash> */
QueryB->subNodeOne = NULL;
/* QueryB_ID unchanged */
QueryB->scalar = Value Y;
/* QueryB_ID = hash_any_extended(&(Value Y), sizeof(Value Y), QueryB_ID) */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) */

Solution Two

Alternatively, we can change the hash sum when we encounter an empty node.
This approach may impact performance but will protect us from such errors
in the future. A patch for this solution is attached below
(0001-Query-ID-Calculation-Fix-Variant-B.patch).
Here is an example:

/* QueryA_ID = <initial hash> */
QueryA->subNodeOne = &(Value X);
/* QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* QueryA_ID = hash_any_extended(&('\0'), sizeof(char), QueryA_ID) */
/* QueryB_ID = <initial hash> */
QueryB->subNodeOne = NULL;
/* QueryB_ID = hash_any_extended(&('\0'), sizeof(char), QueryB_ID) */
QueryB->subNodeTwo = &(Value X);
/* QueryB_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryB_ID) */



From: Быков Иван Александрович
Sent: Thursday, March 6, 2025 7:32 PM
To: 'pgsql-hack...@postgresql.org' 
Subject: RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

Hello!
Last time, I forgot to attach the patches.
The problem still persists in the 17.3 release.
Solution One

The simplest way to fix the problem is to place the scalar field used in the 
query ID calculation
between similar subnodes.
A patch for this solution is attached below 
(0001-Query-ID-Calculation-Fix-Variant-A.patch).
Solution Two

Alternatively, we can change the hash sum when we encounter an empty node.
This approach may impact performance but will protect us from such errors in 
the future.
A patch for this solution is attached below 
(0001-Query-ID-Calculation-Fix-Variant-B.patch).

==
SELECT version();
 version
-
PostgreSQL 17.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 
12.2.0, 64-bit

SET compute_query_id = on;

/* LIMIT / OFFSET */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class LIMIT 1;

 QUERY PLAN

Limit  (cost=0.00..0.04 rows=1 width=4)
   Output: oid
   ->  Seq S

RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-15 Thread Bykov Ivan
Hello!

> Since then, I see that Ivan
> has already submitted a patch that accounts for NULL nodes and adds a
> byte to the jumble buffer to account for NULLs. This seems quite clean
> and simple. However, Sami seems to have concerns about the overhead of
> doing this. Is that warranted at all? Potentially, there could be a
> decent number of NULL fields. It'll probably be much cheaper than the
> offsetof idea I came up with.

To address the concern about consuming too many bytes in the jumble buffer
for NULL marks, I have added a new patch version for Variant B
(v2-0001-Query-ID-Calculation-Fix-Variant-B.patch).

This patch adds to JumbleState a count of the NULL nodes visited before a
non-NULL one appears. The least significant byte of this count is appended
to the jumble buffer whenever the count is nonzero (indicating that we have
visited NULL nodes previously), just before the next chunk of data is
appended to the buffer. I intentionally did not add a final append of the
NULL count in the JumbleQuery processing, as trailing NULL nodes of the
query do not improve the reliability of query ID collision resolution.

I believe that my first version of the Variant B patch is simple and easy to
understand, and I would prefer to use it, so I have attached an updated
version along with tests
(v3-0001-Query-ID-Calculation-Fix-Variant-B.patch).

As you can see, v2-0001-Query-ID-Calculation-Fix-Variant-B.patch is more
complex and invasive than v3-0001-Query-ID-Calculation-Fix-Variant-B.patch.
I don't think that saving a few bytes of a 1024-byte buffer is worth that
extra complexity, even for the simple optimization in v2.

> Can you share your thoughts on Ivan's NULL jumble idea?

I would appreciate any feedback. Thank you.


-Original Message-
From: David Rowley  
Sent: Tuesday, March 11, 2025 12:49 PM
To: Michael Paquier 
Cc: Быков Иван Александрович ; Sami Imseih 
; pgsql-hack...@postgresql.org
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 19:50, Michael Paquier  wrote:
>
> On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
> > It seems to me that if this fixes the issue, that the next similar 
> > one is already lurking in the shadows waiting to jump out on us.
>
> For how many years will we have to wait for this threat hidden in the
> shadows?  :)
>
> This issue exists since the query jumbling exists in pgss, so it goes 
> really down the road.  I've spent a bit more than a few minutes on 
> that.

I didn't mean to cause offence here. I just wanted to make it clear that I 
don't think fixing these types of issues by swapping the order of the fields is 
going to be a sustainable practice, and it would be good to come up with a 
solution that eliminates the chances of this class of bug from ever appearing 
again.  Even if we were to trawl the entire Query struct and everything that 
could ever be attached to it and we deem that today it's fine and no more such 
bugs exist, the person adding some new SQL feature in the future that adds new 
data to store in the Query probably isn't going to give query jumbling much 
thought, especially now that the code generation is automatic.

> > Couldn't we adjust all this code so that we pass a seed to
> > AppendJumble() that's the offsetof(Struct, field) that we're 
> > jumbling and incorporate that seed into the hash?  I don't think we 
> > could possibly change the offset in a minor version, so I don't 
> > think there's a risk that could cause jumble value instability. 
> > Possibly different CPU architectures would come up with different 
> > offsets through having different struct alignment requirements, but 
> > we don't make any guarantees about the jumble values matching across 
> > different CPU architectures anyway.
>
> Yeah, something like that has potential to get rid of such problems 
> forever, particularly thanks to the fact that we automate the 
> generation of this code now so it is mostly cost-free.  This has a 
> cost for the custom jumbling functions where one needs some 
> imagination, but with models being around while we keep the number of 
> custom functions low, that should be neither complicated nor costly, 
> at least in my experience.

I hadn't really processed this thread fully when I responded yesterday and my 
response was mostly aimed at the latest patch on the thread at the time, which 
I had looked at quickly and wanted to put a stop to changing field orders as a 
fix for this.  Since then, I see that Ivan has already submitted a patch that 
accounts for NULL nodes and adds a byte to the jumble buffer to account for 
NULLs. This seems quite clean and simple. However, Sami seems to have concerns 
about the overhead of doing this. Is that warranted at all? Potentially, there 
could be a decent number of NULL fields. It'll probably be much cheaper than 
the offsetof idea I came up with.

> When I was thinking about the addition 

RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-25 Thread Bykov Ivan
Hello, David!

As I can see, your patch has the same idea as my
v2-0001-Query-ID-Calculation-Fix-Variant-B.patch from [1].
I think it would be better to extract the jumble buffer update with the hash
calculation into a function (CompressJumble in my patch). This would result
in fewer changes to the source code.
We could also simply dump the NULL count to the buffer when we encounter a
non-NULL field (which would be processed in AppendJumble, indeed).

What do you think about my patch
(v2-0001-Query-ID-Calculation-Fix-Variant-B.patch)?

[1] 
https://www.postgresql.org/message-id/5ac172e0b77a4baba50671cd1a15285f%40localhost.localdomain



RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-17 Thread Bykov Ivan
Hello, Michael!

> So, here is attached a counter-proposal, where we can simply added a
> counter tracking a node count in _jumbleNode() to add more entropy to
> the mix, incrementing it as well for NULL nodes.

It definitely looks like a more reliable solution than my variant, which only
counts NULL nodes.

However, we already know about the overhead of adding `\0` bytes for
every NULL field.

> So that adds about 9.1% overhead to jumbling, on average.

See:
https://www.postgresql.org/message-id/flat/5ac172e0b77a4baba50671cd1a15285f%40localhost.localdomain#6c43f354f5f42d2a27e6824faa660a86

Is it really worth spending extra execution time to increase entropy
when we have non-NULL nodes?

Maybe we should choose to add node_count to the hash either only when we
visit NULL nodes or only when we visit non-NULL ones.
We could also add entropy when the node->type value changes, for the
non-NULL variants.

Your Variant


< node_count = 1 > < node 1 >
< node_count = 2 > /* node 2 = NULL */
< node_count = 3 > < node 3 >

Alternative 1 (mark only NULL Nodes)


/* node_count = 1 */ < node 1 >
< node_count = 2 > /* node 2 = NULL */
/* node_count = 3 */ < node 3 >

Alternative 2 (mark only non-NULL Nodes)

This could address concerns about problems related to visiting nodes with the
same content placed in different query tree branches.

< node_count = 1 > < node 1 >
/* node_count = 2 */ /* node 2 = NULL */
< node_count = 3 > < node 3 >




RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-26 Thread Bykov Ivan
Hello, David!

> I see you opted to only jumble the low-order byte of the pending null
> counter. I used the full 4 bytes as I don't think the extra 3 bytes is
> a problem. With v7 the memcpy to copy the pending_nulls into the
> buffer is inlined into a single instruction. It's semi-conceivable
> that you could have more than 255 NULLs given that a List can contain
> Nodes. I don't know if we ever have any NULL items in Lists at the
> moment, but I don't think that that's disallowed. I'd feel much safer
> taking the full 4 bytes of the counter to help prevent future
> surprises.

Yes, _jumbleList could theoretically handle cases with over 256 Node
elements in the same list, but most (or all) of them are non-NULL (as far
as I know, only PlannedStmt->subplans accepts NULL elements).
Most of the foreach loops over query lists don't have any NULL-safety
checks; we just assume that the lists contain no NULL pointers.
I still believe storing just one byte is sufficient.

> I'm happy to proceed with the v7 version. I'm also happy to credit you
> as the primary author of it, given that you came up with the v2b
> patch.  It might be best to extract the performance stuff that's in v7
> and apply that separately from the bug fix. If you're still interested
> and have time in the next 7 hours to do it, would you be able to split
> the v7 as described, making v8-0001 as the bug fix and v8-0002 as the
> performance stuff? If so, could you also add the total_jumble_len to
> v8-0001 like Michael's last patch had, but adjust so it's only
> maintained for Assert builds. Michael is quite keen on having more
> assurance that custom jumble functions don't decide to forego any
> jumbling.

As I can see, you have already split the patches. I apologize for the delay;
I don't have much time to work on this issue. Thank you for your proposal to
credit me as the patch author.