On 8/29/22 16:02, Amit Kapila wrote:
> On Mon, Aug 29, 2022 at 7:17 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>>
>> David Rowley <dgrowle...@gmail.com> writes:
>>> I suspect, going by all 3 failing animals being 32-bit which have a
>>> MAXIMUM_ALIGNOF 8 and SIZEOF_SIZE_T of 4 that this is due to the lack
>>> of padding in the MemoryChunk struct.
>>> AllocChunkData and GenerationChunk had padding to account for
>>> sizeof(Size) being 4 and sizeof(void *) being 8, I didn't add that to
>>> MemoryChunk, so I'll do that now.
>>
>> Doesn't seem to have fixed it. IMO, the fact that we can get through
>> core regression tests and pg_upgrade is a strong indicator that
>> there's not anything fundamentally wrong with memory context
>> management. I'm inclined to think the problem is in d2169c9985,
>> instead ... though I can't see anything wrong with it.
>>
>
> Yeah, I also thought that way but couldn't find a reason. I think if
> David is able to reproduce it on one of his systems then he can try
> locally reverting both the commits one by one.
>
I can reproduce it on my system (rpi4 running 32-bit raspbian). I can't
grant access very easily at the moment, so I'll continue investigating
and do more debugging myself; perhaps I can grant access to the system later.
So far all I know is that it doesn't happen on d2169c9985 (so ~5 commits
back), and then it starts failing on c6e0fe1f2a. The extra padding added
by df0f4feef8 makes no difference, because the struct looked like this:
struct MemoryChunk {
    Size requested_size; /* 0 4 */
    /* XXX 4 bytes hole, try to pack */
    uint64 hdrmask; /* 8 8 */
    /* size: 16, cachelines: 1, members: 2 */
    /* sum members: 12, holes: 1, sum holes: 4 */
    /* last cacheline: 16 bytes */
};
and the padding makes it look like this:
struct MemoryChunk {
    Size requested_size; /* 0 4 */
    char padding[4]; /* 4 4 */
    uint64 hdrmask; /* 8 8 */
    /* size: 16, cachelines: 1, members: 3 */
    /* last cacheline: 16 bytes */
};
so it makes no difference.
I did look at the pointers in GetMemoryChunkMethodID, and it looks like
this (p1 is the result of MAXALIGN(pointer)):
(gdb) p pointer
$1 = (void *) 0x1ca1d2c
(gdb) p p1
$2 = 0x1ca1d30 ""
(gdb) p p1 - pointer
$3 = 4
(gdb) p (long int) pointer
$4 = 30022956
(gdb) p (long int) p1
$5 = 30022960
(gdb) p 30022956 % 8
$6 = 4
So the input pointer is not actually aligned to MAXIMUM_ALIGNOF (8B),
but only to 4B. That seems a bit strange.
>> Another possibility is that there's a pre-existing bug in the
>> logical decoding stuff that your changes accidentally exposed.
>>
>
> Yeah, this is another possibility.
No idea.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company