Hi,

On Tue, Aug 05, 2025 at 11:24:04PM -0500, Sami Imseih wrote:
> > I'll fix this in the next rev.
> 
> v5 fixes the above 2 issues found above.

Thanks!

> For the issue that was throwing ".... segment that has been freed",
> indeed we should be freeing LWLockTrancheNames.shmem->list_ptr,

Yeah.

> list_ptr is a dsa_pointer that stores an array of dsa_pointers, which
> then made me realize that I was not freeing the actual dsa_pointers
> holding the tranche names. I fixed that as well.
> 
> I also made some improvements to the code that copies the old
> list to the new list and fixed the lookup in
> GetLWTrancheName.
> 

Still doing a bit of testing without looking closely to the code.

Issue 1 --

If I:

LWLockInitialize(&lck, LWLockNewTrancheId("BDT_Lck"));
LWLockAcquire(&lck, LW_EXCLUSIVE);
LWLockAcquire(&lck, LW_EXCLUSIVE);

So that this process is locked on the LWLock BDT_Lck and if I query 
pg_stat_activity
in another backend, then I get:

postgres=# select wait_event from pg_stat_activity where wait_event_type = 
'LWLock';
 wait_event
------------
 BDT_Lck
(1 row)

which is what we expect.

But if I look at LWLockTrancheNames.local (for the process that queried 
pg_stat_activity):

(gdb) p LWLockTrancheNames.local[0]
$1 = 0x0
(gdb) p LWLockTrancheNames.local[1]
$2 = 0x7b759a81b038 "BDT_Lck"

It looks like there is an indexing issue, as we should start at index 0.

Issue 2 / question --

If I call LWLockNewTrancheId("BDT_play2") multiple times to ensure that 

trancheId >= LWLockTrancheNames.allocated and then ensure that a backend is
locked doing:

"
LWLockInitialize(&lck, LWLockNewTrancheId("BDT_Lck"));
LWLockAcquire(&lck, LW_EXCLUSIVE);
LWLockAcquire(&lck, LW_EXCLUSIVE);
"

Then if, in another backend, I call GetLWTrancheName by querying 
pg_stat_activity
then I see "BDT_Lck" being reported as wait event name (which is good).

The thing that worries me a bit is that the local cache is populated only for
"BDT_Lck", but not for all the other "BDT_play2".

(gdb) p LWLockTrancheNames.local[2]
$9 = 0x0
(gdb) p LWLockTrancheNames.local[3]
$10 = 0x0
(gdb) p LWLockTrancheNames.local[12]
$11 = 0x0
(gdb) p LWLockTrancheNames.local[13]
$12 = 0x780593a1b138 "BDT_Lck"

When we need to populate the local cache, would it be better to populate
it with the whole dsa content instead of just the current missing ID as done
in GetLWTrancheName():

+               if (tranche_name != NULL)
+                       SetLocalTrancheName(trancheId, tranche_name);?

I think that would make more sense, as that would avoid to take the
LWLockTrancheNames.shmem->lock in GetLWTrancheNameByID() multiple times
should the wait event change, thoughts?

Issue 3 --

If I call LWLockNewTrancheId("BDT_play2") only one time to ensure that
trancheId < LWLockTrancheNames.allocated and then ensure that a backend is
locked doing:

"
LWLockInitialize(&lck, LWLockNewTrancheId("BDT_Lck"));
LWLockAcquire(&lck, LW_EXCLUSIVE);
LWLockAcquire(&lck, LW_EXCLUSIVE);
"

Then if, in another backend, I call GetLWTrancheName by querying 
pg_stat_activity
then I see "extension" being reported as wait event name (which is not good).

The issue is that in GetLWTrancheName(), the patch does:

+       if (trancheId >= LWLockTrancheNames.allocated)
+       {
+               tranche_name = GetLWTrancheNameByID(trancheId);
+               if (tranche_name != NULL)
+                       SetLocalTrancheName(trancheId, tranche_name);
+       }
+       else
+               tranche_name = LWLockTrancheNames.local[trancheId];

Then does not find the lock in the local cache and then returns "extension".

I think that in that case we should fall back to querying the DSA (and
populate the local cache), thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to