Hi, On 2021-03-19 14:27:38 -0700, Andres Freund wrote: > Yep, it's not even particularly hard to hit: > > S0: CREATE TABLE a_table(); > S0: INSERT INTO a_table(); > S0: disconnect > S1: set a breakpoint to just after the dshash_release_lock(), with an > if objid == a_table_oid > S1: SELECT pg_stat_get_live_tuples('a_table'::regclass); > (will break at above breakpoint, without having incremented the > refcount yet) > S2: DROP TABLE a_table; > S2: VACUUM pg_class; > > At that point S2's call to pgstat_vacuum_stat() will find the shared > stats entry for a_table, delete the entry from the shared hash table, > see that the stats data has a zero refcount, and free it. Once S1 wakes > up it'll use already freed (and potentially since reused) memory.
I fixed this by initializing / increment the refcount while holding dshash partition lock. To avoid the potential refcount leak in case the lookup cache insertion fails due to OOM I changed things so that the lookup cache is inserted, not just looked up, earlier. That also avoids needing two hashtable ops in the cache miss case. The price of an empty hashtable entry in the !create case doesn't seem high. Related issue: delete_current_stats_entry() there's the following comment: /* * Let the referrers drop the entry if any. Refcount won't be decremented * until "dropped" is set true and StatsShmem->gc_count is incremented * later. So we can check refcount to set dropped without holding a lock. * If no one is referring this entry, free it immediately. */ I don't think this explanations is correct. gc_count might have been incremented by another backend, or cleanup_dropped_stats_entries() might run. So the whole bit about refcounts seems wrong. I don't see what prevents a double-free here. Consider what happens if S1: cleanup_dropped_stats_entries() does pg_atomic_sub_fetch_u32(&ent->shared->refcount, 1) S2: delete_current_stats_entry() pg_atomic_read_u32(&header->refcount), reading 0 S1: dsa_free(area, ent->dsapointer); S2: dsa_free(area, pdsa); World: boom I think the appropriate fix might be to not have ->dropped (or rather have it just as a crosscheck), and have every non-dropped entry have an extra refcount. When dropping the entry the refcount is dropped, and we can safely free the entry. That way normal paths don't need to check ->dropped at all. Greetings, Andres Freund