connection_track_table -----+---------- key1 | conn |-----------------------------------------------------------+ -----+---------- | key2 | conn |----------------------------------+ | -----+---------- | | key3 | conn |---------+ | | -----+---------- | | | | | | | | | + CompareState ++ | | | | | V V V +---------------+ +---------------+ +---------------+ |conn_list +--->conn +--------->conn | connx +---------------+ +---------------+ +---------------+ | | | | | | +---------------+ +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ | | | | +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ | | | | +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ I recalled that we should above relationships between connection_track_table conn_list and conn. That means both connection_track_table and conn_list reference to the same conn instance.
So before this patch, connection_get() is possible to use-after-free/double free conn. where 1st was in connection_hashtable_reset() and 2nd was 221 while (!g_queue_is_empty(conn_list)) { 222 connection_destroy(g_queue_pop_head(conn_list)); 223 } I also doubt that your current abort was just due to above use-after-free/double free. If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd place. Thanks Zhijian On 28/03/2022 17:13, Zhang, Chen wrote: > >> -----Original Message----- >> From: lizhij...@fujitsu.com <lizhij...@fujitsu.com> >> Sent: Monday, March 21, 2022 11:06 AM >> To: Zhang, Chen <chen.zh...@intel.com>; Jason Wang >> <jasow...@redhat.com>; lizhij...@fujitsu.com >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like...@linux.intel.com> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the >> conn_list >> >> >> >> On 09/03/2022 16:38, Zhang Chen wrote: >>> We notice the QEMU may crash when the guest has too many incoming >>> network connections with the following log: >>> >>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection >>> hashtable full, clear it >>> free(): invalid pointer >>> [1] 15195 abort (core dumped) qemu-system-x86_64 .... >>> >>> This is because we create the s->connection_track_table with >>> g_hash_table_new_full() which is defined as: >>> >>> GHashTable * g_hash_table_new_full (GHashFunc hash_func, >>> GEqualFunc key_equal_func, >>> GDestroyNotify key_destroy_func, >>> GDestroyNotify value_destroy_func); >>> >>> The fourth parameter connection_destroy() will be called to free the >>> memory allocated for all 'Connection' values in the hashtable when we >>> call g_hash_table_remove_all() in the connection_hashtable_reset(). >>> >>> It's unnecessary because we clear the conn_list explicitly later, and >>> it's buggy when other agents try to call connection_get() with the >>> same connection_track_table. >>> >>> Signed-off-by: Like Xu <like...@linux.intel.com> >>> Signed-off-by: Zhang Chen <chen.zh...@intel.com> >>> --- >>> net/colo-compare.c | 2 +- >>> net/filter-rewriter.c | 2 +- >>> 2 files changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/colo-compare.c b/net/colo-compare.c index >>> 62554b5b3c..ab054cfd21 100644 >>> --- a/net/colo-compare.c >>> +++ b/net/colo-compare.c >>> @@ -1324,7 +1324,7 @@ static void >> colo_compare_complete(UserCreatable *uc, Error **errp) >>> s->connection_track_table = >> g_hash_table_new_full(connection_key_hash, >>> >>> connection_key_equal, >>> g_free, >>> - connection_destroy); >>> + NULL); >> >> 202 /* if not found, create a new connection and add to hash table */ >> 203 Connection *connection_get(GHashTable *connection_track_table, >> 204 ConnectionKey *key, >> 205 GQueue *conn_list) >> 206 { >> 207 Connection *conn = g_hash_table_lookup(connection_track_table, >> key); >> 208 >> 209 if (conn == NULL) { >> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); >> 211 >> 212 conn = connection_new(key); >> 213 >> 214 if (g_hash_table_size(connection_track_table) > >> HASHTABLE_MAX_SIZE) { >> 215 trace_colo_proxy_main("colo proxy connection hashtable full," >> 216 " clear it"); >> 217 connection_hashtable_reset(connection_track_table); >> >> 197 void connection_hashtable_reset(GHashTable *connection_track_table) >> 198 { >> 199 g_hash_table_remove_all(connection_track_table); >> 200 } >> >> IIUC, above subroutine will do some cleanup explicitly. And before your >> patch, connection_hashtable_reset() will release all keys and their values in >> this hashtable. But now, you remove all keys and just one value(conn_list) >> instead. Does it means other values will be leaked ? > Thanks Zhijian. Re-think about it. Yes, I think you are right. > It looks need a lock to avoid into connection_get() when triggered the clear > hashtable operation. > What do you think? > > Thanks > Chen > > >> >> 218 /* >> 219 * clear the conn_list >> 220 */ >> 221 while (!g_queue_is_empty(conn_list)) { >> 222 connection_destroy(g_queue_pop_head(conn_list)); >> 223 } >> 224 } >> 225 >> 226 g_hash_table_insert(connection_track_table, new_key, conn); >> 227 } >> 228 >> 229 return conn; >> 230 } >> >> >> Thanks >> Zhijian >> >>> colo_compare_iothread(s); >>> >>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index >>> bf05023dc3..c18c4c2019 100644 >>> --- a/net/filter-rewriter.c >>> +++ b/net/filter-rewriter.c >>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, >> Error **errp) >>> s->connection_track_table = >> g_hash_table_new_full(connection_key_hash, >>> >>> connection_key_equal, >>> g_free, >>> - connection_destroy); >>> + NULL); >>> s->incoming_queue = >> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); >>> } >>>