On 31/03/2022 10:25, Zhang, Chen wrote: > >> -----Original Message----- >> From: lizhij...@fujitsu.com <lizhij...@fujitsu.com> >> Sent: Thursday, March 31, 2022 9:15 AM >> To: Zhang, Chen <chen.zh...@intel.com>; Jason Wang >> <jasow...@redhat.com> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like...@linux.intel.com> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the >> conn_list >> >> >> connection_track_table >> -----+---------- >> key1 | conn |-----------------------------------------------------------+ >> -----+---------- | >> key2 | conn |----------------------------------+ | >> -----+---------- | | >> key3 | conn |---------+ | | >> -----+---------- | | | >> | | >> | >> | | >> | >> + CompareState ++ | | >> | >> | | V V >> V >> +---------------+ +---------------+ +---------------+ >> |conn_list +--->conn +--------->conn | >> connx >> +---------------+ +---------------+ +---------------+ >> | | | | | | >> +---------------+ +---v----+ +---v----+ +---v----+ +---v----+ >> |primary | |secondary |primary | |secondary >> |packet | |packet + |packet | |packet + >> +--------+ +--------+ +--------+ +--------+ >> | | | | >> +---v----+ +---v----+ +---v----+ +---v----+ >> |primary | |secondary |primary | |secondary >> |packet | |packet + |packet | |packet + >> +--------+ +--------+ +--------+ +--------+ >> | | | | >> +---v----+ +---v----+ +---v----+ +---v----+ >> |primary | |secondary |primary | |secondary >> |packet | |packet + |packet | |packet + >> +--------+ +--------+ +--------+ +--------+ >> >> I recalled that we should above relationships between >> connection_track_table conn_list and conn. >> That means both connection_track_table and conn_list reference to the >> same conn instance. >> >> So before this patch, connection_get() is possible to use-after-free/double >> free conn. where 1st was in >> connection_hashtable_reset() and 2nd was >> 221 while (!g_queue_is_empty(conn_list)) { >> 222 connection_destroy(g_queue_pop_head(conn_list)); >> 223 } >> >> I also doubt that your current abort was just due to above use-after- >> free/double free. >> If so, looks it's enough we just update to g_queue_clear(conn_list) in the >> 2nd >> place. > Make sense, but It also means the original patch works here, skip free conn > in connection_hashtable_reset() and do it in: > 221 while (!g_queue_is_empty(conn_list)) { > 222 connection_destroy(g_queue_pop_head(conn_list)); > 223 }. > It also avoid use-after-free/double free conn. Although you will not use-after-free here, you have to consider other situations carefully that g_hash_table_remove_all() g_hash_table_destroy() were called where the conn_list should also be freed with you approach.
> Maybe we can keep the original version to fix it? And your commit log should be more clear. Thanks Zhijian > > Thanks > Chen > >> Thanks >> Zhijian >> >> >> On 28/03/2022 17:13, Zhang, Chen wrote: >>>> -----Original Message----- >>>> From: lizhij...@fujitsu.com <lizhij...@fujitsu.com> >>>> Sent: Monday, March 21, 2022 11:06 AM >>>> To: Zhang, Chen <chen.zh...@intel.com>; Jason Wang >>>> <jasow...@redhat.com>; lizhij...@fujitsu.com >>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu >>>> <like...@linux.intel.com> >>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear >>>> the conn_list >>>> >>>> >>>> >>>> On 09/03/2022 16:38, Zhang Chen wrote: >>>>> We notice the QEMU may crash when the guest has too many incoming >>>>> network connections with the following log: >>>>> >>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection >>>>> hashtable full, clear it >>>>> free(): invalid pointer >>>>> [1] 15195 abort (core dumped) qemu-system-x86_64 .... >>>>> >>>>> This is because we create the s->connection_track_table with >>>>> g_hash_table_new_full() which is defined as: >>>>> >>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func, >>>>> GEqualFunc key_equal_func, >>>>> GDestroyNotify key_destroy_func, >>>>> GDestroyNotify value_destroy_func); >>>>> >>>>> The fourth parameter connection_destroy() will be called to free the >>>>> memory allocated for all 'Connection' values in the hashtable when >>>>> we call g_hash_table_remove_all() in the connection_hashtable_reset(). >>>>> >>>>> It's unnecessary because we clear the conn_list explicitly later, >>>>> and it's buggy when other agents try to call connection_get() with >>>>> the same connection_track_table. >>>>> >>>>> Signed-off-by: Like Xu <like...@linux.intel.com> >>>>> Signed-off-by: Zhang Chen <chen.zh...@intel.com> >>>>> --- >>>>> net/colo-compare.c | 2 +- >>>>> net/filter-rewriter.c | 2 +- >>>>> 2 files changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index >>>>> 62554b5b3c..ab054cfd21 100644 >>>>> --- a/net/colo-compare.c >>>>> +++ b/net/colo-compare.c >>>>> @@ -1324,7 +1324,7 @@ static void >>>> colo_compare_complete(UserCreatable *uc, Error **errp) >>>>> s->connection_track_table = >>>> g_hash_table_new_full(connection_key_hash, >>>>> >>>>> connection_key_equal, >>>>> g_free, >>>>> - >>>>> connection_destroy); >>>>> + NULL); >>>> 202 /* if not found, create a new connection and add to hash table */ >>>> 203 Connection *connection_get(GHashTable *connection_track_table, >>>> 204 ConnectionKey *key, >>>> 205 GQueue *conn_list) >>>> 206 { >>>> 207 Connection *conn = >>>> g_hash_table_lookup(connection_track_table, >>>> key); >>>> 208 >>>> 209 if (conn == NULL) { >>>> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key)); >>>> 211 >>>> 212 conn = connection_new(key); >>>> 213 >>>> 214 if (g_hash_table_size(connection_track_table) > >>>> HASHTABLE_MAX_SIZE) { >>>> 215 trace_colo_proxy_main("colo proxy connection hashtable >>>> full," >>>> 216 " clear it"); >>>> 217 connection_hashtable_reset(connection_track_table); >>>> >>>> 197 void connection_hashtable_reset(GHashTable >>>> *connection_track_table) >>>> 198 { >>>> 199 g_hash_table_remove_all(connection_track_table); >>>> 200 } >>>> >>>> IIUC, above subroutine will do some cleanup explicitly. And before >>>> your patch, connection_hashtable_reset() will release all keys and >>>> their values in this hashtable. But now, you remove all keys and just >>>> one value(conn_list) instead. Does it means other values will be leaked ? >>> Thanks Zhijian. Re-think about it. Yes, I think you are right. >>> It looks need a lock to avoid into connection_get() when triggered the clear >> hashtable operation. >>> What do you think? >>> >>> Thanks >>> Chen >>> >>> >>>> 218 /* >>>> 219 * clear the conn_list >>>> 220 */ >>>> 221 while (!g_queue_is_empty(conn_list)) { >>>> 222 connection_destroy(g_queue_pop_head(conn_list)); >>>> 223 } >>>> 224 } >>>> 225 >>>> 226 g_hash_table_insert(connection_track_table, new_key, >>>> conn); >>>> 227 } >>>> 228 >>>> 229 return conn; >>>> 230 } >>>> >>>> >>>> Thanks >>>> Zhijian >>>> >>>>> colo_compare_iothread(s); >>>>> >>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index >>>>> bf05023dc3..c18c4c2019 100644 >>>>> --- a/net/filter-rewriter.c >>>>> +++ b/net/filter-rewriter.c >>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState >>>>> *nf, >>>> Error **errp) >>>>> s->connection_track_table = >>>> g_hash_table_new_full(connection_key_hash, >>>>> >>>>> connection_key_equal, >>>>> g_free, >>>>> - >>>>> connection_destroy); >>>>> + NULL); >>>>> s->incoming_queue = >>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); >>>>> } >>>>>