On 06/02/2020 12:57, Jan Beulich wrote: > On 06.02.2020 12:44, Durrant, Paul wrote: >>> -----Original Message----- >>> From: Julien Grall <jul...@xen.org> >>> Sent: 06 February 2020 11:17 >>> To: Durrant, Paul <pdurr...@amazon.co.uk>; xen-devel@lists.xenproject.org >>> Cc: Grall, Julien <jgr...@amazon.com> >>> Subject: Re: [PATCH v2] xen/mm: Avoid assuming the page is inuse in >>> assign_pages() >>> >>> Hi Paul, >>> >>> On 06/02/2020 10:52, Durrant, Paul wrote: >>>>> -----Original Message----- >>>>> From: Julien Grall <jul...@xen.org> >>>>> Sent: 06 February 2020 10:39 >>>>> To: xen-devel@lists.xenproject.org >>>>> Cc: jul...@xen.org; Durrant, Paul <pdurr...@amazon.co.uk>; Grall, >>> Julien >>>>> <jgr...@amazon.com> >>>>> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in >>>>> assign_pages() >>>>> >>>>> From: Julien Grall <jgr...@amazon.com> >>>>> >>>>> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse) >>>>> and the state value to be 0. >>>>> >>>>> However, the code may race with the page offlining code (see >>>>> offline_page()). Depending on the ordering, the page may be in >>> offlining >>>>> state (PGC_state_offlining) before it is assigned to a domain. >>>>> >>>>> On debug build, this may result to hit the assert or just clobber the >>>>> state. On non-debug build, the state will get clobbered. >>>>> >>>>> Incidentally the flag PGC_broken will get clobbered as well. >>>>> >>>>> Grab the heap_lock to prevent a race with offline_page() and keep the >>>>> state and broken flag around. >>>>> >>>>> Signed-off-by: Julien Grall <jgr...@amazon.com> >>>> This seems like a reasonable change. I guess having assign_pages() take >>> the global lock is no more problem than its existing call to >>> domain_adjust_tot_pages() which also takes the same lock. >>> >>> That's my understanding. Summarizing our discussion IRL for the other, >>> it is not clear whether the lock is enough here. >>> >>> From my understanding the sequence >>> >>> pg[i].count_info &= ...; >>> pg[i].count_info |= ...; >>> >>> could result to multiple read/write from the compiler. We could use a >>> single assignment, but I still don't think this prevent the compiler to >>> be use multiple read/write. >>> >>> The concern would be a race with get_page_owner_and_reference(). If 1 is >>> set before the rest of the bits, then you may be able to get the page. >>> >>> So I might want to use write_atomic() below. Any opinion? >>> >> TBH I wonder if we ought to say that any update to count_info ought to >> be done by a write_atomic (where it's not already done by cmpxchg). > I agree.
It won't fix anything, and gives the compiler a harder time. write_atomic() is a mov instruction. It prohibits the use of and/or $imm, mem encodings. If multiple reads/writes are a concern then the only valid code generation is for *every* modification of count info to use locked operations. Swapping regular C for a single mov instruction here is not going to fix a bug, if such a bug exists. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel