On 15.08.2024 15:34, oleksii.kuroc...@gmail.com wrote:
> On Thu, 2024-08-15 at 14:16 +0200, Jan Beulich wrote:
>> On 15.08.2024 13:21, oleksii.kuroc...@gmail.com wrote:
>>> On Thu, 2024-08-15 at 10:09 +0200, Jan Beulich wrote:
>>>> On 14.08.2024 18:50, oleksii.kuroc...@gmail.com wrote:
>>>>> On Tue, 2024-08-13 at 12:31 +0200, Jan Beulich wrote:
>>>>>> On 09.08.2024 18:19, Oleksii Kurochko wrote:
>>>>>>> RISC-V detects superpages using `pte.x` and `pte.r`, as
>>>>>>> there
>>>>>>> is no specific bit in the PTE for this purpose. From the
>>>>>>> RISC-V
>>>>>>> spec:
>>>>>>> ```
>>>>>>>   ...
>>>>>>>   4. Otherwise, the PTE is valid. If pte.r = 1 or pte.x =
>>>>>>> 1, go
>>>>>>> to
>>>>>>> step 5.
>>>>>>>      Otherwise, this PTE is a pointer to the next level of
>>>>>>> the
>>>>>>> page
>>>>>>> table.
>>>>>>>      ... .
>>>>>>>   5. A leaf PTE has been found.
>>>>>>>      ...
>>>>>>>   ...
>>>>>>> ```
>>>>>>>
>>>>>>> The code doesn’t support super page shattering so 4KB pages
>>>>>>> are
>>>>>>> used as
>>>>>>> default.
>>>>>>
>>>>>> ... you have this. Yet still callers expecting re-mapping in
>>>>>> the
>>>>>> (large)
>>>>>> range they map can request small-page mappings right away.
>>>>> I am not sure that I fully understand what do you mean by
>>>>> "expcting
>>>>> re-
>>>>> mapping".
>>>>
>>>> Right now you have callers pass PTE_BLOCK when they know that no
>>>> small
>>>> page re-mappings are going to occur for an area. What I'm
>>>> suggesting
>>>> is
>>>> that you invert this logic: Have callers pass PTE_SMALL when
>>>> there is
>>>> a possibility that re-mapping requests may be issued later. Then,
>>>> later, by simply grep-ing for PTE_SMALL you'll be able to easily
>>>> find
>>>> all candidates that possibly can be relaxed when super-page
>>>> shattering
>>>> starts being supported. That's going to be easier than finding
>>>> all
>>>> instances where PTE_BLOCK is _not_used.
>>> So if I understand correctly. Actually nothing will change in
>>> algorithm
>>> of pt_update() and only PTE_SMALL should be introduced instead of
>>> PTE_BLOCK. And if I will know that something will be better to map
>>> as
>>> PTE_SMALL to not face shattering in case of unmap (for example) I
>>> just
>>> can map this memory as PTE_SMALL and that is it?
>>
>> That is it.
>>
>>>>>>> +    spin_unlock(&xen_pt_lock);
>>>>>>
>>>>>> Does this really need to come after fence and flush?
>>>>> I think yes, as page table should be updated only by 1 CPU at
>>>>> the
>>>>> same
>>>>> time. And before give ability to other CPU to update page table
>>>>> we
>>>>> have
>>>>> to finish a work on current CPU.
>>>>
>>>> Can you then explain to me, perhaps by way of an example, what
>>>> will
>>>> go
>>>> wrong if the unlock is ahead of the flush? (I'm less certain
>>>> about
>>>> the
>>>> fence, and that's also less expensive.)
>>> pt_update() will be called for interleaved region, for example, by
>>> different CPUs:
>>>
>>>                      pt_update():
>>> CPU1:                                    CPU2:
>>>  ...                                spin_lock(&xen_pt_lock);
>>> RISCV_FENCE(rw, rw);                 ....
>>>
>>> /* After this function will be
>>>    executed the following thing
>>>    can happen ------------------>  start to update page table
>>> */                                 entries which was partially     
>>> spin_unlock(&xen_pt_lock);         created during CPU1 but
>>> CPU2       
>>> ....                               doesn't know about them
>>> yet        
>>> ....                               because flush_tlb() ( sfence.vma
>>> ) 
>>> ....                               wasn't done      
>>> ....                                                               
>>>    
>>> flush_tlb_range_va();
>>
>> Not exactly: CPU2 knows about them as far as the memory used /
>> modified
>> goes, and that's all that matters for further page table
>> modifications.
>> CPU2 only doesn't know about the new page table entries yet when it
>> comes
>> to using them for a translation (by the hardware page walker). Yet
>> this
>> aspect is irrelevant here, if I'm not mistaken.
> And it isn't an issue that CPU2 will add these new page table entries
> again during execution of CPU2's pt_update()?

Add these page table entries again? That's only going to happen due to
another bug somewhere, I suppose. And it would be as much (or as little)
of an issue if that happened right after dropping the lock.

Jan

Reply via email to