On 15.08.2024 13:21, oleksii.kuroc...@gmail.com wrote:
> On Thu, 2024-08-15 at 10:09 +0200, Jan Beulich wrote:
>> On 14.08.2024 18:50, oleksii.kuroc...@gmail.com wrote:
>>> On Tue, 2024-08-13 at 12:31 +0200, Jan Beulich wrote:
>>>> On 09.08.2024 18:19, Oleksii Kurochko wrote:
>>>>> RISC-V detects superpages using `pte.x` and `pte.r`, as there
>>>>> is no specific bit in the PTE for this purpose. From the RISC-V
>>>>> spec:
>>>>> ```
>>>>>   ...
>>>>>   4. Otherwise, the PTE is valid. If pte.r = 1 or pte.x = 1, go
>>>>> to
>>>>> step 5.
>>>>>      Otherwise, this PTE is a pointer to the next level of the
>>>>> page
>>>>> table.
>>>>>      ... .
>>>>>   5. A leaf PTE has been found.
>>>>>      ...
>>>>>   ...
>>>>> ```
>>>>>
>>>>> The code doesn't support superpage shattering, so 4KB pages are
>>>>> used by default.
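[For reference, the walk rule quoted above reduces to a simple predicate: a valid PTE is a leaf iff R or X is set, otherwise it points to the next level. A minimal sketch — bit positions are per the RISC-V privileged spec, but the helper names here are illustrative, not Xen's actual ones:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PTE permission/validity bits, per the RISC-V privileged spec. */
#define PTE_V  (UINT64_C(1) << 0)
#define PTE_R  (UINT64_C(1) << 1)
#define PTE_W  (UINT64_C(1) << 2)
#define PTE_X  (UINT64_C(1) << 3)

/* Step 4/5 of the quoted walk: valid and (R or X) => leaf mapping. */
bool pte_is_leaf(uint64_t pte)
{
    return (pte & PTE_V) && (pte & (PTE_R | PTE_X));
}

/* Valid but neither R nor X => pointer to the next-level page table. */
bool pte_is_table(uint64_t pte)
{
    return (pte & PTE_V) && !(pte & (PTE_R | PTE_X));
}
```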
>>>>
>>>> ... you have this. Yet still callers expecting re-mapping in the
>>>> (large)
>>>> range they map can request small-page mappings right away.
>>> I am not sure I fully understand what you mean by "expecting
>>> re-mapping".
>>
>> Right now you have callers pass PTE_BLOCK when they know that no
>> small
>> page re-mappings are going to occur for an area. What I'm suggesting
>> is
>> that you invert this logic: Have callers pass PTE_SMALL when there is
>> a possibility that re-mapping requests may be issued later. Then,
>> later, by simply grep-ing for PTE_SMALL you'll be able to easily find
>> all candidates that possibly can be relaxed when super-page
>> shattering
>> starts being supported. That's going to be easier than finding all
>> instances where PTE_BLOCK is _not_ used.
> So, if I understand correctly, nothing changes in the pt_update()
> algorithm itself; PTE_SMALL is merely introduced in place of PTE_BLOCK.
> And if I know that some memory is better mapped with PTE_SMALL, to avoid
> shattering on a later unmap (for example), I just map that memory with
> PTE_SMALL and that is it?

That is it.
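[The inversion Jan suggests could look roughly like the sketch below: the caller passes PTE_SMALL when sub-range re-mapping may follow, and the order-selection helper otherwise picks the largest superpage that fits. PTE_SMALL's bit position, MAX_ORDER and mapping_order() are illustrative assumptions, not Xen's actual code:]

```c
#include <assert.h>

/* Hypothetical flag: this request must stay at 4K granularity because
 * the caller anticipates later re-mapping (i.e. shattering). */
#define PTE_SMALL       (1u << 8)

#define PAGETABLE_ORDER 9                      /* 512 entries per level */
#define MAX_ORDER       (2 * PAGETABLE_ORDER)  /* allow up to 1GB pages */

/* Pick the order of the next mapping chunk: the largest superpage that
 * both frame numbers are aligned to and that fits in the nr remaining
 * 4K frames; 0 (i.e. 4K) when PTE_SMALL is requested. */
unsigned int mapping_order(unsigned long vfn, unsigned long mfn,
                           unsigned long nr, unsigned int flags)
{
    unsigned int order = 0;

    if ( flags & PTE_SMALL )
        return 0;

    while ( order + PAGETABLE_ORDER <= MAX_ORDER &&
            !((vfn | mfn) & ((1UL << (order + PAGETABLE_ORDER)) - 1)) &&
            nr >= (1UL << (order + PAGETABLE_ORDER)) )
        order += PAGETABLE_ORDER;

    return order;
}
```

[Grep-ing for PTE_SMALL then immediately lists every caller that can be relaxed once shattering is supported.]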

>>>>> +    spin_unlock(&xen_pt_lock);
>>>>
>>>> Does this really need to come after fence and flush?
>>> I think yes, as the page table should be updated by only one CPU at a
>>> time. Before allowing another CPU to update the page table, we have
>>> to finish the work on the current CPU.
>>
>> Can you then explain to me, perhaps by way of an example, what will
>> go
>> wrong if the unlock is ahead of the flush? (I'm less certain about
>> the
>> fence, and that's also less expensive.)
> pt_update() will be called for interleaved regions, for example, by
> different CPUs:
> 
> CPU1:                              CPU2:
>  ...                               spin_lock(&xen_pt_lock); /* waits */
> RISCV_FENCE(rw, rw);                ...
> 
> /* Once the unlock below is
>    executed, the following can
>    happen: ------------------->    CPU2 starts to update page table
> */                                 entries which were partially created
> spin_unlock(&xen_pt_lock);         by CPU1, but CPU2 doesn't know about
>  ...                               them yet, because flush_tlb()
> flush_tlb_range_va();              ( sfence.vma ) wasn't done

Not exactly: CPU2 knows about them as far as the memory used / modified
goes, and that's all that matters for further page table modifications.
CPU2 only doesn't know about the new page table entries yet when it comes
to using them for a translation (by the hardware page walker). Yet this
aspect is irrelevant here, if I'm not mistaken.
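[For illustration, the tail of pt_update() as discussed above, with the Xen primitives replaced by stubs that only log their order — these stubs are stand-ins, not the real implementations (the real RISCV_FENCE emits a fence instruction and flush_tlb_range_va() issues sfence.vma). Per Jan's argument, the unlock could equally precede the flush, since lock release/acquire already orders the memory writes that the next updater depends on; only the hardware walker cares about the flush:]

```c
#include <assert.h>
#include <string.h>

static char order_log[64];

/* Stand-ins for the Xen primitives named in the thread; they append to
 * a log so the ordering under discussion is visible. */
#define RISCV_FENCE(pred, succ)  strcat(order_log, "fence;")
#define flush_tlb_range_va()     strcat(order_log, "flush;")
#define spin_unlock(l)           strcat(order_log, "unlock;")

static int xen_pt_lock;

/* The ordering currently used at the end of pt_update(). */
void pt_update_tail(void)
{
    RISCV_FENCE(rw, rw);        /* order the preceding PTE writes */
    flush_tlb_range_va();       /* make the hardware walker see them */
    spin_unlock(&xen_pt_lock);  /* only then admit the next updater */
}
```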

Jan
