This is all extremely helpful! I'll dig in and try this approach soon.

> On Feb 28, 2019, at 11:11, Richard Henderson <richard.hender...@linaro.org> wrote:
>
>> Are you thinking that this should be modeled as independent sets of TLBs,
>> one per mode?
>
> One per segment you mean?
Yes.

> Yes, exactly. Since each segment can have
> independent segment base + limit + permissions. All of which would be taken
> into account by tlb_fill when populating the TLB.
>
>> It seems easier to have a linear address MMU mode and then for the MMU modes
>> corresponding to segment registers, perform an access and limit check,
>> adjust the address by the segment base, and then go through the linear
>> address MMU mode translation.
>
> Except you need to generate extra calls at runtime to perform this translation,
> and you are not able to cache the result of the lookup against a second access
> to the same page.

I see. That makes sense. I didn't realize the results of the calls were being cached.

>> In particular, code that uses segments spends a lot of time changing the
>> values of segment registers. E.g., in the movs example above, the ds segment
>> may be overridden but the es segment cannot be, so to use the string move
>> instructions within ds, es needs to be saved, modified, and then restored.
>
> You are correct that this would result in two TLB flushes.
>
> But if MOVS executes a non-trivial number of iterations, we still may win.
>
> The work that Emilio Cota has done in this development cycle to make the size
> of the softmmu TLBs dynamic will help here. It may well be that MOVS is used
> with small memcpy, and there are a fair few flushes. But in that case the TLB
> will be kept very small, and so the flush will not be expensive.

I wonder if it would make sense to maintain a small cache of TLBs. The majority of cases are likely to involve setting segment registers to one of a handful of segments (e.g., setting es to ds or ss). So it might be nice to avoid the flushes entirely.

> On the other hand, DS changes are rare (depending on the programming model),
> and SS changes only on context switches. Their TLBs will keep their contents,
> even while ES gets flushed.
> Work has been saved over adding explicit calls to
> a linear address helper function.

In my case, ds changes are pretty frequent: I count 75 instances of mov ds, __ and 124 instances of pop ds in the executive (ring 0) portion of this firmware. Obviously the dynamic count is more interesting, but I don't have that off-hand.

> The vast majority of x86 instructions have exactly one memory access, and it
> uses the default segment (ds/ss) or the segment override. We can set this
> default mmu index as soon as we have seen any segment override.
>
>> Returning to the movs example, the order of operations _must_ be
>> 1. lea ds:[esi]
>> 2. load 4 bytes
>> 3. lea es:[edi]
>> 4. store 4 bytes
>
> MOVS is one of the rare examples of two memory accesses within one instruction.
> Yes, we would have to special case this, and be careful to get everything
> right.

I agree that the vast majority of x86 instructions access at most one segment, but off-hand, I can think of a handful that access two:
- movs
- cmps
- push r/m32
- pop r/m32
- call m32
- call m16:m32

I'm not sure if there are others.

--
Stephen Checkoway
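
P.S. A rough sketch of the "small cache of TLBs" idea from above, only to make the question concrete. Every name here (SegDesc, SegTlb, SegTlbCache, seg_tlb_get) is hypothetical and none of it is QEMU's actual API. It assumes a TLB can be keyed by the segment's base + limit + permissions, so that reloading es with a descriptor that was seen recently reuses a populated TLB instead of flushing one. Eviction is plain LRU, which is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; sizes and names are assumptions, not QEMU's. */
#define SEG_TLB_SLOTS 4   /* "a handful of segments" */
#define TLB_ENTRIES   8   /* toy size; a real softmmu TLB is larger */

typedef struct { uint32_t base, limit; uint8_t perms; } SegDesc;
typedef struct { bool valid; uint32_t vpage, ppage; } TlbEntry;

typedef struct {
    bool     in_use;
    SegDesc  key;                  /* descriptor this TLB was filled for */
    uint64_t last_used;            /* LRU timestamp, 0 means never used */
    TlbEntry entries[TLB_ENTRIES];
} SegTlb;

typedef struct {
    SegTlb   slots[SEG_TLB_SLOTS];
    uint64_t clock;
    unsigned flushes;              /* populated TLBs that had to be dropped */
} SegTlbCache;

static bool seg_equal(const SegDesc *a, const SegDesc *b)
{
    return a->base == b->base && a->limit == b->limit && a->perms == b->perms;
}

/*
 * Would be called on a segment register load. Returns the TLB to use for
 * descriptor d. On a hit the cached entries are kept; on a miss the least
 * recently used slot is cleared and handed out.
 */
static SegTlb *seg_tlb_get(SegTlbCache *c, SegDesc d)
{
    assert(c != NULL);

    SegTlb *victim = &c->slots[0];
    c->clock++;

    for (int i = 0; i < SEG_TLB_SLOTS; i++) {
        SegTlb *s = &c->slots[i];
        if (s->in_use && seg_equal(&s->key, &d)) {
            s->last_used = c->clock;
            return s;              /* hit: no flush needed */
        }
        if (s->last_used < victim->last_used) {
            victim = s;            /* unused slots have last_used == 0 */
        }
    }

    if (victim->in_use) {
        c->flushes++;              /* evicting a populated TLB */
    }
    memset(victim->entries, 0, sizeof(victim->entries));
    victim->in_use = true;
    victim->key = d;
    victim->last_used = c->clock;
    return victim;
}
```

With a zero-initialized SegTlbCache, the save/modify/restore pattern around a string move (es := ds, then es back to its old value) hits the cache both times as long as both descriptors fit in the slots. The sketch ignores everything else that would invalidate entries, such as page table changes, which would presumably still have to flush every slot.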