On 30 Jan 2024, at 09:02, Samuel Thibault <samuel.thiba...@gnu.org> wrote:
>
> Jessica Clarke, le mar. 30 janv. 2024 02:32:07 +0000, a ecrit:
>> On 29 Jan 2024, at 10:20, Samuel Thibault <samuel.thiba...@gnu.org> wrote:
>>>
>>> Damien Zammit, le lun. 29 janv. 2024 10:07:30 +0000, a ecrit:
>>>> -	ljmp	$BOOT_CS, $M(0f)
>>>> +	xorl	%eax, %eax
>>>> +	mov	%cs, %ax
>>>> +	shll	$4, %eax
>>>> +	addl	$M(0f), %eax
>>>> +	movl	%eax, M(ljmp_offset32)
>>>
>>> This won't work with pipelined processors, which assume a complete
>>> separation between code and data, and will thus have already loaded
>>> the jmp instruction before your modify it.
>>
>> That’s true of most architectures, but not x86. It architecturally
>> guarantees that self-modifying code works,
>
> ?? It was a very common way to detect pentium processors, back in the
> time.

Ok, so I went and read 12.6 Self-Modifying Code of the Intel SDM Volume
3A (from December 2023), and it has this to say:

> A write to a memory location in a code segment that is currently cached
> in the processor causes the associated cache line (or lines) to be
> invalidated. This check is based on the physical address of the
> instruction. In addition, the P6 family and Pentium processors check
> whether a write to a code segment may modify an instruction that has
> been prefetched for execution. If the write affects a prefetched
> instruction, the prefetch queue is invalidated. This latter check is
> based on the linear address of the instruction. For the Pentium 4 and
> Intel Xeon processors, a write or a snoop of an instruction in a code
> segment, where the target instruction is already decoded and resident
> in the trace cache, invalidates the entire trace cache. The latter
> behavior means that programs that self-modify code can cause severe
> degradation of performance when run on the Pentium 4 and Intel Xeon
> processors.
>
> In practice, the check on linear addresses should not create
> compatibility problems among IA-32 processors. Applications that
> include self-modifying code use the same linear address for modifying
> and fetching the instruction. Systems software, such as a debugger,
> that might possibly modify an instruction using a different linear
> address than that used to fetch the instruction, will execute a
> serializing operation, such as a CPUID instruction, before the modified
> instruction is executed, which will automatically resynchronize the
> instruction cache and prefetch queue. (See Section 9.1.3, “Handling
> Self- and Cross-Modifying Code,” for more information about the use of
> self-modifying code.)
>
> For Intel486 processors, a write to an instruction in the cache will
> modify it in both the cache and memory, but if the instruction was
> prefetched before the write, the old version of the instruction could
> be the one executed. To prevent the old instruction from being
> executed, flush the instruction prefetch unit by coding a jump
> instruction immediately after any write that modifies an instruction.

So, for anything above a 486, this code is correct. For a 386 and 486
you need to jump to the next instruction to invalidate the prefetch
unit. I guess that’s what you were getting at? I had interpreted your
comments as meaning that *modern* processors needed it.

>>> Rather either perform the relocation from the C code,
>>
>> Were your statement true, that wouldn’t fix the problem,
>
> Isn't an IPI a synchronizing thing?

Oh that’s true, I was being stupid and was thinking the C code would be
running on the AP, but of course that’s nonsense. Patching the code
from the BSP makes sense, and I believe is what FreeBSD does.

Jess