On Thu, Jan 18, 2024 at 5:42 PM LIU Hao <lh_mo...@126.com> wrote:
>
> On 2024-01-18 17:02, Fangrui Song wrote:
> > Thanks for the proposal. I hope that -masm=intel becomes more useful:)
> >
> > Do you have a list of assembly in the unambiguous cases that fail to
> > be parsed today as a gas PR?
> > For example,
>
> Not really. Most of these are results from high-level languages. For example:
>
>    # Expected: `movl shr(%rip), %eax`
>    # Actual: error: invalid use of operator "shr"
>    mov eax, DWORD PTR shr[rip]
>
>    # Expected: `movl dword(%rip), %eax`
>    # Actual: accepted as `movl 4(%rip), %eax`
>    mov eax, DWORD ptr dword[rip]

GCC seems to print a symbol displacement, possibly with a modifier
(for a relocation), before the left bracket.

    mov edx, DWORD PTR bx@GOT[eax]
    mov edx, DWORD PTR bx[eax]
    mov edx, DWORD PTR and[eax]  # Error: invalid use of operator "and"

Technically, assemblers (gas and the LLVM integrated assembler) can be
made to parse "bx" as a symbol, even if it matches a register name or
an operator name ("and").
However, a straightforward approach using one lookahead token cannot
disambiguate the following two cases.

    mov edx, DWORD PTR fs:[eax]  # segment override prefix
    mov edx, DWORD PTR fs[eax]   # symbol

So, we would need two lookahead tokens...
(https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L2534-L2550
needs more code to parse `fs:` correctly.)

It is also unfortunate that whether the displacement is an immediate
or not changes the behavior of brackets.

    mov eax, DWORD PTR 0      # mov    $0x0,%eax
    mov eax, DWORD PTR [0]    # mov    0x0,%eax
    mov eax, DWORD PTR sym    # mov    0x0,%eax with relocation
    mov eax, DWORD PTR [sym]  # mov    0x0,%eax with relocation

The above reveals yet another inconsistency. For a memory reference,
it seems that we should use [], but [sym] could be ambiguous if sym
matches a register name or operator name.

Does the proposal change the placement of the displacement depending
on whether it is an immediate?
This is inconsistent, but perhaps there is not much we can improve...

    extern int a[2];
    int foo() { return a[1]+a[2]; }

GCC's PIC -masm=intel output

    mov eax, DWORD PTR a[rip+8]
    add eax, DWORD PTR a[rip+4]

The displacements (a+8 and a+4) involve a plus expression, and `a` and
`8`/`4` are printed in two places.
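
To make the `fs:[eax]` vs `fs[eax]` point concrete, here is a rough sketch of the decision in Python. It is not how gas or X86AsmParser.cpp is structured; the token pattern, `tokenize` and `classify_leading_name` are names I made up for illustration. Seen from the start of the operand, the first token (`fs`) is the same in both cases, and only the second token (`:` versus `[`) separates them.

```python
import re

# Hypothetical token pattern for this sketch only: identifiers (optionally
# followed by an @modifier such as @GOT), numbers, and single punctuation.
TOKEN_RE = re.compile(r"\s*([A-Za-z_.$][\w.$]*(?:@\w+)?|\d\w*|[\[\]:+\-*,])")

SEGMENT_REGS = {"cs", "ds", "es", "fs", "gs", "ss"}


def tokenize(operand):
    """Split the operand text that follows `PTR` into a list of tokens."""
    operand = operand.rstrip()
    tokens, pos = [], 0
    while pos < len(operand):
        m = TOKEN_RE.match(operand, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {operand[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def classify_leading_name(operand):
    """Classify the leading token by also inspecting the token after it."""
    tokens = tokenize(operand)
    if not tokens:
        raise ValueError("empty operand")
    first = tokens[0]
    second = tokens[1] if len(tokens) > 1 else None
    # A segment register name followed by ':' is a segment override prefix.
    if first.lower() in SEGMENT_REGS and second == ":":
        return "segment-override"
    # Any identifier directly followed by '[' is treated as a symbol
    # displacement, even if it spells a register or operator name.
    if second == "[" and re.match(r"[A-Za-z_.$]", first):
        return "symbol"
    return "other"


for text in ("fs:[eax]", "fs[eax]", "and[eax]", "bx@GOT[eax]", "[eax]"):
    print(f"{text:14} {classify_leading_name(text)}")
```

The sketch deliberately ignores everything after the second token, so it says nothing about the `[sym]` ambiguity or about how the displacement is split across the bracket.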
> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to
> `.intel_syntax noprefix`:
>
>    $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
>    {standard input}: Assembler messages:
>    {standard input}:1: Error: invalid use of register
>
>    $ as <<< '.intel_syntax noprefix; mov eax, DWORD PTR gs:0x48' -o a.o &&
>    objdump -Mintel -d a.o
>    ...
>    0000000000000000 <.text>:
>       0:  65 8b 04 25 48 00 00    mov    eax,DWORD PTR gs:0x48

Confirmed by Jan.