On Thu, Jan 18, 2024 at 5:42 PM LIU Hao <lh_mo...@126.com> wrote:
>
> On 2024-01-18 17:02, Fangrui Song wrote:
> > Thanks for the proposal. I hope that -masm=intel becomes more useful:)
> >
> > Do you have a list of unambiguous assembly that fails to parse today,
> > e.g. filed as a gas PR?
> > For example,
>
> Not really. Most of these are symbol names that come from high-level languages. For example:
>
>     # Expected: `movl shr(%rip), %eax`
>     # Actual: error: invalid use of operator "shr"
>     mov eax, DWORD PTR shr[rip]
>
>     # Expected: `movl dword(%rip), %eax`
>     # Actual: accepted as `movl 4(%rip), %eax`
>     mov eax, DWORD ptr dword[rip]

GCC seems to print a symbol displacement, possibly with a modifier
(for a relocation), before the left bracket.

mov edx, DWORD PTR bx@GOT[eax]
mov edx, DWORD PTR bx[eax]
mov edx, DWORD PTR and[eax]    # Error: invalid use of operator "and"

Technically, assemblers (gas and the LLVM integrated assembler) can be
made to parse a name such as "bx" or "and" as a symbol, even if it
matches a register name or an operator name.
However, a straightforward approach using one lookahead token cannot
disambiguate the following two cases.

mov edx, DWORD PTR fs:[eax]   # segment override prefix
mov edx, DWORD PTR fs[eax]    # symbol

So, we would need two lookahead tokens...
(https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L2534-L2550
needs more code to parse `fs:` correctly.)
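
A minimal C sketch of the decision, assuming the operand has already been
split into the leading identifier and the text that follows it. The names
classify_ident and IdentRole are hypothetical; this is not how gas or
X86AsmParser.cpp is actually structured. It only illustrates that the role
of "fs" depends on both the identifier and the token after it, so a parser
positioned before "fs" has to look two tokens ahead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical roles an identifier can play inside an Intel-syntax operand. */
typedef enum {
    ROLE_SEGMENT_OVERRIDE, /* fs:[eax]                                   */
    ROLE_SYMBOL,           /* fs[eax], bx[eax], and[eax], bx@GOT[eax]    */
    ROLE_DEFAULT           /* fall back to the register/operator tables  */
} IdentRole;

/* Case-sensitive for brevity; real Intel syntax is case-insensitive. */
static int is_segment_reg(const char *name) {
    static const char *const segs[] = {"cs", "ds", "es", "fs", "gs", "ss"};
    for (size_t i = 0; i < sizeof segs / sizeof segs[0]; i++)
        if (strcmp(name, segs[i]) == 0)
            return 1;
    return 0;
}

/* Classify IDENT by peeking at the first non-blank character of REST,
   i.e. the start of the token that follows the identifier. */
static IdentRole classify_ident(const char *ident, const char *rest) {
    assert(ident != NULL && rest != NULL);
    while (isspace((unsigned char)*rest))
        rest++;
    if (*rest == ':' && is_segment_reg(ident))
        return ROLE_SEGMENT_OVERRIDE; /* segment override prefix */
    if (*rest == '[' || *rest == '@')
        return ROLE_SYMBOL;           /* displacement symbol, maybe with a modifier */
    return ROLE_DEFAULT;
}
```

For example, classify_ident("fs", ":[eax]") gives ROLE_SEGMENT_OVERRIDE,
while classify_ident("fs", "[eax]") and classify_ident("and", "[eax]") give
ROLE_SYMBOL.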

It is also unfortunate that brackets behave differently depending on
whether the displacement is an immediate or a symbol.

mov eax, DWORD PTR 0          # mov    $0x0,%eax
mov eax, DWORD PTR [0]        # mov    0x0,%eax
mov eax, DWORD PTR sym        # mov    0x0,%eax with relocation
mov eax, DWORD PTR [sym]      # mov    0x0,%eax with relocation

The above reveals yet another inconsistency. For a memory reference,
it seems that we should use [], but [sym] could be ambiguous if sym
matches a register name or an operator name.

Does the proposal change the placement of the displacement depending
on whether it is an immediate?
This is inconsistent, but perhaps there is not much we can improve...

extern int a[3];
int foo() { return a[1]+a[2]; }

GCC's PIC -masm=intel output:

        mov     eax, DWORD PTR a[rip+8]
        add     eax, DWORD PTR a[rip+4]

The displacements (a+8 and a+4) are plus expressions, yet `a` and
`8`/`4` are printed in two separate places.
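
To make the split concrete, here is a hypothetical C helper that prints a
RIP-relative operand whose displacement is SYM+OFF in the GCC style shown
above, next to an alternative that keeps the whole displacement inside the
brackets. format_split and format_bracketed are made-up names; the sketch
does not check whether any assembler accepts the bracketed spelling, and a
symbol named like a register or operator would make that form ambiguous, as
noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* GCC style: the symbol goes before '[', the constant offset inside. */
static int format_split(char *buf, size_t n, const char *sym, long off) {
    assert(buf != NULL && sym != NULL);
    if (off == 0)
        return snprintf(buf, n, "DWORD PTR %s[rip]", sym);
    /* %+ld always prints a sign, e.g. "+8" or "-4". */
    return snprintf(buf, n, "DWORD PTR %s[rip%+ld]", sym, off);
}

/* Hypothetical alternative: the whole displacement SYM+OFF inside '[...]'. */
static int format_bracketed(char *buf, size_t n, const char *sym, long off) {
    assert(buf != NULL && sym != NULL);
    if (off == 0)
        return snprintf(buf, n, "DWORD PTR [rip+%s]", sym);
    return snprintf(buf, n, "DWORD PTR [rip+%s%+ld]", sym, off);
}
```

With sym "a" and off 8, format_split produces "DWORD PTR a[rip+8]", matching
the first GCC line above, while format_bracketed produces
"DWORD PTR [rip+a+8]".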

> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to 
> `.intel_syntax noprefix`:
>
>     $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
>     {standard input}: Assembler messages:
>     {standard input}:1: Error: invalid use of register
>
>     $ as <<< '.intel_syntax noprefix;  mov eax, DWORD PTR gs:0x48' -o a.o && objdump -Mintel -d a.o
>     ...
>     0000000000000000 <.text>:
>        0:       65 8b 04 25 48 00 00    mov    eax,DWORD PTR gs:0x48

Confirmed by Jan.
