Hello, Preempting some future work I'm expecting to arrive, I had a go at using __builtin_*() in obj32.
This is formed of 2 patches on top of this series: https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32 Patch 1 introduces lib32 beside obj32, with strlen() being the first broken-out function, and patch 2 swaps to __builtin_strlen(). Both compile, but the difference that patch 2 introduces was unexpected. With just lib32, and taking strsubcmp() as an example, we get: 00000000 <strsubcmp>: 0: 83 ec 0c sub $0xc,%esp 3: 89 5c 24 04 mov %ebx,0x4(%esp) 7: 89 74 24 08 mov %esi,0x8(%esp) b: 89 c6 mov %eax,%esi d: 89 d3 mov %edx,%ebx f: 89 d0 mov %edx,%eax 11: /-- e8 fc ff ff ff call 12 <strsubcmp+0x12> 12: R_386_PC32 strlen 16: 89 c1 mov %eax,%ecx 18: 89 da mov %ebx,%edx 1a: 89 f0 mov %esi,%eax 1c: /-- e8 fc ff ff ff call 1d <strsubcmp+0x1d> 1d: R_386_PC32 .text.strncmp 21: 8b 5c 24 04 mov 0x4(%esp),%ebx 25: 8b 74 24 08 mov 0x8(%esp),%esi 29: 83 c4 0c add $0xc,%esp 2c: c3 ret which all seems fine. We get a plain PC32 relocation to strlen (which is now in the separate library). However, with patch 2 in place (simply swapping the plain extern for __builtin_strlen(), we now get: 00000000 <strsubcmp>: 0: 83 ec 0c sub $0xc,%esp 3: 89 1c 24 mov %ebx,(%esp) 6: 89 74 24 04 mov %esi,0x4(%esp) a: 89 7c 24 08 mov %edi,0x8(%esp) e: /-- e8 fc ff ff ff call f <strsubcmp+0xf> f: R_386_PC32 __x86.get_pc_thunk.bx 13: 81 c3 02 00 00 00 add $0x2,%ebx 15: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ 19: 89 c7 mov %eax,%edi 1b: 89 d6 mov %edx,%esi 1d: 89 d0 mov %edx,%eax 1f: /-- e8 fc ff ff ff call 20 <strsubcmp+0x20> 20: R_386_PLT32 strlen 24: 89 c1 mov %eax,%ecx 26: 89 f2 mov %esi,%edx 28: 89 f8 mov %edi,%eax 2a: /-- e8 fc ff ff ff call 2b <strsubcmp+0x2b> 2b: R_386_PC32 .text.strncmp 2f: 8b 1c 24 mov (%esp),%ebx 32: 8b 74 24 04 mov 0x4(%esp),%esi 36: 8b 7c 24 08 mov 0x8(%esp),%edi 3a: 83 c4 0c add $0xc,%esp 3d: c3 ret The builtin hasn't managed to optimise away the call to strlen (that's fine). But, we've ended up spilling %ebx to the stack, calculating the location of the GOT and not using it. So, as it stands, trying to use __builtin_strlen() results in worse code generation. One thing I noticed was that we're not passing -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help either. We do have the pragma from compiler.h, so I'm out of visibility ideas. Anything else I've missed? ~Andrew