[RFC] Embedded bitcode and related upstream (Part II)

Steven Wu via cfe-commits Fri, 03 Jun 2016 11:37:28 -0700

Hi everyone

I am still in the process of upstreaming some improvements to the embed bitcode 
option. If you want more background, you can read the previous RFC 
(http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html). This is 
part II of the discussion.


Current Status:
A basic version of the -fembed-bitcode option is upstreamed and functioning.
You can use the -fembed-bitcode={off, all, bitcode, marker} option to control
what gets embedded in the final object file output:
off: default, nothing gets embedded.
all: optimized bitcode and command line options get embedded in the object
file.
bitcode: only optimized bitcode is embedded.
marker: only a marker is put in the object file.

What needs to be improved:
1. Whitelist for command line options that can be used with bitcode:
The current trunk implementation embeds all the cc1 command line options
(including header include paths, warning flags and other front-end options) in
the command line section. That is a lot of redundant information: most of
these options are useless for re-creating the object file from the embedded
optimized bitcode, and they can leak information about the source code. One
solution would be to keep a list of all the options that can affect code
generation but are not encoded in the bitcode. I have internally prototyped
disallowing these options explicitly and allowing only the remainder of the
options to be embedded (http://reviews.llvm.org/D17394). A better solution
might be to encode that information in "Options.td" as a specific group.
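
As a rough illustration of the filtering idea only (not the code in D17394),
the sketch below keeps the options named in an allowlist and drops everything
else before embedding. The allowlist contents and the function name
filter_embedded_args are assumptions made up for this example.

```python
# Hypothetical allowlist: option name -> number of separate value arguments.
# The entries are illustrative only; they are not the list clang uses.
CODEGEN_OPTIONS = {
    "-triple": 1,
    "-target-cpu": 1,
    "-mrelocation-model": 1,
    "-O2": 0,
}


def filter_embedded_args(cc1_args):
    """Return only the allowlisted options (with their values) for embedding."""
    kept = []
    i = 0
    while i < len(cc1_args):
        arg = cc1_args[i]
        if arg in CODEGEN_OPTIONS:
            n_values = CODEGEN_OPTIONS[arg]
            # Keep the option together with the values that follow it.
            kept.extend(cc1_args[i:i + 1 + n_values])
            i += 1 + n_values
        else:
            # Dropped: include paths, warning flags, macros, stray values.
            i += 1
    return kept


args = ["-triple", "x86_64-apple-macosx10.11.0", "-I", "/secret/include",
        "-Wall", "-O2", "-D", "INTERNAL_BUILD=1", "-target-cpu", "core2"]
embedded = filter_embedded_args(args)
```

Here the include path, warning flag and macro definition never reach the
command line section, while the target and optimization options survive.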

2. Assembly input handling:
This is a workaround to allow source code written in assembly to work with the
"-fembed-bitcode" option. When compiling assembly source code with
"-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the
object file. That is just a way to distinguish object files compiled from
assembly source from those compiled from higher level source code without the
"-fembed-bitcode" option. The linker can use this section to diagnose whether
"-fembed-bitcode" is used consistently on all the object files participating in
the link.
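
The linker-side check could look roughly like the sketch below. It is a toy
model, not linker code: each object file is represented as a set of section
names, the bitcode section name "__LLVM,__bitcode" is an assumption for the
example, and objects_missing_bitcode is a hypothetical helper.

```python
# Section names in "segment,section" form. The asm marker follows the
# description above; the bitcode section name is assumed for this sketch.
BITCODE_SECTION = "__LLVM,__bitcode"
ASM_MARKER_SECTION = "__LLVM,__asm"


def objects_missing_bitcode(objects):
    """Return names of objects with neither embedded bitcode nor the asm marker.

    `objects` maps an object file name to the set of its section names.
    """
    return sorted(
        name
        for name, sections in objects.items()
        if BITCODE_SECTION not in sections
        and ASM_MARKER_SECTION not in sections
    )


link_inputs = {
    "main.o": {"__TEXT,__text", "__LLVM,__bitcode"},
    "startup.o": {"__TEXT,__text", "__LLVM,__asm"},  # built from assembly
    "legacy.o": {"__TEXT,__text"},  # built without -fembed-bitcode
}
offenders = objects_missing_bitcode(link_inputs)
```

An object built from assembly passes the check because of the empty marker
section, while an object that simply omitted "-fembed-bitcode" is reported.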

3. Bitcode symbol hiding:
There were some concerns about leaking source code information when using the
bitcode feature. One approach to avoid the leak is to add a pass which renames
all the globals and metadata strings. The pass also keeps a reverse map in case
the original names need to be recovered. The final bitcode should contain no
more symbols or debug info than a stripped binary. To make sure the modified
bitcode can still be linked correctly, the renaming needs to be consistent
across all the bitcode participating in the link, and everything that is
external to the linkage unit needs to be preserved. This means the pass can
only be run during linking and requires some LTO API.
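
The two constraints (consistent renaming across modules, and preserving
everything external to the linkage unit) can be shown with a small sketch.
It models each module as a plain list of symbol names rather than real
bitcode; hide_symbols and the "__hidden_N" naming scheme are hypothetical.

```python
def hide_symbols(modules, preserved):
    """Rename symbols consistently across modules and build a reverse map.

    `modules` is a list of symbol-name lists, one per bitcode module.
    `preserved` holds names that are external to the linkage unit and
    must keep their original spelling.
    """
    forward = {}  # original name -> hidden name, shared by all modules
    renamed_modules = []
    for symbols in modules:
        renamed = []
        for name in symbols:
            if name in preserved:
                renamed.append(name)
                continue
            if name not in forward:
                forward[name] = f"__hidden_{len(forward)}"
            renamed.append(forward[name])
        renamed_modules.append(renamed)
    # The reverse map allows the original names to be recovered later.
    reverse = {hidden: original for original, hidden in forward.items()}
    return renamed_modules, reverse


modules = [["main", "helper", "printf"], ["helper", "secret_table"]]
renamed, reverse_map = hide_symbols(modules, preserved={"main", "printf"})
```

Because the forward map is shared, "helper" receives the same hidden name in
both modules, which is why the renaming has to happen at link time when all
the modules are visible together.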

4. Debug info strip to line-tables pass:
As the name suggests, this pass strips the full debug info down to line tables
only. This is also one of the steps we took to prevent the leak of source code
information in bitcode.

Please let me know what you think about the pieces above or if you have any
concerns about the methodology. I will put up patches for review soon.

Thanks

Steven

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
