> On Dec 20, 2015, at 1:01 PM, Daniel Dunbar <daniel_dun...@apple.com> wrote:
> 
> Hi Nadav,

Hi Daniel, 

> 
> One thing you didn't call out here was any connection to visibility.

This is a great point! 

Apps usually don’t export too many symbols, but they do import symbols from 
external shared objects and need to keep the long names around. I created a 
basic Swift one-page iOS App using the Xcode wizard and the generated binary 
had a string-table of 70K**. The same app in Obj-C had a 12K linkedit section 
("size -m" can print this info). 

> 
> I would expect that in practice it is quite common for most of the symbols to 
> not be visible outside the linkage unit. Inside the linkage unit, the 
> compiler can already rename many of the symbols in any way it likes. Wouldn't 
> one tact for solving this problem simply be to encourage more pervasive use 
> of whole module optimization, and support compiler renaming of non-exported 
> symbols?

I don’t think that Swift Access Control (public/internal/private) is affected 
by WMO, but I will check. 

Thanks,
Nadav



** This is one of the names that the sample app has:

__TTSf4n_s_s_n___TTSg5GVSs12_ArrayBufferPSs9AnyObject__GS_PS0___Ss16_ArrayBufferTypeSs_GVSs15CollectionOfOnePS0___GS2_PS0___Ss14CollectionTypeSs_PS0___GVSs17IndexingGeneratorGS_PS0____GS4_GS_PS0____Ss13GeneratorTypeSs_PS0___PS0___GVSs12_SliceBufferPS0___GS6_PS0___Ss23_CollectionDefaultsTypeSsGS6_PS0___Ss32_CollectionGeneratorDefaultsTypeSs_GS4_GS6_PS0____GS4_GS6_PS0____S5_Ss_PS0___SiSiSs16ForwardIndexTypeSs_SiSiSs18_SignedIntegerTypeSs_SiSiSs33_BuiltinIntegerLiteralConvertibleSs_Si_PS0___GVSs14GeneratorOfOnePS0___GS12_PS0___S5_Ss_OSs3BitS13_S9_Ss_SiSiS10_Ss_SiSiS11_Ss_VSs20_DisabledRangeIndex__PS0___GVSs12_prext_SliceGS2_PS0____GS15_GS2_PS0____S7_SsGS15_GS2_PS0____S8_Ss_GS4_GS15_GS2_PS0_____GS4_GS15_GS2_PS0_____S5_Ss_PS0___S13_S13_S9_Ss_SiSiS10_Ss_SiSiS11_Ss_S14__PS0_____TFSs28_arrayNonSliceInPlaceReplaceu0_Rq_Ss16_ArrayBufferTypeq0_Ss14CollectionTypezqq_S_7Elementqqq0_Ss12SequenceType9GeneratorSs13GeneratorType7Elementzqq_Ss23_CollectionDefaultsType5IndexSizqqq_S3_5IndexSs16ForwardIndexType8DistanceSizqqq_S3_5IndexS4_19_DisabledRangeIndexSizqqqq_S3_5IndexS4_8DistanceSs25IntegerLiteralConvertible18IntegerLiteralTypeSi_FTRq_GVSs5RangeSi_Siq0__T_

> 
>  - Daniel
> 
>> On Dec 18, 2015, at 4:42 PM, Nadav Rotem via swift-dev <swift-dev@swift.org 
>> <mailto:swift-dev@swift.org>> wrote:
>> 
>> Hi Swift-Dev, 
>> 
>> We care deeply about the size of binaries that are generated by the Swift 
>> compiler and make an effort to optimize and shrink the generated binaries. 
>> One of the problems that we have today is that swift symbols are mangled 
>> into extremely long strings. This is especially a problem for libraries, and 
>> almost half of the size of libswiftCore.dylib (the swift runtime library on 
>> x86_64 OSX) is string tables. On MacOSX you can use the command ’size -m 
>> file.dylib’ to read the size of the string table.  C++ also suffers from the 
>> problem of long names, but since we control the Swift ABI we can do better 
>> than C++.
>> 
>> Here are two names that I found in our libraries:
>> 
>>  
>> __TIF14StdlibUnittest13checkSequenceu0_Rxs14CollectionType_s12SequenceTypeWx9Generator7Element_zW_9GeneratorS3__rFTxq_KT_SS9showFrameSb10stackTraceVS_14SourceLocStack4fileSS4lineSu16resiliencyChecksVS_32CollectionMisuseResiliencyChecks9sameValueFTWxS2_S3__WxS2_S3___Sb_T_A6_
>> 
>>  
>> __TTSg5VSS13CharacterViewS_s14CollectionTypes_GVs17IndexingGeneratorS__GS1_S__s13GeneratorTypes_VS_5IndexS3_s16ForwardIndexTypes_SiSis18_SignedIntegerTypes_SiSis33_BuiltinIntegerLiteralConvertibles_Vs20_DisabledRangeIndex__S_S_s9IndexablesS_s12SequenceTypes_GS1_S__GS1_S__S2_s_Vs9Character_S3_S3_S4_s_SiSiS5_s_SiSiS6_s_S7__S__S10__S10____TFFSS12replaceRangeuRxs14CollectionTypeWx9Generator7Element_zVs9CharacterrFTGVs5RangeVVSS13CharacterView5Index_4withx_T_U_FRS4_T_
>> 
>> One way to solve this problem is to implement a better string table format 
>> that will include some kind of compression or tree encoding into MachO, ELF 
>> and PE. The work on MachO, ELF and PE is beyond the scope of the Swift 
>> project and I’d like to focus on improving things on the Swift side. 
>> 
>> I see two independent tasks that we can do to shorten the length of string 
>> symbols. First, we can improve the existing mangling. We can do things like 
>> use non-decimal length specifiers, etc. The second task, which is the 
>> subject of the rest of this email, is the implementation of a compression 
>> layer on top of the existing mangling scheme. 
>> 
>> Compressing long symbols
>>  
>> The Swift compiler is free to encode Swift names in whatever way it likes, 
>> but we need to follow a few basic rules. First, the character encoding space 
>> needs to be [_a-zA-Z0-9]. We can’t use non-ascii characters to encode Swift 
>> names because this is the character space that ELF supports. Second, names 
>> need to be unique and independent. We can’t just compress the whole string 
>> table or create names that are a running counter because we need to be able 
>> to inspect each name independently. We need to be able to decode the name 
>> independently and extract the information that’s stored in the name.  We 
>> also need to be able to decode the name to be able to present something 
>> useful in the debugger. This means that a simple one directional hashing of 
>> the name is not an option.
>> 
>> With these restrictions in mind, I believe that the problem that we need to 
>> solve is independent compression of names. I suggest that we add a 
>> post-processing string compression pass. The input to the compress method 
>> would be a string mangled name, like the two names I showed above, and the 
>> output would be a shorter string. We would also need a decompression method 
>> that would return the original string. 
>> 
>> Obviously, gzipping each string independently won’t be effective. I believe 
>> that we need to implement our own compression algorithm here that will work 
>> with the restrictions that I mentioned above (being textual) and with the 
>> prior knowledge we have. 
>> 
>> 
>> Example of a basic compression algorithm 
>> 
>> In this section I describe a trivial compression algorithm that can reduce 
>> the size of the string tables by 30%. This algorithm is not optimal but it 
>> is a good starting point for our discussion. One example of "prior 
>> knowledge” that I mentioned above is that we know what are the common 
>> sub-strings that are used in Swift names.  Some of the most common 
>> substrings in Swift mangled names are:
>> 
>> 1. "S_S_S_S"
>> 2. "ollectio”  
>> 3. “Type”
>> 4. “Index”
>> 5. “erator”
>> 6. “7Element"
>> 7. “able"
>> 
>> We can use this prior knowledge about names of Swift types to compress our 
>> mangling!  Consider the following encoding:
>> 
>> A string like this:
>> 
>> __TTSg5VSS13CharacterView
>> 
>> Would be translated into something like:
>> 
>> __TTSg5<reference to word #67>13<reference to word #43>View
>> 
>> Commonly used strings can be encoded as references to global tables. We need 
>> to have some escape character (that needs to be ascii-printable), and use 
>> that character to encode the reference. 
>> One interesting bit of information is that the character ‘Y’ is only used 4 
>> times in the entire standard library! The letter J, and a few other letters 
>> are also not used very frequently. We can use these letters as escape 
>> characters. 
>> 
>> The basic encoding that I experimented with used the following rules:
>> 
>> 1. Use two escape characters that are not frequently used in names (Y and 
>> Z). These characters are escape character and can’t be used without 
>> escaping. ‘Y’ will be encoded as ‘YY’, and ‘Z’ would be encoded as ‘YZ’. 
>> 
>> 2. The most commonly used sub-strings (calculated as length of substring 
>> times number of occurrences) would be encoded with a single escape character 
>> and a single index character. The table of highly repetitive substrings can 
>> only contain 61 entries (a-zA-Z0-9_, minus two escape characters).
>> 
>> A reference to the very frequent substrings would be encoded as “Yx”, where 
>> x is the character that’s translated into a numeric index. This is two chars 
>> per substring. 
>> 
>> 3. The less commonly used substrings are encoded as three-character 
>> sequence. “Zxx”, where xx is the numeric index in the large table (that can 
>> hold 63*63 substrings).
>> 
>> It is obvious how to reverse this compression using the same string table 
>> used to compress the names. 
>> 
>> 
>> With this encoding scheme the name 
>> “__TwxxV14StdlibUnittest24MinimalForwardCollection” becomes 
>> “Y5wxxJ0UJ1HJ3_Jyn4J5VJ6SdY9on” and the name “__TMPVs15ContiguousArray” 
>> becomes “Y5MPJ3r5J5EJ82”.
>> 
>> I benchmarked this basic technique by using 5 dylibs as the training set 
>> (SwiftCore, Simd, unittests, etc) and compressed the string tables of these 
>> binaries. The result was a ~31% reduction in size.
>> 
>> <PastedGraphic-2.png>
>> 
>> 
>> On the Swift dylib itself (training set and input to the compressor) the 
>> wins are about ~25%.
>> 
>> This encoding makes the symbol name a lot less intuitive for humans, but at 
>> the SIL-level we already print human readable strings above function names 
>> as comments. I believe that the debugger would just work as-is since the 
>> internal APIs won’t change. 
>> 
>> One of the disadvantages of using a constant string table is that once we 
>> rename the APIs in the standard library the string table would be less 
>> effective in compressing it and code that uses it. 
>> 
>> What’s next?
>> 
>> The small experiment I described above showed that compressing the names in 
>> the string table has a huge potential for reducing the size of swift 
>> binaries. I’d like for us (swift-developers) to talk about the implications 
>> of this change and start working on the two tasks of tightening our existing 
>> mangling format and on implementing a new compression layer on top. 
>> 
>> 
>> Nadav
>> 
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev@swift.org <mailto:swift-dev@swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-dev
> 

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Reply via email to