> On Dec 20, 2015, at 1:01 PM, Daniel Dunbar <daniel_dun...@apple.com> wrote: > > Hi Nadav,
Hi Daniel, > > One thing you didn't call out here was any connection to visibility. This is a great point! Apps usually don’t export too many symbols, but they do import symbols from external shared objects and need to keep the long names around. I created a basic Swift one-page iOS App using the Xcode wizard and the generated binary had a string-table of 70K**. The same app in Obj-C had a 12K linkedit section ("size -m" can print this info). > > I would expect that in practice it is quite common for most of the symbols to > not be visible outside the linkage unit. Inside the linkage unit, the > compiler can already rename many of the symbols in any way it likes. Wouldn't > one tact for solving this problem simply be to encourage more pervasive use > of whole module optimization, and support compiler renaming of non-exported > symbols? I don’t think that Swift Access Control (public/internal/private) is affected by WMO, but I will check. Thanks, Nadav ** This is one of the names that the sample app has: __TTSf4n_s_s_n___TTSg5GVSs12_ArrayBufferPSs9AnyObject__GS_PS0___Ss16_ArrayBufferTypeSs_GVSs15CollectionOfOnePS0___GS2_PS0___Ss14CollectionTypeSs_PS0___GVSs17IndexingGeneratorGS_PS0____GS4_GS_PS0____Ss13GeneratorTypeSs_PS0___PS0___GVSs12_SliceBufferPS0___GS6_PS0___Ss23_CollectionDefaultsTypeSsGS6_PS0___Ss32_CollectionGeneratorDefaultsTypeSs_GS4_GS6_PS0____GS4_GS6_PS0____S5_Ss_PS0___SiSiSs16ForwardIndexTypeSs_SiSiSs18_SignedIntegerTypeSs_SiSiSs33_BuiltinIntegerLiteralConvertibleSs_Si_PS0___GVSs14GeneratorOfOnePS0___GS12_PS0___S5_Ss_OSs3BitS13_S9_Ss_SiSiS10_Ss_SiSiS11_Ss_VSs20_DisabledRangeIndex__PS0___GVSs12_prext_SliceGS2_PS0____GS15_GS2_PS0____S7_SsGS15_GS2_PS0____S8_Ss_GS4_GS15_GS2_PS0_____GS4_GS15_GS2_PS0_____S5_Ss_PS0___S13_S13_S9_Ss_SiSiS10_Ss_SiSiS11_Ss_S14__PS0_____TFSs28_arrayNonSliceInPlaceReplaceu0_Rq_Ss16_ArrayBufferTypeq0_Ss14CollectionTypezqq_S_7Elementqqq0_Ss12SequenceType9GeneratorSs13GeneratorType7Elementzqq_Ss23_CollectionDefaultsType5IndexSizqqq_S3_5IndexSs16ForwardIndexType8DistanceSizqqq_S3_5IndexS4_19_DisabledRangeIndexSizqqqq_S3_5IndexS4_8DistanceSs25IntegerLiteralConvertible18IntegerLiteralTypeSi_FTRq_GVSs5RangeSi_Siq0__T_ > > - Daniel > >> On Dec 18, 2015, at 4:42 PM, Nadav Rotem via swift-dev <swift-dev@swift.org >> <mailto:swift-dev@swift.org>> wrote: >> >> Hi Swift-Dev, >> >> We care deeply about the size of binaries that are generated by the Swift >> compiler and make an effort to optimize and shrink the generated binaries. >> One of the problems that we have today is that swift symbols are mangled >> into extremely long strings. This is especially a problem for libraries, and >> almost half of the size of libswiftCore.dylib (the swift runtime library on >> x86_64 OSX) is string tables. On MacOSX you can use the command ’size -m >> file.dylib’ to read the size of the string table. C++ also suffers from the >> problem of long names, but since we control the Swift ABI we can do better >> than C++. >> >> Here are two names that I found in our libraries: >> >> >> __TIF14StdlibUnittest13checkSequenceu0_Rxs14CollectionType_s12SequenceTypeWx9Generator7Element_zW_9GeneratorS3__rFTxq_KT_SS9showFrameSb10stackTraceVS_14SourceLocStack4fileSS4lineSu16resiliencyChecksVS_32CollectionMisuseResiliencyChecks9sameValueFTWxS2_S3__WxS2_S3___Sb_T_A6_ >> >> >> __TTSg5VSS13CharacterViewS_s14CollectionTypes_GVs17IndexingGeneratorS__GS1_S__s13GeneratorTypes_VS_5IndexS3_s16ForwardIndexTypes_SiSis18_SignedIntegerTypes_SiSis33_BuiltinIntegerLiteralConvertibles_Vs20_DisabledRangeIndex__S_S_s9IndexablesS_s12SequenceTypes_GS1_S__GS1_S__S2_s_Vs9Character_S3_S3_S4_s_SiSiS5_s_SiSiS6_s_S7__S__S10__S10____TFFSS12replaceRangeuRxs14CollectionTypeWx9Generator7Element_zVs9CharacterrFTGVs5RangeVVSS13CharacterView5Index_4withx_T_U_FRS4_T_ >> >> One way to solve this problem is to implement a better string table format >> that will include some kind of compression or tree encoding into MachO, ELF >> and PE. The work on MachO, ELF and PE is beyond the scope of the Swift >> project and I’d like to focus on improving things on the Swift side. >> >> I see two independent tasks that we can do to shorten the length of string >> symbols. First, we can improve the existing mangling. We can do things like >> use non-decimal length specifiers, etc. The second task, which is the >> subject of the rest of this email, is the implementation of a compression >> layer on top of the existing mangling scheme. >> >> Compressing long symbols >> >> The Swift compiler is free to encode Swift names in whatever way it likes, >> but we need to follow a few basic rules. First, the character encoding space >> needs to be [_a-zA-Z0-9]. We can’t use non-ascii characters to encode Swift >> names because this is the character space that ELF supports. Second, names >> need to be unique and independent. We can’t just compress the whole string >> table or create names that are a running counter because we need to be able >> to inspect each name independently. We need to be able to decode the name >> independently and extract the information that’s stored in the name. We >> also need to be able to decode the name to be able to present something >> useful in the debugger. This means that a simple one directional hashing of >> the name is not an option. >> >> With these restrictions in mind, I believe that the problem that we need to >> solve is independent compression of names. I suggest that we add a >> post-processing string compression pass. The input to the compress method >> would be a string mangled name, like the two names I showed above, and the >> output would be a shorter string. We would also need a decompression method >> that would return the original string. >> >> Obviously, gzipping each string independently won’t be effective. I believe >> that we need to implement our own compression algorithm here that will work >> with the restrictions that I mentioned above (being textual) and with the >> prior knowledge we have. >> >> >> Example of a basic compression algorithm >> >> In this section I describe a trivial compression algorithm that can reduce >> the size of the string tables by 30%. This algorithm is not optimal but it >> is a good starting point for our discussion. One example of "prior >> knowledge” that I mentioned above is that we know what are the common >> sub-strings that are used in Swift names. Some of the most common >> substrings in Swift mangled names are: >> >> 1. "S_S_S_S" >> 2. "ollectio” >> 3. “Type” >> 4. “Index” >> 5. “erator” >> 6. “7Element" >> 7. “able" >> >> We can use this prior knowledge about names of Swift types to compress our >> mangling! Consider the following encoding: >> >> A string like this: >> >> __TTSg5VSS13CharacterView >> >> Would be translated into something like: >> >> __TTSg5<reference to word #67>13<reference to word #43>View >> >> Commonly used strings can be encoded as references to global tables. We need >> to have some escape character (that needs to be ascii-printable), and use >> that character to encode the reference. >> One interesting bit of information is that the character ‘Y’ is only used 4 >> times in the entire standard library! The letter J, and a few other letters >> are also not used very frequently. We can use these letters as escape >> characters. >> >> The basic encoding that I experimented with used the following rules: >> >> 1. Use two escape characters that are not frequently used in names (Y and >> Z). These characters are escape character and can’t be used without >> escaping. ‘Y’ will be encoded as ‘YY’, and ‘Z’ would be encoded as ‘YZ’. >> >> 2. The most commonly used sub-strings (calculated as length of substring >> times number of occurrences) would be encoded with a single escape character >> and a single index character. The table of highly repetitive substrings can >> only contain 61 entries (a-zA-Z0-9_, minus two escape characters). >> >> A reference to the very frequent substrings would be encoded as “Yx”, where >> x is the character that’s translated into a numeric index. This is two chars >> per substring. >> >> 3. The less commonly used substrings are encoded as three-character >> sequence. “Zxx”, where xx is the numeric index in the large table (that can >> hold 63*63 substrings). >> >> It is obvious how to reverse this compression using the same string table >> used to compress the names. >> >> >> With this encoding scheme the name >> “__TwxxV14StdlibUnittest24MinimalForwardCollection” becomes >> “Y5wxxJ0UJ1HJ3_Jyn4J5VJ6SdY9on” and the name “__TMPVs15ContiguousArray” >> becomes “Y5MPJ3r5J5EJ82”. >> >> I benchmarked this basic technique by using 5 dylibs as the training set >> (SwiftCore, Simd, unittests, etc) and compressed the string tables of these >> binaries. The result was a ~31% reduction in size. >> >> <PastedGraphic-2.png> >> >> >> On the Swift dylib itself (training set and input to the compressor) the >> wins are about ~25%. >> >> This encoding makes the symbol name a lot less intuitive for humans, but at >> the SIL-level we already print human readable strings above function names >> as comments. I believe that the debugger would just work as-is since the >> internal APIs won’t change. >> >> One of the disadvantages of using a constant string table is that once we >> rename the APIs in the standard library the string table would be less >> effective in compressing it and code that uses it. >> >> What’s next? >> >> The small experiment I described above showed that compressing the names in >> the string table has a huge potential for reducing the size of swift >> binaries. I’d like for us (swift-developers) to talk about the implications >> of this change and start working on the two tasks of tightening our existing >> mangling format and on implementing a new compression layer on top. >> >> >> Nadav >> >> _______________________________________________ >> swift-dev mailing list >> swift-dev@swift.org <mailto:swift-dev@swift.org> >> https://lists.swift.org/mailman/listinfo/swift-dev >
_______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev