Re: [swift-dev] Reconsidering the global uniqueness of type metadata and protocol conformance instances

John McCall via swift-dev Mon, 31 Jul 2017 15:05:18 -0700

> On Jul 31, 2017, at 12:13 PM, Joe Groff wrote:
> 
> 
>> On Jul 30, 2017, at 7:55 PM, John McCall via swift-dev wrote:
>> 
>>> On Jul 30, 2017, at 9:08 PM, Slava Pestov wrote:
>>>> On Jul 30, 2017, at 5:47 PM, John McCall wrote:
>>>>> On Jul 29, 2017, at 7:35 PM, Slava Pestov wrote:
>>>>>> On Jul 29, 2017, at 12:53 PM, John McCall via swift-dev wrote:
>>>>>>> On Jul 29, 2017, at 12:48 AM, Andrew Trick wrote:
>>>>>>>> On Jul 28, 2017, at 8:13 PM, John McCall wrote:
>>>>>>>>> On Jul 28, 2017, at 11:11 PM, John McCall via swift-dev wrote:
>>>>>>>>>> On Jul 28, 2017, at 10:38 PM, Andrew Trick wrote:
>>>>>>>>>>> On Jul 28, 2017, at 3:15 PM, John McCall wrote:
>>>>>>>>>>>> On Jul 28, 2017, at 6:02 PM, Andrew Trick via swift-dev wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 28, 2017, at 2:20 PM, Joe Groff via swift-dev wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The Swift runtime currently maintains globally unique pointer 
>>>>>>>>>>>>> identities for type metadata and protocol conformances. This 
>>>>>>>>>>>>> makes checking type equivalence a trivial pointer equality 
>>>>>>>>>>>>> comparison, but most operations on generic values do not really 
>>>>>>>>>>>>> care about exact type identity and only need to invoke value or 
>>>>>>>>>>>>> protocol witness methods or consult other data in the type 
>>>>>>>>>>>>> metadata structure. I think it's worth reevaluating whether 
>>>>>>>>>>>>> having globally unique type metadata objects is the correct 
>>>>>>>>>>>>> design choice. Maintaining global uniqueness of metadata 
>>>>>>>>>>>>> instances carries a number of costs. Any code that wants type 
>>>>>>>>>>>>> metadata for an instance of a generic type, even a fully concrete 
>>>>>>>>>>>>> one, must make a potentially expensive runtime call to get the 
>>>>>>>>>>>>> canonical metadata instance. This also greatly complicates our 
>>>>>>>>>>>>> ability to emit specializations of type metadata, value witness 
>>>>>>>>>>>>> tables, or protocol witness tables for concrete instances of 
>>>>>>>>>>>>> generic types, since specializations would need to be registered 
>>>>>>>>>>>>> with the runtime as canonical metadata objects, and it would be 
>>>>>>>>>>>>> difficult to do this lazily and still reliably favor 
>>>>>>>>>>>>> specializations over more generic witnesses. The lack of witness 
>>>>>>>>>>>>> table specializations leaves an obnoxious performance cliff for 
>>>>>>>>>>>>> instances of generic types that end up inside existential 
>>>>>>>>>>>>> containers or cross into unspecialized code. The runtime also 
>>>>>>>>>>>>> obligates binaries to provide the canonical metadata for all of 
>>>>>>>>>>>>> their public types, along with all the dependent value witnesses, 
>>>>>>>>>>>>> class methods, and protocol witness tables, meaning a type 
>>>>>>>>>>>>> abstraction can never be completely "zero-cost" across modules.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On the other hand, if type metadata did not need to be unique, 
>>>>>>>>>>>>> then the compiler would be free to emit specialized type metadata 
>>>>>>>>>>>>> and protocol witness tables for fully concrete instances of generic value 
>>>>>>>>>>>>> types without consulting the runtime. This would let us avoid 
>>>>>>>>>>>>> runtime calls to fetch metadata in specialized code, and would 
>>>>>>>>>>>>> make it much easier for us to implement witness specialization. 
>>>>>>>>>>>>> It would also give us the ability to potentially extend the 
>>>>>>>>>>>>> "inlinable" concept to public fragile types, making it a client's 
>>>>>>>>>>>>> responsibility to emit metadata for the type when needed and 
>>>>>>>>>>>>> keeping the type from affecting its home module's ABI. This could 
>>>>>>>>>>>>> significantly reduce the size and ABI surface area of the 
>>>>>>>>>>>>> standard library, since the standard library contains a lot of 
>>>>>>>>>>>>> generic lightweight adapter types for collections and other 
>>>>>>>>>>>>> abstractions that are intended to be optimized away in most use 
>>>>>>>>>>>>> cases.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There are of course benefits to globally unique metadata objects 
>>>>>>>>>>>>> that we would lose if we gave up uniqueness. Operations that do 
>>>>>>>>>>>>> check type identity, such as comparison, hashing, and dynamic 
>>>>>>>>>>>>> casting, would have to perform more expensive checks, and 
>>>>>>>>>>>>> nonunique metadata objects would need to carry additional 
>>>>>>>>>>>>> information to enable those checks. It is likely that class 
>>>>>>>>>>>>> objects would have to remain globally unique, if for no other 
>>>>>>>>>>>>> reason than that the Objective-C runtime requires it on Apple 
>>>>>>>>>>>>> platforms. Having multiple equivalent copies of type metadata has 
>>>>>>>>>>>>> the potential to increase the working set of an app in some 
>>>>>>>>>>>>> situations, although it's likely that redundant compiler-emitted 
>>>>>>>>>>>>> copies of value type metadata would at least be able to live in 
>>>>>>>>>>>>> constant pages mapped from disk instead of getting dynamically 
>>>>>>>>>>>>> instantiated by the runtime like everything is today. There could 
>>>>>>>>>>>>> also be subtle source-breaking behavior for code that bitcasts 
>>>>>>>>>>>>> metatype values to integers or pointers and expects bit-level 
>>>>>>>>>>>>> equality to indicate type equality. It seems unlikely to me that 
>>>>>>>>>>>>> giving up uniqueness would buy us any simplification to the 
>>>>>>>>>>>>> runtime, since the runtime would still need to be able to 
>>>>>>>>>>>>> instantiate metadata for unspecialized code, and we would still 
>>>>>>>>>>>>> want to unique runtime-instantiated metadata objects as an 
>>>>>>>>>>>>> optimization.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Overall, my intuition is that the tradeoffs come out in favor of 
>>>>>>>>>>>>> nonunique metadata objects, but what do you all think? Is there 
>>>>>>>>>>>>> anything I'm missing?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Joe
>>>>>>>>>>>> 
>>>>>>>>>>>> In a premature proposal two years ago, we agreed to ditch unique 
>>>>>>>>>>>> protocol conformances but install the canonical address as the 
>>>>>>>>>>>> first entry in each specialized table.
>>>>>>>>>>> 
>>>>>>>>>>> This would be a reference to (unique) global data about the 
>>>>>>>>>>> conformance, not a reference to some canonical version of the 
>>>>>>>>>>> protocol witness table.  We do not rely on having a canonical 
>>>>>>>>>>> protocol witness table.  The only reason we unique them (when we do 
>>>>>>>>>>> need to instantiate) is because we don't want to track their 
>>>>>>>>>>> lifetimes.
>>>>>>>>>>> 
>>>>>>>>>>>> That would mitigate the disadvantages that you pointed to. But, we 
>>>>>>>>>>>> would also lose the ability to emit specialized 
>>>>>>>>>>>> metadata/conformances in constant pages. How do you feel about 
>>>>>>>>>>>> that tradeoff?
>>>>>>>>>>> 
>>>>>>>>>>> Note that, per above, it's only specialized constant type metadata 
>>>>>>>>>>> that we would lose.
>>>>>>>>>>> 
>>>>>>>>>>> I continue to feel that having to do structural equality tests on 
>>>>>>>>>>> type metadata would be a huge loss.
>>>>>>>>>>> 
>>>>>>>>>>> John.
>>>>>>>>>> 
>>>>>>>>>> My question was really, are we going to runtime-initialize the 
>>>>>>>>>> specialized metadata and specialized witness tables in order to 
>>>>>>>>>> install the unique identifier, rather than requiring a runtime call 
>>>>>>>>>> whenever we need the unique ID. I think the answer is “yes”, we want 
>>>>>>>>>> to install the ID at initialization time for fast type comparison, 
>>>>>>>>>> hashing and casting.
>>>>>>>>> 
>>>>>>>>> Sorry, by "(unique) global data about the conformance" I meant that 
>>>>>>>>> we would emit a global conformance descriptor in constant data for 
>>>>>>>>> the conformance declaration.  There would be one of these, no matter 
>>>>>>>>> how many times it was instantiated; it would therefore uniquely identify a 
>>>>>>>>> possibly generic conformance the same way that a nominal type 
>>>>>>>>> descriptor uniquely identifies a possibly generic type.  The 
>>>>>>>>> reference to it would just be an ordinary symbol reference.
>>>>>>>> 
>>>>>>>> Naturally, eagerly emitting one of those has the same advantages and 
>>>>>>>> disadvantages as eagerly emitting type metadata and everything else, 
>>>>>>>> and can be solved in the same way.
>>>>>>>> 
>>>>>>>> John.
>>>>>>> 
>>>>>>> Sure, for witness tables each constant specialized conformance can 
>>>>>>> refer to a unique constant nominal conformance, resolved at link-time.
>>>>>>> 
>>>>>>> Whereas we expect specialized type metadata to always need some runtime 
>>>>>>> initialization because we want to unique some canonical entity for each 
>>>>>>> instantiation and possibly compress VWTs.
>>>>>> 
>>>>>> Oh, I missed that you were talking about both, sorry.  If we wanted to 
>>>>>> emit specialized type metadata, I think it would have to be an explicit 
>>>>>> goal that they could be emitted without any sort of dynamic 
>>>>>> initialization, which implies that they're non-unique. 
>>>>> 
>>>>> I was wondering about that. I’m still having trouble filling in the 
>>>>> details, but it seems that if non-unique type metadata never ‘escapes’ 
>>>>> from a function, we could stack-allocate ‘structural’ metadata, for 
>>>>> example if you have
>>>>> 
>>>>> func foo<T>(_: T) {}
>>>>> 
>>>>> func bar<T, Y>(x: T, y: Y) {
>>>>>   foo((x, y))
>>>>> }
>>>>> 
>>>>> You would be able to compile bar() without any runtime calls at all, 
>>>>> building the tuple type metadata ‘from scratch’ on the stack and passing 
>>>>> it to foo(). Perhaps generic nominal types could also be constructed 
>>>>> non-uniquely without a runtime call.
>>>> 
>>>> Okay.  What would be required to prove that type metadata never escapes 
>>>> from a function?
>>> 
>>> Well, we could say that metadata is uniqued before being reified into a 
>>> value (T.self) or when constructing an existential, etc. Other than that, I 
>>> think the only thing we do with metadata is pass it to other functions?
>> 
>> Hmm, yes, I guess we could make sure that everything that uses metadata in 
>> any way that might escape just uniques it at that point.
>> 
>> Your tuple example is interesting because it would actually be quite 
>> elaborate to construct on the fly every time we needed it, since we'd have 
>> to perform type layout dynamically and form a complete value witness table.  
>> I hope you're not anticipating inlining that into every construction site?
> 
> Being able to reclaim memory for dynamically-generated type metadata in at 
> least some situations feels compelling to me, since our current design always 
> "leaks" metadata memory and could probably be induced to pathologically waste 
> memory if an attacker put their mind to it.


Supposing that there is such an attack, it seems unlikely to me that it 
couldn't be made to involve a class or an existential or some other thing that 
forced heap-allocation.

> It's conceivable that we could provide entry points for dynamically 
> generating temporary metadata into caller-provided stack space, or if LLVM 
> theoretically had alloca-into-caller, have runtime entry points that do the 
> stack allocation and temporary metadata instantiation. The analysis to 
> balance the tradeoff between regenerating a metadata record multiple times 
> vs. creating and caching it once is probably nontrivial for the compiler to 
> figure out ahead of time, though.

Profile-guided, maybe?

But I think I've made my point that this would be a huge research project that 
I can't imagine us finding the time for in the next year.  Forward declarations 
are already going to represent a huge revision to the metadata system, and one 
that is substantially more urgent to solve for ABI stability.

John.
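
For readers following the thread, the tradeoff under discussion can be 
sketched in a few lines of Swift. This is only a toy model under stated 
assumptions: TypeDescriptor, Metadata, UniquingTable, and the helper 
functions are hypothetical names invented for illustration, not the 
runtime's actual data structures or entry points. It shows a pointer 
comparison sufficing for unique records, a recursive structural comparison 
for duplicated records, and uniquing deferred until a record escapes.

```swift
// Hypothetical model only: these types are NOT the Swift runtime's real
// metadata layout; every name here is invented for illustration.

/// Stands in for a nominal type descriptor: one per declaration, so its
/// identity names the (possibly generic) type.
final class TypeDescriptor {
    let name: String
    init(name: String) { self.name = name }
}

/// Stands in for a metadata record for one instantiation of a type.
final class Metadata {
    let descriptor: TypeDescriptor
    let genericArguments: [Metadata]
    init(_ descriptor: TypeDescriptor, _ genericArguments: [Metadata] = []) {
        self.descriptor = descriptor
        self.genericArguments = genericArguments
    }
}

/// Type equality when records are globally unique: one pointer comparison.
func sameTypeIfUnique(_ a: Metadata, _ b: Metadata) -> Bool {
    return a === b
}

/// Type equality when records may be duplicated: compare the descriptor
/// identity, then recurse through the generic arguments.
func sameTypeStructurally(_ a: Metadata, _ b: Metadata) -> Bool {
    if a === b { return true }
    guard a.descriptor === b.descriptor,
          a.genericArguments.count == b.genericArguments.count else {
        return false
    }
    return zip(a.genericArguments, b.genericArguments)
        .allSatisfy { pair in sameTypeStructurally(pair.0, pair.1) }
}

/// A uniquing table applied only where metadata "escapes" (for example when
/// it is reified as a value). Linear search keeps the sketch short.
final class UniquingTable {
    private var canonical: [Metadata] = []
    func unique(_ candidate: Metadata) -> Metadata {
        if let existing = canonical.first(where: { sameTypeStructurally($0, candidate) }) {
            return existing
        }
        canonical.append(candidate)
        return candidate
    }
}

let intDescriptor = TypeDescriptor(name: "Int")
let arrayDescriptor = TypeDescriptor(name: "Array")

// Two independently built records for Array<Int>, as two clients might emit.
let copyA = Metadata(arrayDescriptor, [Metadata(intDescriptor)])
let copyB = Metadata(arrayDescriptor, [Metadata(intDescriptor)])

let table = UniquingTable()
let canonicalA = table.unique(copyA)
let canonicalB = table.unique(copyB)

print(sameTypeIfUnique(copyA, copyB))            // false
print(sameTypeStructurally(copyA, copyB))        // true
print(sameTypeIfUnique(canonicalA, canonicalB))  // true
```

The structural check has to walk every generic argument, which is the cost 
of giving up uniqueness; uniquing at escape points restores the pointer 
comparison, but at the price of a table lookup at each such point.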

_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev
