The standard library team asked the compiler team to take a look at the 
performance of the type-checker.  Their observation was that several 
in-progress additions to the standard library were causing a disproportionate 
increase in the time it required to compile a simple-seeming expression, and 
they were concerned that this was going to block library progress.  This is the 
resulting analysis.

Thanks go to Doug Gregor for being a sounding board for a lot of ideas that 
came up during this investigation.

1. Introduction

In general, it is expected that changes to the source code can cause non-linear 
increases in type-checking time.  For one, all type systems with some form of 
generic type propagation have the ability to create exponentially-large types 
that will generally require exponential work to manipulate.  But Swift also 
introduces several forms of disjunctive constraint, most notably overload 
resolution, and solving these can require combinatorial explosions in 
type-checker work.  Nonetheless, for any particular problem, there are usually 
things we can do to avoid or limit that combinatorial explosion, and so it is 
always reasonable to take a look.

Some background for the rest of this discussion.  The current semantics for 
name lookup, member or otherwise, are that all possible matches are dropped 
into an overload set with no explicit ranking between them.  If solutions can 
be found with multiple overloads, they are disambiguated (hopefully) using the 
ambiguity resolution rules.  Some aspects of the ambiguity resolution rules do 
apply to declarations in the abstract; for example, there is a rule that 
prefers a solution that picks a function with a strictly-more-specialized 
signature.  However, in general, such rules cannot be applied immediately 
during type-checking.  This is because of the "score" system in which solutions 
are ranked according to the kind and number of various undesired implicit 
conversions they perform: for example, it is possible that a solution using a 
more-specialized function will end up requiring more undesired conversions and 
thus rank worse.  Therefore, the type-checker must find all possible solutions 
and only then rank them, which often prohibits short-circuits of the 
combinatorial explosion.

2. The problem

Okay, concrete stuff.  The expression in question is:

  (0..<5).lazy.map { String($0) }

The types of '0' and '5' are not immediately constrained; in fact, it will turn 
out that they can only be constrained by assigning them their default literal 
type.  Many expressions that are expensive to type-check feature this property, 
because the assignment of default types must occur very late in type-checking, 
as a matter of last resort.

'..<' has two overloads, one requiring the value type to be Comparable and a 
more specialized one which also requires it to be Strideable.  They yield 
different types.
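
To illustrate the shape of that overload pair (this is a simplified sketch with made-up names, not the actual standard library signatures), here are two generic functions where one is strictly more constrained than the other and each returns a distinct type:

```swift
// Hypothetical stand-ins for the two range types; not the stdlib's.
struct MyRange<Bound: Comparable> { let lower: Bound; let upper: Bound }
struct MyCountableRange<Bound: Strideable> { let lower: Bound; let upper: Bound }

// Overload constrained only to Comparable.
func halfOpen<Bound: Comparable>(_ lhs: Bound, _ rhs: Bound) -> MyRange<Bound> {
    return MyRange(lower: lhs, upper: rhs)
}

// More-specialized overload: Strideable refines Comparable.
func halfOpen<Bound: Strideable>(_ lhs: Bound, _ rhs: Bound) -> MyCountableRange<Bound> {
    return MyCountableRange(lower: lhs, upper: rhs)
}

// Int is Strideable, so the more-specialized overload wins here,
// and the two overloads yield different result types.
let r = halfOpen(0, 5)
```

The point is that the solver must consider both branches until it knows which constraints the literals can satisfy.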

'lazy' is a member name and therefore can only be resolved when the base type 
is known.  As it happens, in both cases the overload set will end up the same: 
a small number of overloads from various protocol extensions.  The proposed 
library change adds several of these, greatly exacerbating the explosion.  All 
of these definitions yield different types.
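
A hedged sketch of how that kind of overload set arises (the protocol and wrapper names here are invented for illustration; the real 'lazy' hierarchy is larger): each protocol extension in a refinement chain can supply a member with the same name but a different result type.

```swift
// Invented protocols standing in for the Sequence/Collection hierarchy.
protocol SequenceLike { }
protocol CollectionLike: SequenceLike { }

// Each extension's member wraps the base in a different type.
struct LazySequenceWrapper<Base: SequenceLike> { let base: Base }
struct LazyCollectionWrapper<Base: CollectionLike> { let base: Base }

extension SequenceLike {
    var lazyish: LazySequenceWrapper<Self> { return LazySequenceWrapper(base: self) }
}
extension CollectionLike {
    var lazyish: LazyCollectionWrapper<Self> { return LazyCollectionWrapper(base: self) }
}

struct Numbers: CollectionLike { }

// Both extension members are candidates; the more-specialized one wins,
// but the solver still has to consider each.
let l = Numbers().lazyish
```

Every such definition added to the library multiplies the number of branches the solver must explore for the '.lazy' step.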

'map' is a member name and therefore can only be resolved when the base type is 
known.  It has a small number of overloads in different types and protocol 
extensions.

'String.init' has a huge number of overloads, somewhere around fifty.

3. Current type-checker behavior

The type-checker ends up taking the right basic approach for solving this: it 
experimentally picks one of the overloads for '..<', then an overload for 
'.lazy', then an overload for '.map', then an overload for 'String.init'.

As the constraint system is progressively solved, it will turn out that the 
value type of the integer literals will propagate all the way down to be the 
type of '$0' and hence of the argument to String.  This sort of propagation 
could, in general, require the literals to have a specific type.  In this case, 
since 'String(x)' will take basically anything, the literals are not 
constrained, and we will end up defaulting their types.  I mention this only to 
underline why the type-checker cannot apply this defaulting quickly.

Because the literals end up needing to be defaulted, and because there are two 
possible default types for integer literals (we will use 'Double' if 'Int' does 
not work), the basic combinatoric formula for this constraint system is 
something like 2 x 4 x 4 x 50 x 2 x 2.  (I don't remember the actual numbers of 
overloads.)  This is why type-checking is slow.
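
Multiplying that out (using the approximate counts above) gives the worst-case number of overload combinations the solver may have to try:

```swift
// Approximate branch counts: '..<', 'lazy', 'map', 'String.init',
// and two literal-defaulting choices (Int, then Double) for each literal.
let choices = [2, 4, 4, 50, 2, 2]
let worstCase = choices.reduce(1, *)
print(worstCase)  // 6400
```

Each of those combinations can itself involve non-trivial constraint solving, so even thousands of branches is enough to make an innocuous-looking expression slow.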

4. Possible solutions

As a patch to the problem, I recognized that many of those 50 overloads of 
'String.init' were not viable due to argument-label mismatches, but were only 
being recognized as such in the innermost loop.  (The rule here is that you 
cannot pass a tuple value and have it automatically "explode" to match the 
parameter list; the syntactic call arguments must structurally match the 
parameters.  This was an accepted proposal with corresponding compiler 
implementation from February.  Unfortunately, it turns out that our 

enforcement of this is inconsistent.)  It was simple enough to avoid this by 
marking the overloads as non-viable upon lookup; it even turns out that we 
already had this heuristic, and it just hadn't kept pace with the language 
change.  So this limits the explosion in this case.
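
To illustrate the viability rule with a toy example (the type and initializers here are invented), a call written without an argument label can never match an initializer that requires one, so such overloads can be pruned at lookup time rather than in the innermost loop:

```swift
struct Wrapped {
    let which: String
    // Matches a call written 'Wrapped(x)'.
    init(_ value: Int) { which = "unlabelled" }
    // Requires the label 'repeating:'; structurally cannot match 'Wrapped(x)'.
    init(repeating value: Int) { which = "labelled" }
}

// Only the unlabelled initializer is viable for this call's syntax.
let w = Wrapped(42)
print(w.which)  // "unlabelled"
```

Pruning by label is cheap because it only inspects the call's syntactic shape, before any type constraints are generated.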

There are various other type-checking heuristics that might also have helped 
with this.  For example, the overload sets of 'lazy' are the same in both 
cases; this is something that could theoretically be observed without expanding 
the overload set of '..<', and in some cases this might allow some downstream 
type-checking to proceed outside of any contingencies.  I don't know that it 
would have significantly helped in this case, however.

Anything beyond that will require language changes.

Note that there have been proposals about restricting various kinds of implicit 
conversions.  Those proposals are not pertinent here; the implicit conversions 
are not a significant contributor to this particular problem.  The real problem 
is overloading.

One very drastic solution to this problem would be to eliminate non-labelled 
overloads.  (A perhaps more-reasonable way to describe this is to say that 
argument labels are a part of the name, and therefore two declarations with 
different labels are not overloads.)  That is, given the syntactic form of an 
expression, we would always be able to find a single function that it could 
invoke.  The current argument-labelling design of the language does make this 
more palatable than it was in earlier iterations.  However, I don't think it's 
palatable enough: this change would be very limiting, and in particular I think 
we are very unlikely to accept it for operators or single-argument initializers.
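
Concretely, under the "labels are part of the name" view, declarations like the following (invented for illustration) would be two distinct functions rather than overloads of one name, and a call site's labels would identify exactly one of them:

```swift
// Two distinct names under this rule: insert(at:) and insert(contentsOf:).
func insert(at index: Int) -> String { return "insert(at:)" }
func insert(contentsOf text: String) -> String { return "insert(contentsOf:)" }

// The labels in the call syntax select a single function; no overload
// resolution between the two is ever needed.
print(insert(at: 3))  // "insert(at:)"
```

The cases this rule cannot help with are exactly the ones in our expression: operators and unlabelled single-argument calls like 'String($0)'.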

A less drastic solution suggests itself when we examine the nature of these 
overloads.  '..<', 'lazy', and 'map' all obey an implicit hierarchy of 
overloads, each imposing stricter constraints than the last.  Furthermore, 
specifically for 'lazy' and 'map', this hierarchy actually follows the explicit 
hierarchy of types and the protocols they conform to.  Perhaps we could use 
this to declare that certain declarations unconditionally hide declarations 
with the same name from "base" types: for example, a declaration from a 
concrete type would always hide a declaration from a protocol (or an extension 
thereof) that the type declares conformance to, and a declaration from a 
protocol would hide a declaration from a protocol it is derived from.  (This 
rule could still be implemented in the presence of conditional conformances, 
but it would be more difficult to apply "statically" during type-checking; we 
might have to compute a full solution before we could recognize that an 
overload was actually hidden by a conditional conformance.)  In this case, I 
believe this would have left us with only a single overload to consider for 
both 'lazy' and 'map', greatly reducing the combinatorial space of solutions.  
Of course, it's also possible that this would remove viable solutions.
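
Today's member dispatch already behaves this way in the simplest case, which suggests the rule is a natural extension rather than a new concept.  A minimal sketch (names invented):

```swift
protocol Squashable { }

extension Squashable {
    // Under the proposed rule, this would be unconditionally hidden
    // below, not merely ranked lower.
    var origin: String { return "protocol extension" }
}

struct Concrete: Squashable {
    var origin: String { return "concrete type" }
}

// The concrete type's member shadows the protocol extension's.
print(Concrete().origin)  // "concrete type"
```

The proposal would turn this shadowing from a tiebreaker applied after solving into a lookup-time filter, so the hidden declarations never enter the overload set at all.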

5. Actions and recommendations

I've landed the viability-limiting patch as 
563057ca980fb9a8fce899184761a812164dc9d2.  Hopefully it sticks.

I strongly recommend that we pursue declaration hiding as a language change.  
Even independent of type-checker performance, this seems quite defensible as a 
language direction, since it makes it much easier to reason about overloading 
in the presence of protocol extensions and so on.  However, there are still a 
lot of major open questions with the design.

John.