We might want to leave some room in the design for a shared atomic cache reference to live in the buffer, FWIW. It would have to be mutable even when the buffer was multiply-referenced
Sent from my moss-covered three-handled family gradunza > On Oct 20, 2016, at 8:41 AM, Erik Eckstein <eeckst...@apple.com> wrote: > > >>> On Oct 19, 2016, at 6:36 PM, Andrew Trick via swift-dev >>> <swift-dev@swift.org> wrote: >>> >>> >>> On Oct 19, 2016, at 10:13 AM, Dave Abrahams via swift-dev >>> <swift-dev@swift.org> wrote: >>> >>> >>> on Tue Oct 18 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote: >>> >>>>> On Oct 17, 2016, at 10:21 AM, Dave Abrahams <dabrah...@apple.com> wrote: >>>>> >>>>> >>>>> on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com> wrote: >>>>> >>>> >>>>>>> On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev >>>>>>> <swift-dev@swift.org> wrote: >>>>>>> >>>>>>> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org >>>>>>> <http://swift-dev-at-swift.org/>> wrote: >>>>>>> >>>>>>>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev >>>>>>>>> <swift-dev@swift.org> wrote: >>>>>>>>> >>>>>>>>> This is a proposal for representing copy-on-write buffers in >>>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>>>>> depends on how we move forward with SIL ownership. >>>>>>>>> <CopyOnWrite.rst> >>>>>>>>> If you have any comments, please let me know. >>>>>>>> >>>>>>>> The SIL-level design seems sensible to me at a glance. At the language >>>>>>>> level, I think it would make more sense to treat this as an attribute >>>>>>>> on class types rather than on properties in structs using the class. I >>>>>>>> don't think many people reuse class definitions as both shared >>>>>>>> reference types and as value type payloads, >>>>>>> >>>>>>> Foundation does, or would if they could. >>>>>>> >>>>>>>> but beyond that, I think that making it an attribute of classes would >>>>>>>> put us into a better position to leverage the borrow model to enforce >>>>>>>> the "mutable-only-when-unique" aspect of COW implementations. John >>>>>>>> alluded to this in the "SIL address types and borrowing" thread: >>>>>>>> >>>>>>>>> I wonder if it would make more sense to make copy-on-write buffer >>>>>>>>> references a move-only type, so that as long as you were just >>>>>>>>> working with the raw reference (as opposed to the CoW aggregate, >>>>>>>>> which would remain copyable) it wouldn't get implicitly copied >>>>>>>>> anymore. You could have mutable and immutable buffer reference >>>>>>>>> types, both move-only, and there could be a consuming checkUnique >>>>>>>>> operation on the immutable one that, I dunno, returned an Either of >>>>>>>>> the mutable and immutable versions. >>>>>>>>> >>>>>>>>> For CoW aggregates, you'd need some @copied attribute on the field >>>>>>>>> to make sure that the CoW attribute was still copyable. Within the >>>>>>>>> implementation of the type, though, you would be projecting out the >>>>>>>>> reference immediately, and thereafter you'd be certain that you were >>>>>>>>> borrowing / moving it around as appropriate. >>>>>>>> >>>>>>>> If 'copy-on-write' were a trait on classes, then we could distinguish >>>>>>>> unique and nonunique references to the class. A unique reference would >>>>>>>> act like a move-only type to prevent accidental loss of uniqueness. >>>>>>> >>>>>>> +1 >>>>>>> >>>>>>>> We can also allow a copy-on-write class to have "mutating" methods, >>>>>>>> and only allow mutation on unique references. It seems to me like, >>>>>>>> exploring this direction, we could also come up with a way for the >>>>>>>> high-level value-semantics operations on the struct to statically >>>>>>>> indicate which methods are known to leave the value's buffers in a >>>>>>>> unique state, or which return values that are uniquely owned, which >>>>>>>> would give the optimizer more ability to avoid uniqueness checks >>>>>>>> across calls without relying on inlining and IPO. >>>>>>> >>>>>>> That's pretty cool. However, I think there's nothing to prevent any >>>>>>> mutating method from storing a copy of self in a global, so I think we'd >>>>>>> need some participation from the programmer (either an agreement not to >>>>>>> do that, or an explicit claim of uniqueness on exit) in order to >>>>>>> identify operations that create/preserve uniqueness. >>>>>> >>>>>> If a mutating reference (like self in a mutating method) is move-only >>>>>> then you would not be able to “copy” it to a global. >>>>> >>>>> Yes, a reference to a move-only type would work for this purpose. >>>>> >>>>> >>>>>>> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev >>>>>>> <swift-dev@swift.org> wrote: >>>>>>> >>>>>>> >>>>>>> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote: >>>>>>> >>>>>>>> This is a proposal for representing copy-on-write buffers in >>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>>>> depends on how we move forward with SIL ownership. >>>>>>>> >>>>>>>> :orphan: >>>>>>>> >>>>>>>> .. highlight:: sil >>>>>>>> >>>>>>>> =================================== >>>>>>>> Copy-On-Write Representation in SIL >>>>>>>> =================================== >>>>>>>> >>>>>>>> .. contents:: >>>>>>>> >>>>>>>> Overview >>>>>>>> ======== >>>>>>>> >>>>>>>> This document proposes: >>>>>>>> >>>>>>>> - An ownership attribute to define copy-on-write (COW) buffers in >>>>>>>> Swift data >>>>>>>> types. >>>>>>>> >>>>>>>> - A representation of COW buffers in SIL so that optimizations can >>>>>>>> take benefit >>>>>>>> of it. >>>>>>>> >>>>>>>> The basic idea is to enable the SIL optimizer to reason about COW data >>>>>>>> types >>>>>>>> in the same way as a programmer can do. >>>>>>>> This means: a COW buffer can only be modified by its owning SIL value, >>>>>>>> because >>>>>>>> either it's uniquely referenced or the buffer is copied before >>>>>>>> modified. >>>>>>>> >>>>>>>> .. note:: >>>>>>>> In the following the term "buffer" refers to a Swift heap object. >>>>>>>> It can be any heap object, not necessarily a “buffer” with e.g. >>>>>>>> tail-allocated elements. >>>>>>>> >>>>>>>> COW Types >>>>>>>> ========= >>>>>>>> >>>>>>>> The basic structure of COW data types can be simplified as follows:: >>>>>>>> >>>>>>>> class COWBuffer { >>>>>>>> var someData: Int >>>>>>>> ... >>>>>>>> } >>>>>>>> >>>>>>>> struct COWType { >>>>>>>> var b : COWBuffer >>>>>>>> >>>>>>>> mutating func change_it() { >>>>>>>> if (!isUniquelyReferenced(b)) { >>>>>>>> b = copy_buffer(b) >>>>>>>> } >>>>>>>> b.someData = ... >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> Currently the COW behavior of such types is just defined by their >>>>>>>> implementation. >>>>>>>> But there is no representation of this special behavior in the SIL. >>>>>>>> So the SIL optimizer has no clue about it and cannot take advantage of >>>>>>>> it. >>>>>>>> >>>>>>>> For example:: >>>>>>>> >>>>>>>> func foo(arr : [Int]) { >>>>>>>> x = arr[0] >>>>>>>> opaque_function() >>>>>>>> y = arr[0] // can RLE replace this with y = x? >>>>>>>> } >>>>>>>> >>>>>>>> If opaque_function() wants to change the contents of the array buffer >>>>>>>> it first >>>>>>>> has to copy it. >>>>>>> >>>>>>> ...or determine that it's uniquely-referenced. >>>>>> >>>>>> In this specific example, if opqaue_function holds a reference to arr’s >>>>>> buffer, the buffer is not >>>>>> uniquely-referenced. >>>>> >>>>> Right. >>>>> >>>>>>> >>>>>>>> But the optimizer does not know it so it has to conservatively assume >>>>>>>> that opaque_function() will write to the location of arr[0]. >>>>>>>> >>>>>>>> Copy-on-write Ownership Attribute >>>>>>>> ================================= >>>>>>>> >>>>>>>> This section proposes an ownership attribute to define a copy-on-write >>>>>>>> buffer. >>>>>>>> >>>>>>>> Swift Syntax >>>>>>>> ------------ >>>>>>>> >>>>>>>> A COW buffer reference can be defined with a new ownership attribute >>>>>>>> for the >>>>>>>> buffer variable declaration (similar to “weak” and “unowned”):: >>>>>>>> >>>>>>>> struct COWType { >>>>>>>> copy_on_write var b : COWBuffer >>>>>>>> >>>>>>>> // ... >>>>>>>> } >>>>>>>> >>>>>>>> The ``copy_on_write`` attribute is purely used for optimization >>>>>>>> purposes. >>>>>>>> It does not change the semantics of the program. >>>>>>> >>>>>>> Presumably, it changes what code you can execute on `b` without invoking >>>>>>> traps or undefined behavior. Otherwise, the optimizer wouldn't be able >>>>>>> to do anything differently to take advantage of the annotation. >>>>>> >>>>>> That’s true. >>>>>> >>>>>>> What are the rules for writing code that uses `copy_on_write`? >>>>>> >>>>>> See below ("The rules for using ``copy_on_write`` and the built-ins >>>>>> are:”) >>>>> >>>>> Yeah, I got there, eventually. But just saying “doesn't change >>>>> semantics” at this point in the proposal leaves a gap, because it does >>>>> change semantic *requirements*. You should mention that. >>>>> >>>>>>>> .. note:: >>>>>>>> >>>>>>>> “copy_on_write” is a working title. TODO: decide on the name. >>>>>>>> Maybe it should be a @-attribute, like @copy_on_write? >>>>>>>> Another question is if we should open this attribute for the public or >>>>>>>> just >>>>>>>> use it internally in the library, because violating the implied rules >>>>>>>> (see below) could break memory safety. >>>>>>>> >>>>>>>> Implementation >>>>>>>> -------------- >>>>>>>> >>>>>>>> The ``copy_on_write`` references can be represented in the AST as a >>>>>>>> special >>>>>>>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented. >>>>>>>> The canonical type of a ``CopyOnWriteStorageType`` would be the >>>>>>>> referenced >>>>>>>> buffer class type. >>>>>>>> >>>>>>>> In SIL the buffer reference will have type:: >>>>>>>> >>>>>>>> $@sil_cow COWBuffer >>>>>>>> >>>>>>>> where ``COWBuffer`` is the type of the referenced heap object. >>>>>>>> >>>>>>>> Two conversion instructions are needed to convert from a ``@sil_cow`` >>>>>>>> reference >>>>>>>> type to a regular reference type:: >>>>>>>> >>>>>>>> cow_to_ref >>>>>>>> ref_to_cow >>>>>>>> >>>>>>>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``. >>>>>>>> >>>>>>>> For example the SIL code for:: >>>>>>>> >>>>>>>> var c: COWType >>>>>>>> let x = c.b.someData >>>>>>>> >>>>>>>> would be:: >>>>>>>> >>>>>>>> %1 = struct_extract %1 : COWType, #COWType.b >>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData >>>>>>>> %4 = load %3 : $*Int >>>>>>>> >>>>>>>> The ``ref_to_cow`` instruction is needed to store a new buffer >>>>>>>> reference into a >>>>>>>> COW type. >>>>>>>> >>>>>>>> COW Buffers and the Optimizer >>>>>>>> ============================= >>>>>>>> >>>>>>>> A reference to a COW buffer gives the optimizer additional information: >>>>>>>> >>>>>>>> *A buffer, referenced by a @sil_cow reference is considered to be >>>>>>>> immutable >>>>>>>> during the lifetime of the reference.* >>>>>>> >>>>>>> This seems like much too broad a rule to allow inplace mutations of >>>>>>> uniquely referenced buffers. >>>>>> >>>>>> The point is that all mutations must be guarded by an is_unique, which >>>>>> takes the _address_ of the buffer reference as argument. >>>>>> And the optimizer considers this instruction as a potential write to the >>>>>> buffer reference. >>>>>> The effect is that the lifetime of a buffer reference (as a SIL value) >>>>>> will not outlive a is_unique - regardless if this is inside a called >>>>>> function or inlined. >>>>> >>>>> I don't see how that allows me to mutate a uniquely referenced buffer held >>>>> in a @sil_cow reference, given what you wrote above. >>>> >>>> You would not be able to get a reference to a mutable buffer by >>>> reading the COW type’s @sil_cow field. Instead you would only get >>>> such a reference as a result of the is_unique instruction/builtin. Or, >>>> of course, by creating a new buffer. >>>> >>>> I’m not sure if this was the question, though. >>> >>> I think it just comes down to precise phrasing. AFAICT, what you really >>> mean to say is something like >>> >>> A buffer cannot be directly mutated through a @sil_cow reference; >>> instead one must mutate it indirectly via the result of is_unique or >>> start_unique. > > Exactly, that’s what I wanted to say. > >>> >>> Saying that the buffer is “considered to be immmutable during the >>> lifetime of the reference” could be taken to mean that the compiler will >>> assume no mutations of the buffer can occur while the reference exists. >>> IIUC you are not planning to formally end the reference's lifetime at >>> the moment is_unique/start_unique returns. >> >> To clarify: I proposed an alternate approach in which the @sil_cow reference >> is only mutable during the Array’s @inout scope—to be automatically enforced >> by the compiler once @inout scopes are enforced. But the text in question is >> not referring to that approach, so your comments are on target. > > After thinking about Joe’s suggestion (having the cow attribute on the class > type and make a reference to that type move-only), I’m more inclined to go > with the isUnique builtin. If such a reference can only be returned by > isUnique, it is really guaranteed that only a uniquely referenced buffer can > be mutated. With the inout approach, the programmer is not forced to make the > uniqueness check before modifying the buffer. > >> -Andy >> >>>> >>>> Plus: we will have an explicit conversion instruction (start_unique) to >>>> convert an immutable >>>> reference to a mutable referece. >>>> A SIL optimization can replace an is_unique with this instruction if it >>>> can prove that the reference >>>> is already unique at that point. >>>> >>>>> >>>>> >>>>>>> Unless you mean the reference is >>>>>>> immutable, rather than the storage being referred to by it. >>>>>>> >>>>>>>> This means any address derived from a ``cow_to_ref`` instruction can be >>>>>>>> considered to point to immutable memory. >>>>>>>> >>>>>>>> Some examples of optimizations which will benefit from copy-on-write >>>>>>>> representation in SIL: >>>>>>>> >>>>>>>> - Redundant load elimination >>>>>>>> >>>>>>>> RLE can assume that opaque code does not modify a COW buffer. >>>>>>> >>>>>>> How do you distinguish “opaque code” from “code that is meant to >>>>>>> modify the buffer and might do so in place if it's uniquely-referenced?” >>>>>> >>>>>> Again, the is_unique which takes the address of the reference, will >>>>>> guarantee that during the lifetime of a buffer there are no >>>>>> modifications of the buffer. >>>>> >>>>> Again, that sounds like it rules out inplace modification of uniquely >>>>> referenced buffers. >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>>> Example:: >>>>>>>> >>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>>>> %4 = load %3 : $*Int >>>>>>>> %5 = apply %foo() // Cannot overwrite memory >>>>>>>> location %3 >>>>>>>> %6 = load %3 : $*Int // Can be replaced by %4 >>>>>>>> >>>>>>>> Currently we do some ad-hoc optimizations for array, based on >>>>>>>> semantics, >>>>>>>> like array count propagation. These hacks would not be needed >>>>>>>> anymore. >>>>>>> >>>>>>> W0000000000000000000000t. >>>>>>> >>>>>>>> Note that it’s not required to check if a ``cow_to_ref`` reference (or >>>>>>>> a >>>>>>>> projected address) escapes. Even if it escapes, it will reference >>>>>>>> immutable >>>>>>>> memory. >>>>>>>> >>>>>>>> - CSE, loop hoisting >>>>>>>> >>>>>>>> Similar to RLE: the optimizer can assume that opaque code cannot >>>>>>>> modify a >>>>>>>> COW buffer >>>>>>> >>>>>>> Same question here as above, then. >>>>>>>> >>>>>>>> - ARC optimization >>>>>>>> >>>>>>>> Knowing that some opaque code cannot overwrite a reference in the COW >>>>>>>> buffer >>>>>>>> can remove retain/release pairs across such code:: >>>>>>>> >>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someRef >>>>>>>> %4 = load_strong %3 : $*MyClass // Can do a load_strong >>>>>>>> [guarantee] >>>>>>>> %5 = apply %foo() // Cannot overwrite >>>>>>>> someRef and dealloc the object >>>>>>>> // Use %4 >>>>>>>> destroy_value %4 : $MyClass >>>>>>>> >>>>>>>> Scoping instructions >>>>>>>> -------------------- >>>>>>>> >>>>>>>> To let the optimizer reason about the immutability of the COW buffer, >>>>>>>> it is >>>>>>>> important to *bind* the lifetime of the buffer content to the lifetime >>>>>>>> of the >>>>>>>> buffer reference. For example:: >>>>>>>> >>>>>>>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>>>>>> // load something from %b1 >>>>>>>> %a = apply %foo(%baddr : $@sil_cow COWBuffer) >>>>>>>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>>>>>> again >>>>>>>> // load something from %b2 >>>>>>>> >>>>>>>> The question is: can RLE forward the load of the buffer reference and >>>>>>>> replace >>>>>>>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` >>>>>>>> modifies the >>>>>>>> buffer. >>>>>>>> >>>>>>>> To enforce this restriction, the scope of any buffer modification must >>>>>>>> be >>>>>>>> enclosed in a pair of SIL instructions. Those instructions define the >>>>>>>> scope >>>>>>>> of the mutation. Both instructions take the *address* of the buffer >>>>>>>> reference as operand and act as a potential write to the buffer >>>>>>>> reference. >>>>>>>> >>>>>>>> The purpose of the scoping instructions is to strictly separate the >>>>>>>> liferanges >>>>>>>> of references to an immutable buffer and references to the mutable >>>>>>>> buffer. >>>>>>> >>>>>>> Looks reasonable. >>>>>>> >>>>>>>> The following example shows why the scoping instructions (specifically >>>>>>>> the >>>>>>>> end-of-scope instruction) are required to prevent loop-hoisting from >>>>>>>> interleaving mutable and immutable liferanges:: >>>>>>>> >>>>>>>> // there should be a begin-of-scope %baddr >>>>>>>> %mut_b = load %baddr >>>>>>>> store %x to %mut_b // modification of the buffer >>>>>>>> // there should be a end-of-scope %baddr >>>>>>>> >>>>>>>> loop { >>>>>>>> %b = load %baddr >>>>>>>> %y = load %b // load from the buffer >>>>>>>> ... >>>>>>>> } >>>>>>>> >>>>>>>> If there is no end-of-scope instruction, loop hoisting could do:: >>>>>>>> >>>>>>>> %mut_b = load %baddr >>>>>>>> %b = load %baddr // moved out of the loop >>>>>>>> store %x to %mut_b >>>>>>>> >>>>>>>> loop { >>>>>>>> %y = load %b >>>>>>>> ... >>>>>>>> } >>>>>>>> >>>>>>>> Now the optimizer assumes that ``%b`` references an immutable buffer, >>>>>>>> so it could >>>>>>>> also hoist the load:: >>>>>>>> >>>>>>>> %mut_b = load %baddr >>>>>>>> %b = load %baddr >>>>>>>> %y = load %b // Wrong! Will be overwritten by the following >>>>>>>> store >>>>>>>> store %x to %mut_b >>>>>>>> >>>>>>>> loop { >>>>>>>> ... >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> The following sections describe two alternatives to implement the >>>>>>>> scoping. >>>>>>>> >>>>>>>> Scoping Alternative 1: Explicit Built-ins >>>>>>>> ----------------------------------------- >>>>>>>> >>>>>>>> SIL instructions >>>>>>>> ^^^^^^^^^^^^^^^^ >>>>>>>> >>>>>>>> The existing ``is_unique`` instruction is changed to a terminator >>>>>>>> instruction:: >>>>>>>> >>>>>>>> bb0: >>>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the >>>>>>>> address of the COWBuffer reference >>>>>>>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique >>>>>>>> reference. Physically identical to "load %0” >>>>>>>> // usually empty >>>>>>>> br bb3(%1 : $COWBuffer) >>>>>>>> bb2: // the false-block >>>>>>>> // usually contains: >>>>>>>> %2 = apply %copy_buffer >>>>>>>> %3 = cow_to_ref %2 >>>>>>>> store_strong %3 to %0 : $*@sil_cow COWBuffer >>>>>>>> br bb3(%2 : $COWBuffer) >>>>>>>> bb3(%4 : $COWBuffer): >>>>>>>> // Modify the buffer referenced by %4 >>>>>>>> // ... >>>>>>>> >>>>>>>> The end-of-scope instruction is:: >>>>>>>> >>>>>>>> end_unique_addr %0 : $*COWBuffer >>>>>>>> >>>>>>>> It is important that the references to the unique buffers (``%1``, >>>>>>>> ``%2``) must >>>>>>>> not outlive ``end_unique_addr``. In most cases this can be check by >>>>>>>> the SIL >>>>>>>> verifier. >>>>>>>> >>>>>>>> The two instructions must be paired properly but not necessarily in the >>>>>>>> same function. >>>>>>>> >>>>>>>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is >>>>>>>> to >>>>>>>> separate the lifetimes of mutable and immutable accesses to the COW >>>>>>>> buffer. >>>>>>>> Both instructions take an address to the COW buffer reference and are >>>>>>>> considered as potential stores to the reference. >>>>>>>> This makes sure that the SIL optimizer cannot mix-up buffer reference >>>>>>>> lifetimes >>>>>>>> across these instructions. >>>>>>>> For example, RLE cannot combine two buffer loads which are interleaved >>>>>>>> with >>>>>>>> a ``is_unique_addr_br``:: >>>>>>>> >>>>>>>> %1 = load_strong %0 : $*@sil_cow COWBuffer >>>>>>>> // do something with %1 >>>>>>>> … >>>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer >>>>>>>> … >>>>>>>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this >>>>>>>> with %1 >>>>>>>> >>>>>>>> Another important thing is that the COW buffer can only be mutated by >>>>>>>> using the >>>>>>>> reference of the ``is_unique_addr_br`` true-block argument. >>>>>>>> The COW buffer cannot be modified by simply loading/extracting the >>>>>>>> reference >>>>>>>> from the COWType. >>>>>>>> Example:: >>>>>>>> >>>>>>>> %1 = load_strong %0 : $*COWBuffer >>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>>>> store %7 : $Int to %3 : $*Int // Violation! >>>>>>>> >>>>>>>> Most obvious violations to this constraint can be catched by the >>>>>>>> SILVerifier. >>>>>>>> >>>>>>>> The ``_addr`` variants of the instructions also have a non-addr >>>>>>>> counterpart:: >>>>>>>> >>>>>>>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces >>>>>>>> the true-block arg as owned >>>>>>>> >>>>>>>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as >>>>>>>> owned >>>>>>>> >>>>>>>> These instructions are generated by Mem2reg (or a similar optimization) >>>>>>>> in case the COW value is stored (in a temporary alloc_stack location) >>>>>>>> just for the sake of passing an address to ``is_unique_addr_br`` and >>>>>>>> ``end_unique_addr``. >>>>>>>> For example in the following code, where the COW data is passed >>>>>>>> as-value and >>>>>>>> all the mutating functions are inlined:: >>>>>>>> >>>>>>>> func foo(arr : [Int], x: Int) { >>>>>>>> arr[0] = 27 >>>>>>>> … >>>>>>>> y = arr[x] >>>>>>>> … >>>>>>>> } >>>>>>>> >>>>>>>> Finally it’s probably a good idea to add an instruction for converting >>>>>>>> an >>>>>>>> immutable reference to a mutable reference:: >>>>>>>> >>>>>>>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : >>>>>>>> $COWBuffer as owned >>>>>>>> >>>>>>>> which is basically just a simpler representation of the following >>>>>>>> pattern:: >>>>>>>> >>>>>>>> bb0: >>>>>>>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2 >>>>>>>> bb1(%1 : $COWBuffer): >>>>>>>> … // main control flow continues here >>>>>>>> bb2: >>>>>>>> unreachable >>>>>>>> >>>>>>>> An optimizations, which eliminate uniqueness checks, would replace a >>>>>>>> ``is_unique_br`` by a ``start_unique``. >>>>>>>> >>>>>>>> Built-ins >>>>>>>> ^^^^^^^^^ >>>>>>>> >>>>>>>> A COW type implementor can generate the new instructions by using a >>>>>>>> set of built-ins:: >>>>>>>> >>>>>>>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType? >>>>>>>> func endUnique<BufferType>(_ buffer: inout BufferType) >>>>>>>> >>>>>>>> For example:: >>>>>>>> >>>>>>>> struct COWType { >>>>>>>> copy_on_write var b : COWBuffer >>>>>>>> >>>>>>>> mutating func makeMutable() -> COWBuffer { >>>>>>>> if let uniqueBuffer = isUnique(&self.b) { >>>>>>>> return uniqueBuffer >>>>>>>> } >>>>>>>> let copiedBuffer = copyBuffer(self.b) >>>>>>>> self.b = copiedBuffer >>>>>>>> return copiedBuffer >>>>>>>> } >>>>>>>> >>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>> uniqueBuffer.someData = x >>>>>>>> endUnique(&self.b) >>>>>>>> } >>>>>>>> } >>>>>>> >>>>>>> This seems reasonable, but it also looks like the compiler could do the >>>>>>> `endUnique` dance for us based, e.g., on the mutability of methods. >>>>>> >>>>>> I agree, that would be ideal, e.g. the compiler could insert the >>>>>> endUnique at the end of an inout >>>>>> scope. >>>>>> >>>>>>> >>>>>>>> The ``isUnique`` built-in returns an optional unique buffer reference. >>>>>>>> Physically this is the COW buffer which is passed as the inout >>>>>>>> argument. >>>>>>>> The result is nil if the buffer is not uniquely referenced. >>>>>>>> In this case usually the original buffer is copied and the reference >>>>>>>> to the >>>>>>>> copy is written back to the original buffer reference location >>>>>>>> (``self.b = copiedBuffer``). >>>>>>>> Starting at the point of the write-back, the reference to the copy >>>>>>>> also becomes >>>>>>>> a unique buffer reference. >>>>>>>> >>>>>>>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` >>>>>>>> pattern which >>>>>>>> constructs the Optional in the successor blocks. Using ``isUnique`` in >>>>>>>> an >>>>>>>> if-let (as shown above) will end up in two consecutive CFG "diamonds". >>>>>>>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` >>>>>>>> diamond. >>>>>>>> >>>>>>>> .. note:: >>>>>>>> This makes the definition of the unique buffer location lifetime a >>>>>>>> little bit >>>>>>>> problematic, because the false-branch of ``isUnique`` is not >>>>>>>> equivalent to >>>>>>>> the false-branch of the ``is_unique_addr_br`` instruction (before >>>>>>>> SimplifyCFG >>>>>>>> can do its job). >>>>>>> >>>>>>> I don't know what the implications of these diamonds and the problem >>>>>>> described above might be, FWIW. >>>>>>> >>>>>>>> The rules for using ``copy_on_write`` and the built-ins are: >>>>>>>> >>>>>>>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily >>>>>>>> in the >>>>>>>> same function. >>>>>>>> >>>>>>>> 2. The COW buffer may only be mutated by using the unique buffer >>>>>>>> reference. >>>>>>>> >>>>>>>> 3. The COW buffer must not be mutated outside the ``isUnique`` - >>>>>>>> ``endUnique`` >>>>>>>> pair. >>>>>>>> >>>>>>>> 4. During the lifetime of the unique buffer reference, the original >>>>>>>> COW buffer >>>>>>>> reference must not be used in any way, e.g. for reading from the >>>>>>>> buffer. >>>>>>>> >>>>>>>> Note that the lifetime of the unique buffer reference does not include >>>>>>>> the >>>>>>>> part between the begin of the ``isUnique`` false-branch and the >>>>>>>> write-back >>>>>>>> of the copy. This means is okay to read from the buffer (using >>>>>>>> ``self.b``) >>>>>>>> for the purpose of copying. >>>>>>>> >>>>>>>> Examples:: >>>>>>>> >>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>> uniqueBuffer.someData = x >>>>>>>> // violates rule 1 >>>>>>>> } >>>>>>>> >>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>> makeMutable() >>>>>>>> self.b.someData = x // violates rule 2 >>>>>>>> endUnique(&self.b) >>>>>>>> } >>>>>>>> >>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>> uniqueBuffer.someData = x >>>>>>>> endUnique(&self.b) >>>>>>>> uniqueBuffer.someData = 27 // violates rule 3 >>>>>>>> } >>>>>>>> >>>>>>>> mutating func incrementSomeData() { >>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4 >>>>>>>> endUnique(&self.b) >>>>>>>> } >>>>>>> >>>>>>> It would be instructive to write down the *correct* code for these >>>>>>> operations. >>>>>> >>>>>> added to my todo list. >>>>>> >>>>>>> >>>>>>>> The intention of the rules is to ensure that there is no overlap of a >>>>>>>> "read-only" life-range with a "mutable" life-range of the buffer >>>>>>>> reference. >>>>>>>> It’s the responsibility of the implementor to follow the rules. >>>>>>>> But the compiler (a mandatory diagnostics pass and the SIL verifier) >>>>>>>> can >>>>>>>> statically detect rule violations in obvious cases (with >>>>>>>> inter-procedural >>>>>>>> analysis maybe even in most cases). >>>>>>>> >>>>>>>> This approach would require to change some of the internals of our >>>>>>>> current COW data structures in the stdlib (Array, Dictionary, etc.). >>>>>>>> For example, the Array make_mutable semantic functions currently do >>>>>>>> not return >>>>>>>> the unique buffer. >>>>>>> >>>>>>> No big deal. >>>>>>> >>>>>>>> Scoping Alternative 2: Implicit Inout Scopes >>>>>>>> -------------------------------------------- >>>>>>>> >>>>>>>> There is an idea (proposal?) to change the representation of inout >>>>>>>> variables >>>>>>>> in SIL. This is independent of this proposal, but can be helpful for >>>>>>>> the >>>>>>>> purpose of defining the scope of a COW mutation. >>>>>>>> >>>>>>>> The basic idea is that SILGen inserts scoping instructions for *all* >>>>>>>> inout >>>>>>>> variables. And those scoping instructions can be used to define the >>>>>>>> mutating >>>>>>>> scope of a COW buffer. >>>>>>>> >>>>>>>> The scoping instructions which are inserted by SILGen for an inout >>>>>>>> scope are:: >>>>>>>> >>>>>>>> begin_exclusive >>>>>>>> end_exclusive >>>>>>>> >>>>>>>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those >>>>>>>> instructions take the >>>>>>>> address of the inout variable as argument. For the optimizer those >>>>>>>> instructions >>>>>>>> look like potential writes to the inout variable. >>>>>>>> >>>>>>>> The implementor of a COW type has to follow the rule that the COW >>>>>>>> buffer may >>>>>>>> only be modified in mutating functions of the COW type. But this is >>>>>>>> the case >>>>>>>> anyway because any modification needs a uniqueness check and this can >>>>>>>> only be >>>>>>>> done in mutating functions. >>>>>>>> >>>>>>>> Example:: >>>>>>>> >>>>>>>> // > mutating func setSomeData(x: Int) { >>>>>>>> // Accepts a unique reference to the array value (avoiding refcount >>>>>>>> operations) >>>>>>>> sil @setSomeData : $(Int, @inout Array) -> () { >>>>>>>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0 >>>>>>>> >>>>>>>> // > makeMutable() (inlined) >>>>>>>> // Forward the unique reference to the `self` array value, still >>>>>>>> avoiding refcount operations. >>>>>>>> // Begin the inlined exclusive scope (could be trivially removed). >>>>>>>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1 >>>>>>>> >>>>>>>> // > if !isUnique(&self._storage) { >>>>>>>> // Extract a unique inout reference to the class reference to the >>>>>>>> array storage. >>>>>>>> // This begins the isUnique() argument's exclusive scope. The memory >>>>>>>> is already exclusive >>>>>>>> // but the scope helps ensure this is the only alias to _storage. >>>>>>>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, >>>>>>>> #Array._storage >>>>>>>> >>>>>>>> // Uniqueness checking requires an inout reference to the class >>>>>>>> reference. >>>>>>>> // The is_unique instruction does not need to create a new storage >>>>>>>> reference. >>>>>>>> // It's only purpose is to check the RC count, ensure that the >>>>>>>> checked reference >>>>>>>> // is inout, and prevent the inout scope from being optimized away. >>>>>>>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>>>>>> >>>>>>>> // End the isUnique argument's exclusive scope (can also be trivially >>>>>>>> removed). >>>>>>>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>>>>>> >>>>>>>> br %isuniq, bb_continue, bb_slow >>>>>>>> >>>>>>>> bb_slow: >>>>>>>> // > self._storage = copyBuffer(self._storage) >>>>>>>> // Produce a new class reference to storage with verifiably unique RC >>>>>>>> semantics. >>>>>>>> %copied_storage_class = alloc_ref ... >>>>>>>> // A begin/end exclusive scope is implicit in store [assign]. >>>>>>>> store [assign] %copied_storage_class to %arrayref._storageref >>>>>>>> br bb_continue >>>>>>>> >>>>>>>> bb_continue: >>>>>>>> >>>>>>>> // This marks the end of makeMutable's inout `self` scope. Because >>>>>>>> Array >>>>>>>> // contains a "copy_on_write" property, the SIL verifier needs to >>>>>>>> // prove that %arrayref.#_storage has not escaped at this point. This >>>>>>>> // is equivalent to checking that %arrayref itself is not copied, and >>>>>>>> // checking each projection of the "copy_on_write" storage property >>>>>>>> // (%arrayref._storageref) is not copied. Or, if any copies are >>>>>>>> present, >>>>>>>> // they must be consumed within this scope. >>>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #1 >>>>>>>> >>>>>>>> // > self._storage.someData = x >>>>>>>> // An _addr instruction with one load/store use doesn't really need >>>>>>>> its own scope. >>>>>>>> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage >>>>>>>> >>>>>>>> // ARC optimization can promote this to a borrow, replacing >>>>>>>> strong_release with end_borrow. >>>>>>>> %arrayref.cow_storage = load [copy] %arrayref._storageref : >>>>>>>> $*@sil_cow ArrayStorage >>>>>>>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow >>>>>>>> ArrayStorage >>>>>>>> >>>>>>>> // Write some data into the CoW buffer. >>>>>>>> // (For simplicity, pretend ArrayStorage has a "someData" field). >>>>>>>> // A single-use _addr instruction, so no scope. >>>>>>>> %somedata_addr = ref_element_addr %arrayref._storage, #someData >>>>>>>> // A store with an implicit [exclusive] scope. >>>>>>>> store [assign] %x to %somedata_addr >>>>>>>> >>>>>>>> strong_release %arrayref._storage : $*ArrayStorage<T> >>>>>>>> >>>>>>>> // End the isUnique argument's exclusive scope. >>>>>>>> // The same verification is needed here, but the inner scope would be >>>>>>>> eliminated. >>>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #0 >>>>>>>> >>>>>>>> In general this approach looks more "user-friendly" than the first >>>>>>>> alternative. >>>>>>> >>>>>>> Well, I can't really tell, because you haven't shown the Swift code that >>>>>>> generates this SIL. >>>>>>> >>>>>>>> But it depends on implementing the general feature to insert the inout >>>>>>>> scoping instructions. Also, we still have to think through all the >>>>>>>> details of this approach. >>>>>>> >>>>>>> FWIW, I am convinced we will need (and get) a stricter inout model that >>>>>>> would be conducive to inserting the scoping instructions. >>>>>>> >>>>>>> >>>>>>>> Dependency between a buffer reference to the scope-begin >>>>>>>> -------------------------------------------------------- >>>>>>> >>>>>>> You can only have a dependency between two things, but as phrased “a >>>>>>> buffer reference to the scope-begin” sounds like one thing. s/to/and/ >>>>>>> would fix it. >>>>>>> >>>>>>>> With both alternatives there is no explicit dependency from a buffer >>>>>>>> reference >>>>>>>> to a scope-begin instruction:: >>>>>>>> >>>>>>>> %b_cow = load %baddr >>>>>>>> %b = cow_to_ref %b_cow >>>>>>>> %x = load %b // No dependency between this... >>>>>>>> ... >>>>>>>> begin_exclusive %baddr // ... and this instruction. >>>>>>>> ... >>>>>>>> >>>>>>>> So in theory the optimizer is free to reschedule the instructions:: >>>>>>>> >>>>>>>> %b_cow = load %baddr >>>>>>>> %b = cow_to_ref %b_cow >>>>>>>> ... >>>>>>>> begin_exclusive %baddr >>>>>>>> %x = load %b // Wrong! Buffer could be modified here >>>>>>>> ... >>>>>>>> >>>>>>>> We still have to figure out how to cope with this. >>>>>>>> >>>>>>>> - We could add an end-of-lifetime instruction for a COW buffer >>>>>>>> reference, which >>>>>>>> the optimizer may not move over a begin-of-scope instruction. >>>>>>>> >>>>>>>> - Or we just define the implicit rule for the optimizer that any use >>>>>>>> of a COW >>>>>>>> reference may not be moved over a begin-of-scope instruction. >>>>>>>> >>>>>>>> Preconditions >>>>>>>> ============= >>>>>>>> >>>>>>>> To benefit from COW optimizations in the stdlib Array, Set and >>>>>>>> Dictionary data >>>>>>>> structures we first need eager bridging, meaning getting rid of the >>>>>>>> bridged >>>>>>>> buffer. >>>>>>> >>>>>>> As you know I'm very much in favor of eager bridging, but I don't see >>>>>>> why this would be dependent on it. >>>>>> >>>>>> We could use copy_on_write with eager bridging, but I don’t think it >>>>>> will give any benefits to the >>>>>> optimizer. >>>>>> For example, the SIL code to get from an Array to a native >>>>>> ContiguousArrayStorage reference is pretty hard to understand for the >>>>>> optimizer (involves low level bit operations, etc.). >>>>> >>>>> It wouldn't need to do low-level bit operations if our enums were >>>>> capable/controllable enough. I'm just saying, there's no reason we >>>>> couldn't give the optimizer something to work with that has higher level >>>>> semantics than what we currently do. >>>>> >>>>>>>> At least for Array this is implemented as low-level bit operations and >>>>>>>> optimizations cannot reason about it (e.g. finding a reasonable >>>>>>>> RC-root for the buffer reference). >>>>>>>> >>>>>>>> Another thing is that we currently cannot use ``copy_on_write`` for >>>>>>>> Array >>>>>>>> because of pinning. Array pins it’s buffer when passing an element >>>>>>>> address to >>>>>>>> an inout parameter. This allows the array buffer to be modified even >>>>>>>> if its >>>>>>>> reference count is > 1. With ``copy_on_write``, a programmer could >>>>>>>> break memory >>>>>>>> safety when violating the inout rule. Example:: >>>>>>>> >>>>>>>> var arr = [MyClass()] // a global array >>>>>>>> >>>>>>>> foo(&arr[0]) // Pins the buffer of arr during the call >>>>>>>> >>>>>>>> func foo(_ x: inout MyClass) -> Int { >>>>>>>> let b = arr // The ref-count of the buffer is not >>>>>>>> incremented, because it is pinned! >>>>>>>> let r = b[0] // optimizer removes the retain of r because it >>>>>>>> thinks the following code cannot modify b >>>>>>>> arr.removeAll() // does not copy the array buffer and thus >>>>>>>> de-allocates r >>>>>>>> return r.i // use-after-free! >>>>>>>> } >>>>>>> >>>>>>> I only know of one way to resolve inout and pinning: >>>>>>> >>>>>>> * Semantically, references are replaced with a trap value when entering >>>>>>> an inout context so that all inout values are provably unique >>>>>>> references in the absence of unsafe code. We drop pinning and provide >>>>>>> explicit operations that provide simultaneous lvalue accesses to >>>>>>> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices. >>>>>>> >>>>>>> If there are other ideas out there, I'd like to hear them. If not, we >>>>>>> should probably decide that this is what we're doing so that we can move >>>>>>> forward without this looming uncertainty. >>>>>>> >>>>>>> -- >>>>>>> -Dave >>>>>>> >>>>>>> _______________________________________________ >>>>>>> swift-dev mailing list >>>>>>> swift-dev@swift.org >>>>>>> https://lists.swift.org/mailman/listinfo/swift-dev >>>>>> >>>>> >>>>> -- >>>>> -Dave >>>> >>>> _______________________________________________ >>>> swift-dev mailing list >>>> swift-dev@swift.org >>>> https://lists.swift.org/mailman/listinfo/swift-dev >>> >>> -- >>> -Dave >>> >>> _______________________________________________ >>> swift-dev mailing list >>> swift-dev@swift.org >>> https://lists.swift.org/mailman/listinfo/swift-dev >> >> _______________________________________________ >> swift-dev mailing list >> swift-dev@swift.org >> https://lists.swift.org/mailman/listinfo/swift-dev >
_______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev