Re: [swift-dev] copy-on-write proposal

Erik Eckstein via swift-dev Mon, 17 Oct 2016 08:59:33 -0700

On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev <swift-dev@swift.org> 
wrote:



> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org 
> <http://swift-dev-at-swift.org/>> wrote:
> 
>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev 
>>> <swift-dev@swift.org> wrote:
>>> 
>>> This is a proposal for representing copy-on-write buffers in
>>> SIL. Actually it’s still a draft for a proposal. It also heavily
>>> depends on how we move forward with SIL ownership.
>>> <CopyOnWrite.rst>
>>> If you have any comments, please let me know.
>> 
>> The SIL-level design seems sensible to me at a glance. At the language
>> level, I think it would make more sense to treat this as an attribute
>> on class types rather than on properties in structs using the class. I
>> don't think many people reuse class definitions as both shared
>> reference types and as value type payloads, 
> 
> Foundation does, or would if they could.
> 
>> but beyond that, I think that making it an attribute of classes would
>> put us into a better position to leverage the borrow model to enforce
>> the "mutable-only-when-unique" aspect of COW implementations. John
>> alluded to this in the "SIL address types and borrowing" thread:
>> 
>>> I wonder if it would make more sense to make copy-on-write buffer
>>> references a move-only type, so that as long as you were just
>>> working with the raw reference (as opposed to the CoW aggregate,
>>> which would remain copyable) it wouldn't get implicitly copied
>>> anymore.  You could have mutable and immutable buffer reference
>>> types, both move-only, and there could be a consuming checkUnique
>>> operation on the immutable one that, I dunno, returned an Either of
>>> the mutable and immutable versions.
>>> 
>>> For CoW aggregates, you'd need some @copied attribute on the field
>>> to make sure that the CoW attribute was still copyable.  Within the
>>> implementation of the type, though, you would be projecting out the
>>> reference immediately, and thereafter you'd be certain that you were
>>> borrowing / moving it around as appropriate.
>> 
>> If 'copy-on-write' were a trait on classes, then we could distinguish
>> unique and nonunique references to the class. A unique reference would
>> act like a move-only type to prevent accidental loss of uniqueness. 
> 
> +1
> 
>> We can also allow a copy-on-write class to have "mutating" methods,
>> and only allow mutation on unique references. It seems to me like,
>> exploring this direction, we could also come up with a way for the
>> high-level value-semantics operations on the struct to statically
>> indicate which methods are known to leave the value's buffers in a
>> unique state, or which return values that are uniquely owned, which
>> would give the optimizer more ability to avoid uniqueness checks
>> across calls without relying on inlining and IPO.
> 
> That's pretty cool.  However, I think there's nothing to prevent any
> mutating method from storing a copy of self in a global, so I think we'd
> need some participation from the programmer (either an agreement not to
> do that, or an explicit claim of uniqueness on exit) in order to
> identify operations that create/preserve uniqueness.

If a mutating reference (like self in a mutating method) is move-only then you 
would not be able to “copy” it to a global.

> 
> 
> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev 
> <swift-dev@swift.org> wrote:
> 
> 
> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote:
> 
>> This is a proposal for representing copy-on-write buffers in
>> SIL. Actually it’s still a draft for a proposal. It also heavily
>> depends on how we move forward with SIL ownership.
>> 
>> :orphan:
>> 
>> .. highlight:: sil
>> 
>> ===================================
>> Copy-On-Write Representation in SIL
>> ===================================
>> 
>> .. contents::
>> 
>> Overview
>> ========
>> 
>> This document proposes:
>> 
>> - An ownership attribute to define copy-on-write (COW) buffers in Swift data
>>  types.
>> 
>> - A representation of COW buffers in SIL so that optimizations can take 
>> benefit
>>  of it.
>> 
>> The basic idea is to enable the SIL optimizer to reason about COW data types
>> in the same way as a programmer can do.
>> This means: a COW buffer can only be modified by its owning SIL value, 
>> because
>> either it's uniquely referenced or the buffer is copied before modified.
>> 
>> .. note::
>>    In the following the term "buffer" refers to a Swift heap object.
>>    It can be any heap object, not necessarily a “buffer” with e.g. 
>> tail-allocated elements.
>> 
>> COW Types
>> =========
>> 
>> The basic structure of COW data types can be simplified as follows::
>> 
>>    class COWBuffer {
>>      var someData: Int
>>      ...
>>    }
>> 
>>    struct COWType {
>>      var b : COWBuffer
>> 
>>      mutating func change_it() {
>>        if (!isUniquelyReferenced(b)) {
>>          b = copy_buffer(b)
>>        }
>>        b.someData = ...
>>      }
>>    }
>> 
>> Currently the COW behavior of such types is just defined by their 
>> implementation.
>> But there is no representation of this special behavior in the SIL.
>> So the SIL optimizer has no clue about it and cannot take advantage of it.
>> 
>> For example::
>> 
>>    func foo(arr : [Int]) {
>>      x = arr[0]
>>      opaque_function()
>>      y = arr[0] // can RLE replace this with y = x?
>>    }
>> 
>> If opaque_function() wants to change the contents of the array buffer it 
>> first
>> has to copy it. 
> 
> ...or determine that it's uniquely-referenced.

In this specific example, if opqaue_function holds a reference to arr’s buffer, 
the buffer is not uniquely-referenced.

> 
>> But the optimizer does not know it so it has to conservatively assume
>> that opaque_function() will write to the location of arr[0].
>> 
>> Copy-on-write Ownership Attribute
>> =================================
>> 
>> This section proposes an ownership attribute to define a copy-on-write 
>> buffer.
>> 
>> Swift Syntax
>> ------------
>> 
>> A COW buffer reference can be defined with a new ownership attribute for the
>> buffer variable declaration (similar to “weak” and “unowned”)::
>> 
>>    struct COWType {
>>      copy_on_write var b : COWBuffer
>> 
>>      // ...
>>    }
>> 
>> The ``copy_on_write`` attribute is purely used for optimization purposes.
>> It does not change the semantics of the program.
> 
> Presumably, it changes what code you can execute on `b` without invoking
> traps or undefined behavior.  Otherwise, the optimizer wouldn't be able
> to do anything differently to take advantage of the annotation.

That’s true.

> What are the rules for writing code that uses `copy_on_write`?

See below ("The rules for using ``copy_on_write`` and the built-ins are:”)

> 
>> .. note::
>> 
>>  “copy_on_write” is a  working title. TODO: decide on the name.
>>  Maybe it should be a @-attribute, like @copy_on_write?
>>  Another question is if we should open this attribute for the public or just
>>  use it internally in the library, because violating the implied rules
>>  (see below) could break memory safety.
>> 
>> Implementation
>> --------------
>> 
>> The ``copy_on_write`` references can be represented in the AST as a special
>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented.
>> The canonical type of a ``CopyOnWriteStorageType`` would be the referenced
>> buffer class type.
>> 
>> In SIL the buffer reference will have type::
>> 
>>    $@sil_cow COWBuffer
>> 
>> where ``COWBuffer`` is the type of the referenced heap object.
>> 
>> Two conversion instructions are needed to convert from a ``@sil_cow`` 
>> reference
>> type to a regular reference type::
>> 
>>    cow_to_ref
>>    ref_to_cow
>> 
>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``.
>> 
>> For example the SIL code for::
>> 
>>    var c: COWType
>>    let x = c.b.someData
>> 
>> would be::
>> 
>>    %1 = struct_extract %1 : COWType, #COWType.b
>>    %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>    %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData
>>    %4 = load %3 : $*Int
>> 
>> The ``ref_to_cow`` instruction is needed to store a new buffer reference 
>> into a
>> COW type.
>> 
>> COW Buffers and the Optimizer
>> =============================
>> 
>> A reference to a COW buffer gives the optimizer additional information:
>> 
>>  *A buffer, referenced by a @sil_cow reference is considered to be immutable
>>  during the lifetime of the reference.*
> 
> This seems like much too broad a rule to allow inplace mutations of
> uniquely referenced buffers.

The point is that all mutations must be guarded by an is_unique, which takes 
the _address_ of the buffer reference as argument.
And the optimizer considers this instruction as a potential write to the buffer 
reference.
The effect is that the lifetime of a buffer reference (as a SIL value) will not 
outlive a is_unique - regardless if this is inside a called function or inlined.

>  Unless you mean the reference is
> immutable, rather than the storage being referred to by it.
> 
>> This means any address derived from a ``cow_to_ref`` instruction can be
>> considered to point to immutable memory.
>> 
>> Some examples of optimizations which will benefit from copy-on-write
>> representation in SIL:
>> 
>> - Redundant load elimination
>> 
>>  RLE can assume that opaque code does not modify a COW buffer.
> 
> How do you distinguish “opaque code” from “code that is meant to
> modify the buffer and might do so in place if it's uniquely-referenced?”

Again, the is_unique which takes the address of the reference, will guarantee 
that during the lifetime of a buffer there are no modifications of the buffer.

> 
>>  Example::
>> 
>>      %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>      %3 = ref_element_addr %2 : $COWBuffer, #someData
>>      %4 = load %3 : $*Int
>>      %5 = apply %foo()                        // Cannot overwrite memory 
>> location %3
>>      %6 = load %3 : $*Int                     // Can be replaced by %4
>> 
>>  Currently we do some ad-hoc optimizations for array, based on semantics,
>>  like array count propagation. These hacks would not be needed
>>  anymore.
> 
> W0000000000000000000000t.
> 
>>  Note that it’s not required to check if a ``cow_to_ref`` reference (or a
>>  projected address) escapes. Even if it escapes, it will reference immutable
>>  memory.
>> 
>> - CSE, loop hoisting
>> 
>>  Similar to RLE: the optimizer can assume that opaque code cannot modify a
>>  COW buffer
> 
> Same question here as above, then.
>> 
>> - ARC optimization
>> 
>>  Knowing that some opaque code cannot overwrite a reference in the COW buffer
>>  can remove retain/release pairs across such code::
>> 
>>      %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>      %3 = ref_element_addr %2 : $COWBuffer, #someRef
>>      %4 = load_strong %3 : $*MyClass          // Can do a load_strong 
>> [guarantee]
>>      %5 = apply %foo()                        // Cannot overwrite someRef 
>> and dealloc the object
>>      // Use %4
>>      destroy_value %4 : $MyClass
>> 
>> Scoping instructions
>> --------------------
>> 
>> To let the optimizer reason about the immutability of the COW buffer, it is
>> important to *bind* the lifetime of the buffer content to the lifetime of the
>> buffer reference. For example::
>> 
>>    %b1 = load %baddr : $@sil_cow COWBuffer  // load the buffer reference
>>    // load something from %b1
>>    %a = apply %foo(%baddr : $@sil_cow COWBuffer)
>>    %b2 = load %baddr : $@sil_cow COWBuffer  // load the buffer reference 
>> again
>>    // load something from %b2
>> 
>> The question is: can RLE forward the load of the buffer reference and replace
>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies the
>> buffer.
>> 
>> To enforce this restriction, the scope of any buffer modification must be
>> enclosed in a pair of SIL instructions. Those instructions define the scope
>> of the mutation. Both instructions take the *address* of the buffer
>> reference as operand and act as a potential write to the buffer reference. 
>> 
>> The purpose of the scoping instructions is to strictly separate the 
>> liferanges
>> of references to an immutable buffer and references to the mutable buffer.
> 
> Looks reasonable.
> 
>> The following example shows why the scoping instructions (specifically the
>> end-of-scope instruction) are required to prevent loop-hoisting from
>> interleaving mutable and immutable liferanges::
>> 
>>    // there should be a begin-of-scope %baddr
>>    %mut_b = load %baddr
>>    store %x to %mut_b    // modification of the buffer
>>    // there should be a end-of-scope %baddr
>> 
>>    loop {
>>      %b = load %baddr
>>      %y = load %b        // load from the buffer
>>      ...
>>    }
>> 
>> If there is no end-of-scope instruction, loop hoisting could do::
>> 
>>    %mut_b = load %baddr
>>    %b = load %baddr        // moved out of the loop
>>    store %x to %mut_b
>> 
>>    loop {
>>      %y = load %b
>>      ...
>>    }
>> 
>> Now the optimizer assumes that ``%b`` references an immutable buffer, so it 
>> could
>> also hoist the load::
>> 
>>    %mut_b = load %baddr
>>    %b = load %baddr
>>    %y = load %b          // Wrong! Will be overwritten by the following store
>>    store %x to %mut_b
>> 
>>    loop {
>>      ...
>>    }
>> 
>> 
>> The following sections describe two alternatives to implement the scoping.
>> 
>> Scoping Alternative 1: Explicit Built-ins
>> -----------------------------------------
>> 
>> SIL instructions
>> ^^^^^^^^^^^^^^^^
>> 
>> The existing ``is_unique`` instruction is changed to a terminator 
>> instruction::
>> 
>>    bb0:
>>      is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2  // %0 is the 
>> address of the COWBuffer reference
>>    bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique 
>> reference. Physically identical to "load %0”
>>      // usually empty
>>      br bb3(%1 : $COWBuffer)
>>    bb2:                  // the false-block
>>      // usually contains:
>>      %2 = apply %copy_buffer
>>      %3 = cow_to_ref %2
>>      store_strong %3 to %0 : $*@sil_cow COWBuffer
>>      br bb3(%2 : $COWBuffer)
>>    bb3(%4 : $COWBuffer):
>>      // Modify the buffer referenced by %4
>>      // ...
>> 
>> The end-of-scope instruction is::
>> 
>>    end_unique_addr %0 : $*COWBuffer
>> 
>> It is important that the references to the unique buffers (``%1``, ``%2``) 
>> must
>> not outlive ``end_unique_addr``. In most cases this can be check by the SIL
>> verifier.
>> 
>> The two instructions must be paired properly but not necessarily in the
>> same function.
>> 
>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to
>> separate the lifetimes of mutable and immutable accesses to the COW buffer.
>> Both instructions take an address to the COW buffer reference and are
>> considered as potential stores to the reference.
>> This makes sure that the SIL optimizer cannot mix-up buffer reference 
>> lifetimes
>> across these instructions.
>> For example, RLE cannot combine two buffer loads which are interleaved with
>> a ``is_unique_addr_br``::
>> 
>>    %1 = load_strong %0 : $*@sil_cow COWBuffer
>>    // do something with %1
>>    …
>>    is_unique_addr_br %0 : $*@sil_cow COWBuffer
>>    …
>>    %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this 
>> with %1
>> 
>> Another important thing is that the COW buffer can only be mutated by using 
>> the
>> reference of the ``is_unique_addr_br`` true-block argument.
>> The COW buffer cannot be modified by simply loading/extracting the reference
>> from the COWType.
>> Example::
>> 
>>  %1 = load_strong %0 : $*COWBuffer
>>  %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>  %3 = ref_element_addr %2 : $COWBuffer, #someData
>>  store %7 : $Int to %3 : $*Int            // Violation!
>> 
>> Most obvious violations to this constraint can be catched by the SILVerifier.
>> 
>> The ``_addr`` variants of the instructions also have a non-addr counterpart::
>> 
>>    is_unique_br %0 : $COWBuffer, bb1, bb2.  // consumes %0 and produces the 
>> true-block arg as owned
>> 
>>    %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned
>> 
>> These instructions are generated by Mem2reg (or a similar optimization)
>> in case the COW value is stored (in a temporary alloc_stack location)
>> just for the sake of passing an address to ``is_unique_addr_br`` and
>> ``end_unique_addr``.
>> For example in the following code, where the COW data is passed as-value and
>> all the mutating functions are inlined::
>> 
>>    func foo(arr : [Int], x: Int) {
>>      arr[0] = 27
>>      …
>>      y = arr[x]
>>      …
>>    }
>> 
>> Finally it’s probably a good idea to add an instruction for converting an
>> immutable reference to a mutable reference::
>> 
>>    %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : 
>> $COWBuffer as owned
>> 
>> which is basically just a simpler representation of the following pattern::
>> 
>>    bb0:
>>      is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2
>>    bb1(%1 : $COWBuffer):
>>      … // main control flow continues here
>>    bb2:
>>      unreachable
>> 
>> An optimizations, which eliminate uniqueness checks, would replace a
>> ``is_unique_br`` by a ``start_unique``.
>> 
>> Built-ins
>> ^^^^^^^^^
>> 
>> A COW type implementor can generate the new instructions by using a set of 
>> built-ins::
>> 
>>    func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType?
>>    func endUnique<BufferType>(_ buffer: inout BufferType)  
>> 
>> For example::
>> 
>>    struct COWType {
>>      copy_on_write var b : COWBuffer
>> 
>>      mutating func makeMutable() -> COWBuffer {
>>        if let uniqueBuffer = isUnique(&self.b) {
>>          return uniqueBuffer
>>        }
>>        let copiedBuffer = copyBuffer(self.b)
>>        self.b = copiedBuffer
>>        return copiedBuffer
>>      }
>> 
>>      mutating func setSomeData(x: Int) {
>>        let uniqueBuffer = makeMutable()
>>        uniqueBuffer.someData = x
>>        endUnique(&self.b)
>>      }
>>    }
> 
> This seems reasonable, but it also looks like the compiler could do the
> `endUnique` dance for us based, e.g., on the mutability of methods.  

I agree, that would be ideal, e.g. the compiler could insert the endUnique at 
the end of an inout scope.

> 
>> The ``isUnique`` built-in returns an optional unique buffer reference.
>> Physically this is the COW buffer which is passed as the inout argument.
>> The result is nil if the buffer is not uniquely referenced.
>> In this case usually the original buffer is copied and the reference to the
>> copy is written back to the original buffer reference location
>> (``self.b = copiedBuffer``).
>> Starting at the point of the write-back, the reference to the copy also 
>> becomes
>> a unique buffer reference.
>> 
>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern 
>> which
>> constructs the Optional in the successor blocks. Using ``isUnique`` in an
>> if-let (as shown above) will end up in two consecutive CFG "diamonds".
>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond.
>> 
>> .. note::
>>  This makes the definition of the unique buffer location lifetime a little 
>> bit
>>  problematic, because the false-branch of ``isUnique`` is not equivalent to
>>  the false-branch of the ``is_unique_addr_br`` instruction (before 
>> SimplifyCFG
>>  can do its job).
> 
> I don't know what the implications of these diamonds and the problem
> described above might be, FWIW.
> 
>> The rules for using ``copy_on_write`` and the built-ins are:
>> 
>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in the
>>   same function.
>> 
>> 2. The COW buffer may only be mutated by using the unique buffer reference.
>> 
>> 3. The COW buffer must not be mutated outside the ``isUnique`` - 
>> ``endUnique``
>>   pair.
>> 
>> 4. During the lifetime of the unique buffer reference, the original COW 
>> buffer
>>   reference must not be used in any way, e.g. for reading from the buffer.
>> 
>>   Note that the lifetime of the unique buffer reference does not include the
>>   part between the begin of the ``isUnique`` false-branch and the write-back
>>   of the copy. This means is okay to read from the buffer (using ``self.b``)
>>   for the purpose of copying.
>> 
>> Examples::
>> 
>>    mutating func setSomeData(x: Int) {
>>      let uniqueBuffer = makeMutable()
>>      uniqueBuffer.someData = x
>>      // violates rule 1
>>    }
>> 
>>    mutating func setSomeData(x: Int) {
>>      makeMutable()
>>      self.b.someData = x // violates rule 2
>>      endUnique(&self.b)
>>    }
>> 
>>    mutating func setSomeData(x: Int) {
>>      let uniqueBuffer = makeMutable()
>>      uniqueBuffer.someData = x
>>      endUnique(&self.b)
>>      uniqueBuffer.someData = 27 // violates rule 3
>>    }
>> 
>>    mutating func incrementSomeData() {
>>      let uniqueBuffer = makeMutable()
>>      uniqueBuffer.someData = self.b.someData + 1 // violates rule 4
>>      endUnique(&self.b)
>>    }
> 
> It would be instructive to write down the *correct* code for these
> operations.

added to my todo list.

> 
>> The intention of the rules is to ensure that there is no overlap of a
>> "read-only" life-range with a "mutable" life-range of the buffer reference.
>> It’s the responsibility of the implementor to follow the rules.
>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can
>> statically detect rule violations in obvious cases (with inter-procedural
>> analysis maybe even in most cases).
>> 
>> This approach would require to change some of the internals of our
>> current COW data structures in the stdlib (Array, Dictionary, etc.).
>> For example, the Array make_mutable semantic functions currently do not 
>> return
>> the unique buffer.
> 
> No big deal.
> 
>> Scoping Alternative 2: Implicit Inout Scopes
>> --------------------------------------------
>> 
>> There is an idea (proposal?) to change the representation of inout variables
>> in SIL. This is independent of this proposal, but can be helpful for the
>> purpose of defining the scope of a COW mutation.
>> 
>> The basic idea is that SILGen inserts scoping instructions for *all* inout
>> variables. And those scoping instructions can be used to define the mutating
>> scope of a COW buffer.
>> 
>> The scoping instructions which are inserted by SILGen for an inout scope 
>> are::
>> 
>>    begin_exclusive
>>    end_exclusive
>> 
>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those instructions 
>> take the
>> address of the inout variable as argument. For the optimizer those 
>> instructions
>> look like potential writes to the inout variable.
>> 
>> The implementor of a COW type has to follow the rule that the COW buffer may
>> only be modified in mutating functions of the COW type. But this is the case
>> anyway because any modification needs a uniqueness check and this can only be
>> done in mutating functions.
>> 
>> Example::
>> 
>>    // > mutating func setSomeData(x: Int) {
>>    // Accepts a unique reference to the array value (avoiding refcount 
>> operations)
>>    sil @setSomeData : $(Int, @inout Array) -> () {
>>    bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0
>> 
>>    // >   makeMutable() (inlined)
>>    // Forward the unique reference to the `self` array value, still avoiding 
>> refcount operations.
>>    // Begin the inlined exclusive scope (could be trivially removed).
>>    begin_exclusive %arrayref : $*Array<T> // Begin scope #1
>> 
>>    // >    if !isUnique(&self._storage) {
>>    // Extract a unique inout reference to the class reference to the array 
>> storage.
>>    // This begins the isUnique() argument's exclusive scope. The memory is 
>> already exclusive
>>    // but the scope helps ensure this is the only alias to _storage.
>>    %arrayref._storageref = struct_element_addr [exclusive] %arrayref, 
>> #Array._storage
>> 
>>    // Uniqueness checking requires an inout reference to the class reference.
>>    // The is_unique instruction does not need to create a new storage 
>> reference.
>>    // It's only purpose is to check the RC count, ensure that the checked 
>> reference
>>    // is inout, and prevent the inout scope from being optimized away.
>>    %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>> 
>>    // End the isUnique argument's exclusive scope (can also be trivially 
>> removed).
>>    end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>> 
>>    br %isuniq, bb_continue, bb_slow
>> 
>>    bb_slow:
>>    // >      self._storage = copyBuffer(self._storage)
>>    // Produce a new class reference to storage with verifiably unique RC 
>> semantics.
>>    %copied_storage_class = alloc_ref ...
>>    // A begin/end exclusive scope is implicit in store [assign].
>>    store [assign] %copied_storage_class to %arrayref._storageref
>>    br bb_continue
>> 
>>    bb_continue:
>> 
>>    // This marks the end of makeMutable's inout `self` scope. Because Array
>>    // contains a "copy_on_write" property, the SIL verifier needs to
>>    // prove that %arrayref.#_storage has not escaped at this point. This
>>    // is equivalent to checking that %arrayref itself is not copied, and
>>    // checking each projection of the "copy_on_write" storage property
>>    // (%arrayref._storageref) is not copied. Or, if any copies are present,
>>    // they must be consumed within this scope.
>>    end_exclusive %arrayref : $*Array<T> // End scope #1
>> 
>>    // >    self._storage.someData = x
>>    // An _addr instruction with one load/store use doesn't really need its 
>> own scope.
>>    %arrayref._storageref = struct_element_addr %arrayref, #Array._storage
>> 
>>    // ARC optimization can promote this to a borrow, replacing 
>> strong_release with end_borrow.
>>    %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow 
>> ArrayStorage
>>    %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow 
>> ArrayStorage
>> 
>>    // Write some data into the CoW buffer.
>>    // (For simplicity, pretend ArrayStorage has a "someData" field).
>>    // A single-use _addr instruction, so no scope.
>>    %somedata_addr = ref_element_addr %arrayref._storage, #someData
>>    // A store with an implicit [exclusive] scope.
>>    store [assign] %x to %somedata_addr
>> 
>>    strong_release %arrayref._storage : $*ArrayStorage<T>
>> 
>>    // End the isUnique argument's exclusive scope.
>>    // The same verification is needed here, but the inner scope would be 
>> eliminated.
>>    end_exclusive %arrayref : $*Array<T> // End scope #0
>> 
>> In general this approach looks more "user-friendly" than the first
>> alternative.
> 
> Well, I can't really tell, because you haven't shown the Swift code that
> generates this SIL.
> 
>> But it depends on implementing the general feature to insert the inout
>> scoping instructions.  Also, we still have to think through all the
>> details of this approach.
> 
> FWIW, I am convinced we will need (and get) a stricter inout model that
> would be conducive to inserting the scoping instructions.
> 
> 
>> Dependency between a buffer reference to the scope-begin
>> --------------------------------------------------------
> 
> You can only have a dependency between two things, but as phrased “a
> buffer reference to the scope-begin” sounds like one thing.  s/to/and/
> would fix it.
> 
>> With both alternatives there is no explicit dependency from a buffer 
>> reference
>> to a scope-begin instruction::
>> 
>>    %b_cow = load %baddr
>>    %b = cow_to_ref %b_cow
>>    %x = load %b             // No dependency between this...
>>    ...
>>    begin_exclusive %baddr   // ... and this instruction.
>>    ...
>> 
>> So in theory the optimizer is free to reschedule the instructions::
>> 
>>    %b_cow = load %baddr
>>    %b = cow_to_ref %b_cow
>>    ...
>>    begin_exclusive %baddr
>>    %x = load %b             // Wrong! Buffer could be modified here
>>    ...
>> 
>> We still have to figure out how to cope with this.
>> 
>> - We could add an end-of-lifetime instruction for a COW buffer reference, 
>> which
>>  the optimizer may not move over a begin-of-scope instruction.
>> 
>> - Or we just define the implicit rule for the optimizer that any use of a COW
>>  reference may not be moved over a begin-of-scope instruction.
>> 
>> Preconditions
>> =============
>> 
>> To benefit from COW optimizations in the stdlib Array, Set and Dictionary 
>> data
>> structures we first need eager bridging, meaning getting rid of the bridged
>> buffer. 
> 
> As you know I'm very much in favor of eager bridging, but I don't see
> why this would be dependent on it.

We could use copy_on_write with eager bridging, but I don’t think it will give 
any benefits to the optimizer.
For example, the SIL code to get from an Array to a native 
ContiguousArrayStorage reference is pretty hard to understand for the optimizer 
(involves low level bit operations, etc.).

> 
>> At least for Array this is implemented as low-level bit operations and
>> optimizations cannot reason about it (e.g. finding a reasonable
>> RC-root for the buffer reference).
>> 
>> Another thing is that we currently cannot use ``copy_on_write`` for Array
>> because of pinning. Array pins it’s buffer when passing an element address to
>> an inout parameter. This allows the array buffer to be modified even if its
>> reference count is > 1. With ``copy_on_write``, a programmer could break 
>> memory
>> safety when violating the inout rule. Example::
>> 
>>    var arr = [MyClass()]  // a global array
>> 
>>    foo(&arr[0])        // Pins the buffer of arr during the call
>> 
>>    func foo(_ x: inout MyClass) -> Int {
>>      let b = arr       // The ref-count of the buffer is not incremented, 
>> because it is pinned!
>>      let r = b[0]      // optimizer removes the retain of r because it 
>> thinks the following code cannot modify b
>>      arr.removeAll()   // does not copy the array buffer and thus 
>> de-allocates r
>>      return r.i        // use-after-free!
>>    }
> 
> I only know of one way to resolve inout and pinning:
> 
> * Semantically, references are replaced with a trap value when entering
>  an inout context so that all inout values are provably unique
>  references in the absence of unsafe code.  We drop pinning and provide
>  explicit operations that provide simultaneous lvalue accesses to
>  distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices.
> 
> If there are other ideas out there, I'd like to hear them.  If not, we
> should probably decide that this is what we're doing so that we can move
> forward without this looming uncertainty.
> 
> -- 
> -Dave
> 
> _______________________________________________
> swift-dev mailing list
> swift-dev@swift.org
> https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Re: [swift-dev] copy-on-write proposal

Reply via email to