on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com> wrote: > On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev > <swift-dev@swift.org> wrote: > >> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org >> <http://swift-dev-at-swift.org/>> wrote: >> >>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev >>>> <swift-dev@swift.org> wrote: >>>> >>>> This is a proposal for representing copy-on-write buffers in >>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>> depends on how we move forward with SIL ownership. >>>> <CopyOnWrite.rst> >>>> If you have any comments, please let me know. >>> >>> The SIL-level design seems sensible to me at a glance. At the language >>> level, I think it would make more sense to treat this as an attribute >>> on class types rather than on properties in structs using the class. I >>> don't think many people reuse class definitions as both shared >>> reference types and as value type payloads, >> >> Foundation does, or would if they could. >> >>> but beyond that, I think that making it an attribute of classes would >>> put us into a better position to leverage the borrow model to enforce >>> the "mutable-only-when-unique" aspect of COW implementations. John >>> alluded to this in the "SIL address types and borrowing" thread: >>> >>>> I wonder if it would make more sense to make copy-on-write buffer >>>> references a move-only type, so that as long as you were just >>>> working with the raw reference (as opposed to the CoW aggregate, >>>> which would remain copyable) it wouldn't get implicitly copied >>>> anymore. You could have mutable and immutable buffer reference >>>> types, both move-only, and there could be a consuming checkUnique >>>> operation on the immutable one that, I dunno, returned an Either of >>>> the mutable and immutable versions. >>>> >>>> For CoW aggregates, you'd need some @copied attribute on the field >>>> to make sure that the CoW attribute was still copyable. Within the >>>> implementation of the type, though, you would be projecting out the >>>> reference immediately, and thereafter you'd be certain that you were >>>> borrowing / moving it around as appropriate. >>> >>> If 'copy-on-write' were a trait on classes, then we could distinguish >>> unique and nonunique references to the class. A unique reference would >>> act like a move-only type to prevent accidental loss of uniqueness. >> >> +1 >> >>> We can also allow a copy-on-write class to have "mutating" methods, >>> and only allow mutation on unique references. It seems to me like, >>> exploring this direction, we could also come up with a way for the >>> high-level value-semantics operations on the struct to statically >>> indicate which methods are known to leave the value's buffers in a >>> unique state, or which return values that are uniquely owned, which >>> would give the optimizer more ability to avoid uniqueness checks >>> across calls without relying on inlining and IPO. >> >> That's pretty cool. However, I think there's nothing to prevent any >> mutating method from storing a copy of self in a global, so I think we'd >> need some participation from the programmer (either an agreement not to >> do that, or an explicit claim of uniqueness on exit) in order to >> identify operations that create/preserve uniqueness. > > If a mutating reference (like self in a mutating method) is move-only > then you would not be able to “copy” it to a global.
Yes, a reference to a move-only type would work for this purpose. >> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev >> <swift-dev@swift.org> wrote: >> >> >> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote: >> >>> This is a proposal for representing copy-on-write buffers in >>> SIL. Actually it’s still a draft for a proposal. It also heavily >>> depends on how we move forward with SIL ownership. >>> >>> :orphan: >>> >>> .. highlight:: sil >>> >>> =================================== >>> Copy-On-Write Representation in SIL >>> =================================== >>> >>> .. contents:: >>> >>> Overview >>> ======== >>> >>> This document proposes: >>> >>> - An ownership attribute to define copy-on-write (COW) buffers in Swift data >>> types. >>> >>> - A representation of COW buffers in SIL so that optimizations can take >>> benefit >>> of it. >>> >>> The basic idea is to enable the SIL optimizer to reason about COW data types >>> in the same way as a programmer can do. >>> This means: a COW buffer can only be modified by its owning SIL value, >>> because >>> either it's uniquely referenced or the buffer is copied before modified. >>> >>> .. note:: >>> In the following the term "buffer" refers to a Swift heap object. >>> It can be any heap object, not necessarily a “buffer” with e.g. >>> tail-allocated elements. >>> >>> COW Types >>> ========= >>> >>> The basic structure of COW data types can be simplified as follows:: >>> >>> class COWBuffer { >>> var someData: Int >>> ... >>> } >>> >>> struct COWType { >>> var b : COWBuffer >>> >>> mutating func change_it() { >>> if (!isUniquelyReferenced(b)) { >>> b = copy_buffer(b) >>> } >>> b.someData = ... >>> } >>> } >>> >>> Currently the COW behavior of such types is just defined by their >>> implementation. >>> But there is no representation of this special behavior in the SIL. >>> So the SIL optimizer has no clue about it and cannot take advantage of it. >>> >>> For example:: >>> >>> func foo(arr : [Int]) { >>> x = arr[0] >>> opaque_function() >>> y = arr[0] // can RLE replace this with y = x? >>> } >>> >>> If opaque_function() wants to change the contents of the array buffer it >>> first >>> has to copy it. >> >> ...or determine that it's uniquely-referenced. > > In this specific example, if opqaue_function holds a reference to arr’s > buffer, the buffer is not > uniquely-referenced. Right. >> >>> But the optimizer does not know it so it has to conservatively assume >>> that opaque_function() will write to the location of arr[0]. >>> >>> Copy-on-write Ownership Attribute >>> ================================= >>> >>> This section proposes an ownership attribute to define a copy-on-write >>> buffer. >>> >>> Swift Syntax >>> ------------ >>> >>> A COW buffer reference can be defined with a new ownership attribute for the >>> buffer variable declaration (similar to “weak” and “unowned”):: >>> >>> struct COWType { >>> copy_on_write var b : COWBuffer >>> >>> // ... >>> } >>> >>> The ``copy_on_write`` attribute is purely used for optimization purposes. >>> It does not change the semantics of the program. >> >> Presumably, it changes what code you can execute on `b` without invoking >> traps or undefined behavior. Otherwise, the optimizer wouldn't be able >> to do anything differently to take advantage of the annotation. > > That’s true. > >> What are the rules for writing code that uses `copy_on_write`? > > See below ("The rules for using ``copy_on_write`` and the built-ins are:”) Yeah, I got there, eventually. But just saying “doesn't change semantics” at this point in the proposal leaves a gap, because it does change semantic *requirements*. You should mention that. >>> .. note:: >>> >>> “copy_on_write” is a working title. TODO: decide on the name. >>> Maybe it should be a @-attribute, like @copy_on_write? >>> Another question is if we should open this attribute for the public or just >>> use it internally in the library, because violating the implied rules >>> (see below) could break memory safety. >>> >>> Implementation >>> -------------- >>> >>> The ``copy_on_write`` references can be represented in the AST as a special >>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented. >>> The canonical type of a ``CopyOnWriteStorageType`` would be the referenced >>> buffer class type. >>> >>> In SIL the buffer reference will have type:: >>> >>> $@sil_cow COWBuffer >>> >>> where ``COWBuffer`` is the type of the referenced heap object. >>> >>> Two conversion instructions are needed to convert from a ``@sil_cow`` >>> reference >>> type to a regular reference type:: >>> >>> cow_to_ref >>> ref_to_cow >>> >>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``. >>> >>> For example the SIL code for:: >>> >>> var c: COWType >>> let x = c.b.someData >>> >>> would be:: >>> >>> %1 = struct_extract %1 : COWType, #COWType.b >>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData >>> %4 = load %3 : $*Int >>> >>> The ``ref_to_cow`` instruction is needed to store a new buffer reference >>> into a >>> COW type. >>> >>> COW Buffers and the Optimizer >>> ============================= >>> >>> A reference to a COW buffer gives the optimizer additional information: >>> >>> *A buffer, referenced by a @sil_cow reference is considered to be immutable >>> during the lifetime of the reference.* >> >> This seems like much too broad a rule to allow inplace mutations of >> uniquely referenced buffers. > > The point is that all mutations must be guarded by an is_unique, which > takes the _address_ of the buffer reference as argument. > And the optimizer considers this instruction as a potential write to the > buffer reference. > The effect is that the lifetime of a buffer reference (as a SIL value) > will not outlive a is_unique - regardless if this is inside a called > function or inlined. I don't see how that allows me to mutate a uniquely referenced buffer held in a @sil_cow reference, given what you wrote above. >> Unless you mean the reference is >> immutable, rather than the storage being referred to by it. >> >>> This means any address derived from a ``cow_to_ref`` instruction can be >>> considered to point to immutable memory. >>> >>> Some examples of optimizations which will benefit from copy-on-write >>> representation in SIL: >>> >>> - Redundant load elimination >>> >>> RLE can assume that opaque code does not modify a COW buffer. >> >> How do you distinguish “opaque code” from “code that is meant to >> modify the buffer and might do so in place if it's uniquely-referenced?” > > Again, the is_unique which takes the address of the reference, will > guarantee that during the lifetime of a buffer there are no > modifications of the buffer. Again, that sounds like it rules out inplace modification of uniquely referenced buffers. > > >> >>> Example:: >>> >>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>> %4 = load %3 : $*Int >>> %5 = apply %foo() // Cannot overwrite memory >>> location %3 >>> %6 = load %3 : $*Int // Can be replaced by %4 >>> >>> Currently we do some ad-hoc optimizations for array, based on semantics, >>> like array count propagation. These hacks would not be needed >>> anymore. >> >> W0000000000000000000000t. >> >>> Note that it’s not required to check if a ``cow_to_ref`` reference (or a >>> projected address) escapes. Even if it escapes, it will reference immutable >>> memory. >>> >>> - CSE, loop hoisting >>> >>> Similar to RLE: the optimizer can assume that opaque code cannot modify a >>> COW buffer >> >> Same question here as above, then. >>> >>> - ARC optimization >>> >>> Knowing that some opaque code cannot overwrite a reference in the COW >>> buffer >>> can remove retain/release pairs across such code:: >>> >>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>> %3 = ref_element_addr %2 : $COWBuffer, #someRef >>> %4 = load_strong %3 : $*MyClass // Can do a load_strong >>> [guarantee] >>> %5 = apply %foo() // Cannot overwrite someRef >>> and dealloc the object >>> // Use %4 >>> destroy_value %4 : $MyClass >>> >>> Scoping instructions >>> -------------------- >>> >>> To let the optimizer reason about the immutability of the COW buffer, it is >>> important to *bind* the lifetime of the buffer content to the lifetime of >>> the >>> buffer reference. For example:: >>> >>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>> // load something from %b1 >>> %a = apply %foo(%baddr : $@sil_cow COWBuffer) >>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>> again >>> // load something from %b2 >>> >>> The question is: can RLE forward the load of the buffer reference and >>> replace >>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies the >>> buffer. >>> >>> To enforce this restriction, the scope of any buffer modification must be >>> enclosed in a pair of SIL instructions. Those instructions define the scope >>> of the mutation. Both instructions take the *address* of the buffer >>> reference as operand and act as a potential write to the buffer reference. >>> >>> The purpose of the scoping instructions is to strictly separate the >>> liferanges >>> of references to an immutable buffer and references to the mutable buffer. >> >> Looks reasonable. >> >>> The following example shows why the scoping instructions (specifically the >>> end-of-scope instruction) are required to prevent loop-hoisting from >>> interleaving mutable and immutable liferanges:: >>> >>> // there should be a begin-of-scope %baddr >>> %mut_b = load %baddr >>> store %x to %mut_b // modification of the buffer >>> // there should be a end-of-scope %baddr >>> >>> loop { >>> %b = load %baddr >>> %y = load %b // load from the buffer >>> ... >>> } >>> >>> If there is no end-of-scope instruction, loop hoisting could do:: >>> >>> %mut_b = load %baddr >>> %b = load %baddr // moved out of the loop >>> store %x to %mut_b >>> >>> loop { >>> %y = load %b >>> ... >>> } >>> >>> Now the optimizer assumes that ``%b`` references an immutable buffer, so it >>> could >>> also hoist the load:: >>> >>> %mut_b = load %baddr >>> %b = load %baddr >>> %y = load %b // Wrong! Will be overwritten by the following >>> store >>> store %x to %mut_b >>> >>> loop { >>> ... >>> } >>> >>> >>> The following sections describe two alternatives to implement the scoping. >>> >>> Scoping Alternative 1: Explicit Built-ins >>> ----------------------------------------- >>> >>> SIL instructions >>> ^^^^^^^^^^^^^^^^ >>> >>> The existing ``is_unique`` instruction is changed to a terminator >>> instruction:: >>> >>> bb0: >>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the >>> address of the COWBuffer reference >>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique >>> reference. Physically identical to "load %0” >>> // usually empty >>> br bb3(%1 : $COWBuffer) >>> bb2: // the false-block >>> // usually contains: >>> %2 = apply %copy_buffer >>> %3 = cow_to_ref %2 >>> store_strong %3 to %0 : $*@sil_cow COWBuffer >>> br bb3(%2 : $COWBuffer) >>> bb3(%4 : $COWBuffer): >>> // Modify the buffer referenced by %4 >>> // ... >>> >>> The end-of-scope instruction is:: >>> >>> end_unique_addr %0 : $*COWBuffer >>> >>> It is important that the references to the unique buffers (``%1``, ``%2``) >>> must >>> not outlive ``end_unique_addr``. In most cases this can be check by the SIL >>> verifier. >>> >>> The two instructions must be paired properly but not necessarily in the >>> same function. >>> >>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to >>> separate the lifetimes of mutable and immutable accesses to the COW buffer. >>> Both instructions take an address to the COW buffer reference and are >>> considered as potential stores to the reference. >>> This makes sure that the SIL optimizer cannot mix-up buffer reference >>> lifetimes >>> across these instructions. >>> For example, RLE cannot combine two buffer loads which are interleaved with >>> a ``is_unique_addr_br``:: >>> >>> %1 = load_strong %0 : $*@sil_cow COWBuffer >>> // do something with %1 >>> … >>> is_unique_addr_br %0 : $*@sil_cow COWBuffer >>> … >>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this >>> with %1 >>> >>> Another important thing is that the COW buffer can only be mutated by using >>> the >>> reference of the ``is_unique_addr_br`` true-block argument. >>> The COW buffer cannot be modified by simply loading/extracting the reference >>> from the COWType. >>> Example:: >>> >>> %1 = load_strong %0 : $*COWBuffer >>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>> store %7 : $Int to %3 : $*Int // Violation! >>> >>> Most obvious violations to this constraint can be catched by the >>> SILVerifier. >>> >>> The ``_addr`` variants of the instructions also have a non-addr >>> counterpart:: >>> >>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces the >>> true-block arg as owned >>> >>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned >>> >>> These instructions are generated by Mem2reg (or a similar optimization) >>> in case the COW value is stored (in a temporary alloc_stack location) >>> just for the sake of passing an address to ``is_unique_addr_br`` and >>> ``end_unique_addr``. >>> For example in the following code, where the COW data is passed as-value and >>> all the mutating functions are inlined:: >>> >>> func foo(arr : [Int], x: Int) { >>> arr[0] = 27 >>> … >>> y = arr[x] >>> … >>> } >>> >>> Finally it’s probably a good idea to add an instruction for converting an >>> immutable reference to a mutable reference:: >>> >>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : >>> $COWBuffer as owned >>> >>> which is basically just a simpler representation of the following pattern:: >>> >>> bb0: >>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2 >>> bb1(%1 : $COWBuffer): >>> … // main control flow continues here >>> bb2: >>> unreachable >>> >>> An optimizations, which eliminate uniqueness checks, would replace a >>> ``is_unique_br`` by a ``start_unique``. >>> >>> Built-ins >>> ^^^^^^^^^ >>> >>> A COW type implementor can generate the new instructions by using a set of >>> built-ins:: >>> >>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType? >>> func endUnique<BufferType>(_ buffer: inout BufferType) >>> >>> For example:: >>> >>> struct COWType { >>> copy_on_write var b : COWBuffer >>> >>> mutating func makeMutable() -> COWBuffer { >>> if let uniqueBuffer = isUnique(&self.b) { >>> return uniqueBuffer >>> } >>> let copiedBuffer = copyBuffer(self.b) >>> self.b = copiedBuffer >>> return copiedBuffer >>> } >>> >>> mutating func setSomeData(x: Int) { >>> let uniqueBuffer = makeMutable() >>> uniqueBuffer.someData = x >>> endUnique(&self.b) >>> } >>> } >> >> This seems reasonable, but it also looks like the compiler could do the >> `endUnique` dance for us based, e.g., on the mutability of methods. > > I agree, that would be ideal, e.g. the compiler could insert the endUnique at > the end of an inout > scope. > >> >>> The ``isUnique`` built-in returns an optional unique buffer reference. >>> Physically this is the COW buffer which is passed as the inout argument. >>> The result is nil if the buffer is not uniquely referenced. >>> In this case usually the original buffer is copied and the reference to the >>> copy is written back to the original buffer reference location >>> (``self.b = copiedBuffer``). >>> Starting at the point of the write-back, the reference to the copy also >>> becomes >>> a unique buffer reference. >>> >>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern >>> which >>> constructs the Optional in the successor blocks. Using ``isUnique`` in an >>> if-let (as shown above) will end up in two consecutive CFG "diamonds". >>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond. >>> >>> .. note:: >>> This makes the definition of the unique buffer location lifetime a little >>> bit >>> problematic, because the false-branch of ``isUnique`` is not equivalent to >>> the false-branch of the ``is_unique_addr_br`` instruction (before >>> SimplifyCFG >>> can do its job). >> >> I don't know what the implications of these diamonds and the problem >> described above might be, FWIW. >> >>> The rules for using ``copy_on_write`` and the built-ins are: >>> >>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in >>> the >>> same function. >>> >>> 2. The COW buffer may only be mutated by using the unique buffer reference. >>> >>> 3. The COW buffer must not be mutated outside the ``isUnique`` - >>> ``endUnique`` >>> pair. >>> >>> 4. During the lifetime of the unique buffer reference, the original COW >>> buffer >>> reference must not be used in any way, e.g. for reading from the buffer. >>> >>> Note that the lifetime of the unique buffer reference does not include the >>> part between the begin of the ``isUnique`` false-branch and the write-back >>> of the copy. This means is okay to read from the buffer (using ``self.b``) >>> for the purpose of copying. >>> >>> Examples:: >>> >>> mutating func setSomeData(x: Int) { >>> let uniqueBuffer = makeMutable() >>> uniqueBuffer.someData = x >>> // violates rule 1 >>> } >>> >>> mutating func setSomeData(x: Int) { >>> makeMutable() >>> self.b.someData = x // violates rule 2 >>> endUnique(&self.b) >>> } >>> >>> mutating func setSomeData(x: Int) { >>> let uniqueBuffer = makeMutable() >>> uniqueBuffer.someData = x >>> endUnique(&self.b) >>> uniqueBuffer.someData = 27 // violates rule 3 >>> } >>> >>> mutating func incrementSomeData() { >>> let uniqueBuffer = makeMutable() >>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4 >>> endUnique(&self.b) >>> } >> >> It would be instructive to write down the *correct* code for these >> operations. > > added to my todo list. > >> >>> The intention of the rules is to ensure that there is no overlap of a >>> "read-only" life-range with a "mutable" life-range of the buffer reference. >>> It’s the responsibility of the implementor to follow the rules. >>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can >>> statically detect rule violations in obvious cases (with inter-procedural >>> analysis maybe even in most cases). >>> >>> This approach would require to change some of the internals of our >>> current COW data structures in the stdlib (Array, Dictionary, etc.). >>> For example, the Array make_mutable semantic functions currently do not >>> return >>> the unique buffer. >> >> No big deal. >> >>> Scoping Alternative 2: Implicit Inout Scopes >>> -------------------------------------------- >>> >>> There is an idea (proposal?) to change the representation of inout variables >>> in SIL. This is independent of this proposal, but can be helpful for the >>> purpose of defining the scope of a COW mutation. >>> >>> The basic idea is that SILGen inserts scoping instructions for *all* inout >>> variables. And those scoping instructions can be used to define the mutating >>> scope of a COW buffer. >>> >>> The scoping instructions which are inserted by SILGen for an inout scope >>> are:: >>> >>> begin_exclusive >>> end_exclusive >>> >>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those >>> instructions take the >>> address of the inout variable as argument. For the optimizer those >>> instructions >>> look like potential writes to the inout variable. >>> >>> The implementor of a COW type has to follow the rule that the COW buffer may >>> only be modified in mutating functions of the COW type. But this is the case >>> anyway because any modification needs a uniqueness check and this can only >>> be >>> done in mutating functions. >>> >>> Example:: >>> >>> // > mutating func setSomeData(x: Int) { >>> // Accepts a unique reference to the array value (avoiding refcount >>> operations) >>> sil @setSomeData : $(Int, @inout Array) -> () { >>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0 >>> >>> // > makeMutable() (inlined) >>> // Forward the unique reference to the `self` array value, still >>> avoiding refcount operations. >>> // Begin the inlined exclusive scope (could be trivially removed). >>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1 >>> >>> // > if !isUnique(&self._storage) { >>> // Extract a unique inout reference to the class reference to the array >>> storage. >>> // This begins the isUnique() argument's exclusive scope. The memory is >>> already exclusive >>> // but the scope helps ensure this is the only alias to _storage. >>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, >>> #Array._storage >>> >>> // Uniqueness checking requires an inout reference to the class >>> reference. >>> // The is_unique instruction does not need to create a new storage >>> reference. >>> // It's only purpose is to check the RC count, ensure that the checked >>> reference >>> // is inout, and prevent the inout scope from being optimized away. >>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>> >>> // End the isUnique argument's exclusive scope (can also be trivially >>> removed). >>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>> >>> br %isuniq, bb_continue, bb_slow >>> >>> bb_slow: >>> // > self._storage = copyBuffer(self._storage) >>> // Produce a new class reference to storage with verifiably unique RC >>> semantics. >>> %copied_storage_class = alloc_ref ... >>> // A begin/end exclusive scope is implicit in store [assign]. >>> store [assign] %copied_storage_class to %arrayref._storageref >>> br bb_continue >>> >>> bb_continue: >>> >>> // This marks the end of makeMutable's inout `self` scope. Because Array >>> // contains a "copy_on_write" property, the SIL verifier needs to >>> // prove that %arrayref.#_storage has not escaped at this point. This >>> // is equivalent to checking that %arrayref itself is not copied, and >>> // checking each projection of the "copy_on_write" storage property >>> // (%arrayref._storageref) is not copied. Or, if any copies are present, >>> // they must be consumed within this scope. >>> end_exclusive %arrayref : $*Array<T> // End scope #1 >>> >>> // > self._storage.someData = x >>> // An _addr instruction with one load/store use doesn't really need its >>> own scope. >>> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage >>> >>> // ARC optimization can promote this to a borrow, replacing >>> strong_release with end_borrow. >>> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow >>> ArrayStorage >>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow >>> ArrayStorage >>> >>> // Write some data into the CoW buffer. >>> // (For simplicity, pretend ArrayStorage has a "someData" field). >>> // A single-use _addr instruction, so no scope. >>> %somedata_addr = ref_element_addr %arrayref._storage, #someData >>> // A store with an implicit [exclusive] scope. >>> store [assign] %x to %somedata_addr >>> >>> strong_release %arrayref._storage : $*ArrayStorage<T> >>> >>> // End the isUnique argument's exclusive scope. >>> // The same verification is needed here, but the inner scope would be >>> eliminated. >>> end_exclusive %arrayref : $*Array<T> // End scope #0 >>> >>> In general this approach looks more "user-friendly" than the first >>> alternative. >> >> Well, I can't really tell, because you haven't shown the Swift code that >> generates this SIL. >> >>> But it depends on implementing the general feature to insert the inout >>> scoping instructions. Also, we still have to think through all the >>> details of this approach. >> >> FWIW, I am convinced we will need (and get) a stricter inout model that >> would be conducive to inserting the scoping instructions. >> >> >>> Dependency between a buffer reference to the scope-begin >>> -------------------------------------------------------- >> >> You can only have a dependency between two things, but as phrased “a >> buffer reference to the scope-begin” sounds like one thing. s/to/and/ >> would fix it. >> >>> With both alternatives there is no explicit dependency from a buffer >>> reference >>> to a scope-begin instruction:: >>> >>> %b_cow = load %baddr >>> %b = cow_to_ref %b_cow >>> %x = load %b // No dependency between this... >>> ... >>> begin_exclusive %baddr // ... and this instruction. >>> ... >>> >>> So in theory the optimizer is free to reschedule the instructions:: >>> >>> %b_cow = load %baddr >>> %b = cow_to_ref %b_cow >>> ... >>> begin_exclusive %baddr >>> %x = load %b // Wrong! Buffer could be modified here >>> ... >>> >>> We still have to figure out how to cope with this. >>> >>> - We could add an end-of-lifetime instruction for a COW buffer reference, >>> which >>> the optimizer may not move over a begin-of-scope instruction. >>> >>> - Or we just define the implicit rule for the optimizer that any use of a >>> COW >>> reference may not be moved over a begin-of-scope instruction. >>> >>> Preconditions >>> ============= >>> >>> To benefit from COW optimizations in the stdlib Array, Set and Dictionary >>> data >>> structures we first need eager bridging, meaning getting rid of the bridged >>> buffer. >> >> As you know I'm very much in favor of eager bridging, but I don't see >> why this would be dependent on it. > > We could use copy_on_write with eager bridging, but I don’t think it will > give any benefits to the > optimizer. > For example, the SIL code to get from an Array to a native > ContiguousArrayStorage reference is pretty hard to understand for the > optimizer (involves low level bit operations, etc.). It wouldn't need to do low-level bit operations if our enums were capable/controllable enough. I'm just saying, there's no reason we couldn't give the optimizer something to work with that has higher level semantics than what we currently do. >>> At least for Array this is implemented as low-level bit operations and >>> optimizations cannot reason about it (e.g. finding a reasonable >>> RC-root for the buffer reference). >>> >>> Another thing is that we currently cannot use ``copy_on_write`` for Array >>> because of pinning. Array pins it’s buffer when passing an element address >>> to >>> an inout parameter. This allows the array buffer to be modified even if its >>> reference count is > 1. With ``copy_on_write``, a programmer could break >>> memory >>> safety when violating the inout rule. Example:: >>> >>> var arr = [MyClass()] // a global array >>> >>> foo(&arr[0]) // Pins the buffer of arr during the call >>> >>> func foo(_ x: inout MyClass) -> Int { >>> let b = arr // The ref-count of the buffer is not incremented, >>> because it is pinned! >>> let r = b[0] // optimizer removes the retain of r because it >>> thinks the following code cannot modify b >>> arr.removeAll() // does not copy the array buffer and thus >>> de-allocates r >>> return r.i // use-after-free! >>> } >> >> I only know of one way to resolve inout and pinning: >> >> * Semantically, references are replaced with a trap value when entering >> an inout context so that all inout values are provably unique >> references in the absence of unsafe code. We drop pinning and provide >> explicit operations that provide simultaneous lvalue accesses to >> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices. >> >> If there are other ideas out there, I'd like to hear them. If not, we >> should probably decide that this is what we're doing so that we can move >> forward without this looming uncertainty. >> >> -- >> -Dave >> >> _______________________________________________ >> swift-dev mailing list >> swift-dev@swift.org >> https://lists.swift.org/mailman/listinfo/swift-dev > -- -Dave _______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev