On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev <swift-dev@swift.org> wrote:
> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org > <http://swift-dev-at-swift.org/>> wrote: > >>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev >>> <swift-dev@swift.org> wrote: >>> >>> This is a proposal for representing copy-on-write buffers in >>> SIL. Actually it’s still a draft for a proposal. It also heavily >>> depends on how we move forward with SIL ownership. >>> <CopyOnWrite.rst> >>> If you have any comments, please let me know. >> >> The SIL-level design seems sensible to me at a glance. At the language >> level, I think it would make more sense to treat this as an attribute >> on class types rather than on properties in structs using the class. I >> don't think many people reuse class definitions as both shared >> reference types and as value type payloads, > > Foundation does, or would if they could. > >> but beyond that, I think that making it an attribute of classes would >> put us into a better position to leverage the borrow model to enforce >> the "mutable-only-when-unique" aspect of COW implementations. John >> alluded to this in the "SIL address types and borrowing" thread: >> >>> I wonder if it would make more sense to make copy-on-write buffer >>> references a move-only type, so that as long as you were just >>> working with the raw reference (as opposed to the CoW aggregate, >>> which would remain copyable) it wouldn't get implicitly copied >>> anymore. You could have mutable and immutable buffer reference >>> types, both move-only, and there could be a consuming checkUnique >>> operation on the immutable one that, I dunno, returned an Either of >>> the mutable and immutable versions. >>> >>> For CoW aggregates, you'd need some @copied attribute on the field >>> to make sure that the CoW attribute was still copyable. Within the >>> implementation of the type, though, you would be projecting out the >>> reference immediately, and thereafter you'd be certain that you were >>> borrowing / moving it around as appropriate. >> >> If 'copy-on-write' were a trait on classes, then we could distinguish >> unique and nonunique references to the class. A unique reference would >> act like a move-only type to prevent accidental loss of uniqueness. > > +1 > >> We can also allow a copy-on-write class to have "mutating" methods, >> and only allow mutation on unique references. It seems to me like, >> exploring this direction, we could also come up with a way for the >> high-level value-semantics operations on the struct to statically >> indicate which methods are known to leave the value's buffers in a >> unique state, or which return values that are uniquely owned, which >> would give the optimizer more ability to avoid uniqueness checks >> across calls without relying on inlining and IPO. > > That's pretty cool. However, I think there's nothing to prevent any > mutating method from storing a copy of self in a global, so I think we'd > need some participation from the programmer (either an agreement not to > do that, or an explicit claim of uniqueness on exit) in order to > identify operations that create/preserve uniqueness. If a mutating reference (like self in a mutating method) is move-only then you would not be able to “copy” it to a global. > > > On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev > <swift-dev@swift.org> wrote: > > > on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote: > >> This is a proposal for representing copy-on-write buffers in >> SIL. Actually it’s still a draft for a proposal. It also heavily >> depends on how we move forward with SIL ownership. >> >> :orphan: >> >> .. highlight:: sil >> >> =================================== >> Copy-On-Write Representation in SIL >> =================================== >> >> .. contents:: >> >> Overview >> ======== >> >> This document proposes: >> >> - An ownership attribute to define copy-on-write (COW) buffers in Swift data >> types. >> >> - A representation of COW buffers in SIL so that optimizations can take >> benefit >> of it. >> >> The basic idea is to enable the SIL optimizer to reason about COW data types >> in the same way as a programmer can do. >> This means: a COW buffer can only be modified by its owning SIL value, >> because >> either it's uniquely referenced or the buffer is copied before modified. >> >> .. note:: >> In the following the term "buffer" refers to a Swift heap object. >> It can be any heap object, not necessarily a “buffer” with e.g. >> tail-allocated elements. >> >> COW Types >> ========= >> >> The basic structure of COW data types can be simplified as follows:: >> >> class COWBuffer { >> var someData: Int >> ... >> } >> >> struct COWType { >> var b : COWBuffer >> >> mutating func change_it() { >> if (!isUniquelyReferenced(b)) { >> b = copy_buffer(b) >> } >> b.someData = ... >> } >> } >> >> Currently the COW behavior of such types is just defined by their >> implementation. >> But there is no representation of this special behavior in the SIL. >> So the SIL optimizer has no clue about it and cannot take advantage of it. >> >> For example:: >> >> func foo(arr : [Int]) { >> x = arr[0] >> opaque_function() >> y = arr[0] // can RLE replace this with y = x? >> } >> >> If opaque_function() wants to change the contents of the array buffer it >> first >> has to copy it. > > ...or determine that it's uniquely-referenced. In this specific example, if opqaue_function holds a reference to arr’s buffer, the buffer is not uniquely-referenced. > >> But the optimizer does not know it so it has to conservatively assume >> that opaque_function() will write to the location of arr[0]. >> >> Copy-on-write Ownership Attribute >> ================================= >> >> This section proposes an ownership attribute to define a copy-on-write >> buffer. >> >> Swift Syntax >> ------------ >> >> A COW buffer reference can be defined with a new ownership attribute for the >> buffer variable declaration (similar to “weak” and “unowned”):: >> >> struct COWType { >> copy_on_write var b : COWBuffer >> >> // ... >> } >> >> The ``copy_on_write`` attribute is purely used for optimization purposes. >> It does not change the semantics of the program. > > Presumably, it changes what code you can execute on `b` without invoking > traps or undefined behavior. Otherwise, the optimizer wouldn't be able > to do anything differently to take advantage of the annotation. That’s true. > What are the rules for writing code that uses `copy_on_write`? See below ("The rules for using ``copy_on_write`` and the built-ins are:”) > >> .. note:: >> >> “copy_on_write” is a working title. TODO: decide on the name. >> Maybe it should be a @-attribute, like @copy_on_write? >> Another question is if we should open this attribute for the public or just >> use it internally in the library, because violating the implied rules >> (see below) could break memory safety. >> >> Implementation >> -------------- >> >> The ``copy_on_write`` references can be represented in the AST as a special >> ``StorageType``, just like how ``unowned`` and ``weak`` is represented. >> The canonical type of a ``CopyOnWriteStorageType`` would be the referenced >> buffer class type. >> >> In SIL the buffer reference will have type:: >> >> $@sil_cow COWBuffer >> >> where ``COWBuffer`` is the type of the referenced heap object. >> >> Two conversion instructions are needed to convert from a ``@sil_cow`` >> reference >> type to a regular reference type:: >> >> cow_to_ref >> ref_to_cow >> >> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``. >> >> For example the SIL code for:: >> >> var c: COWType >> let x = c.b.someData >> >> would be:: >> >> %1 = struct_extract %1 : COWType, #COWType.b >> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData >> %4 = load %3 : $*Int >> >> The ``ref_to_cow`` instruction is needed to store a new buffer reference >> into a >> COW type. >> >> COW Buffers and the Optimizer >> ============================= >> >> A reference to a COW buffer gives the optimizer additional information: >> >> *A buffer, referenced by a @sil_cow reference is considered to be immutable >> during the lifetime of the reference.* > > This seems like much too broad a rule to allow inplace mutations of > uniquely referenced buffers. The point is that all mutations must be guarded by an is_unique, which takes the _address_ of the buffer reference as argument. And the optimizer considers this instruction as a potential write to the buffer reference. The effect is that the lifetime of a buffer reference (as a SIL value) will not outlive a is_unique - regardless if this is inside a called function or inlined. > Unless you mean the reference is > immutable, rather than the storage being referred to by it. > >> This means any address derived from a ``cow_to_ref`` instruction can be >> considered to point to immutable memory. >> >> Some examples of optimizations which will benefit from copy-on-write >> representation in SIL: >> >> - Redundant load elimination >> >> RLE can assume that opaque code does not modify a COW buffer. > > How do you distinguish “opaque code” from “code that is meant to > modify the buffer and might do so in place if it's uniquely-referenced?” Again, the is_unique which takes the address of the reference, will guarantee that during the lifetime of a buffer there are no modifications of the buffer. > >> Example:: >> >> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >> %3 = ref_element_addr %2 : $COWBuffer, #someData >> %4 = load %3 : $*Int >> %5 = apply %foo() // Cannot overwrite memory >> location %3 >> %6 = load %3 : $*Int // Can be replaced by %4 >> >> Currently we do some ad-hoc optimizations for array, based on semantics, >> like array count propagation. These hacks would not be needed >> anymore. > > W0000000000000000000000t. > >> Note that it’s not required to check if a ``cow_to_ref`` reference (or a >> projected address) escapes. Even if it escapes, it will reference immutable >> memory. >> >> - CSE, loop hoisting >> >> Similar to RLE: the optimizer can assume that opaque code cannot modify a >> COW buffer > > Same question here as above, then. >> >> - ARC optimization >> >> Knowing that some opaque code cannot overwrite a reference in the COW buffer >> can remove retain/release pairs across such code:: >> >> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >> %3 = ref_element_addr %2 : $COWBuffer, #someRef >> %4 = load_strong %3 : $*MyClass // Can do a load_strong >> [guarantee] >> %5 = apply %foo() // Cannot overwrite someRef >> and dealloc the object >> // Use %4 >> destroy_value %4 : $MyClass >> >> Scoping instructions >> -------------------- >> >> To let the optimizer reason about the immutability of the COW buffer, it is >> important to *bind* the lifetime of the buffer content to the lifetime of the >> buffer reference. For example:: >> >> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >> // load something from %b1 >> %a = apply %foo(%baddr : $@sil_cow COWBuffer) >> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >> again >> // load something from %b2 >> >> The question is: can RLE forward the load of the buffer reference and replace >> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies the >> buffer. >> >> To enforce this restriction, the scope of any buffer modification must be >> enclosed in a pair of SIL instructions. Those instructions define the scope >> of the mutation. Both instructions take the *address* of the buffer >> reference as operand and act as a potential write to the buffer reference. >> >> The purpose of the scoping instructions is to strictly separate the >> liferanges >> of references to an immutable buffer and references to the mutable buffer. > > Looks reasonable. > >> The following example shows why the scoping instructions (specifically the >> end-of-scope instruction) are required to prevent loop-hoisting from >> interleaving mutable and immutable liferanges:: >> >> // there should be a begin-of-scope %baddr >> %mut_b = load %baddr >> store %x to %mut_b // modification of the buffer >> // there should be a end-of-scope %baddr >> >> loop { >> %b = load %baddr >> %y = load %b // load from the buffer >> ... >> } >> >> If there is no end-of-scope instruction, loop hoisting could do:: >> >> %mut_b = load %baddr >> %b = load %baddr // moved out of the loop >> store %x to %mut_b >> >> loop { >> %y = load %b >> ... >> } >> >> Now the optimizer assumes that ``%b`` references an immutable buffer, so it >> could >> also hoist the load:: >> >> %mut_b = load %baddr >> %b = load %baddr >> %y = load %b // Wrong! Will be overwritten by the following store >> store %x to %mut_b >> >> loop { >> ... >> } >> >> >> The following sections describe two alternatives to implement the scoping. >> >> Scoping Alternative 1: Explicit Built-ins >> ----------------------------------------- >> >> SIL instructions >> ^^^^^^^^^^^^^^^^ >> >> The existing ``is_unique`` instruction is changed to a terminator >> instruction:: >> >> bb0: >> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the >> address of the COWBuffer reference >> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique >> reference. Physically identical to "load %0” >> // usually empty >> br bb3(%1 : $COWBuffer) >> bb2: // the false-block >> // usually contains: >> %2 = apply %copy_buffer >> %3 = cow_to_ref %2 >> store_strong %3 to %0 : $*@sil_cow COWBuffer >> br bb3(%2 : $COWBuffer) >> bb3(%4 : $COWBuffer): >> // Modify the buffer referenced by %4 >> // ... >> >> The end-of-scope instruction is:: >> >> end_unique_addr %0 : $*COWBuffer >> >> It is important that the references to the unique buffers (``%1``, ``%2``) >> must >> not outlive ``end_unique_addr``. In most cases this can be check by the SIL >> verifier. >> >> The two instructions must be paired properly but not necessarily in the >> same function. >> >> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to >> separate the lifetimes of mutable and immutable accesses to the COW buffer. >> Both instructions take an address to the COW buffer reference and are >> considered as potential stores to the reference. >> This makes sure that the SIL optimizer cannot mix-up buffer reference >> lifetimes >> across these instructions. >> For example, RLE cannot combine two buffer loads which are interleaved with >> a ``is_unique_addr_br``:: >> >> %1 = load_strong %0 : $*@sil_cow COWBuffer >> // do something with %1 >> … >> is_unique_addr_br %0 : $*@sil_cow COWBuffer >> … >> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this >> with %1 >> >> Another important thing is that the COW buffer can only be mutated by using >> the >> reference of the ``is_unique_addr_br`` true-block argument. >> The COW buffer cannot be modified by simply loading/extracting the reference >> from the COWType. >> Example:: >> >> %1 = load_strong %0 : $*COWBuffer >> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >> %3 = ref_element_addr %2 : $COWBuffer, #someData >> store %7 : $Int to %3 : $*Int // Violation! >> >> Most obvious violations to this constraint can be catched by the SILVerifier. >> >> The ``_addr`` variants of the instructions also have a non-addr counterpart:: >> >> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces the >> true-block arg as owned >> >> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned >> >> These instructions are generated by Mem2reg (or a similar optimization) >> in case the COW value is stored (in a temporary alloc_stack location) >> just for the sake of passing an address to ``is_unique_addr_br`` and >> ``end_unique_addr``. >> For example in the following code, where the COW data is passed as-value and >> all the mutating functions are inlined:: >> >> func foo(arr : [Int], x: Int) { >> arr[0] = 27 >> … >> y = arr[x] >> … >> } >> >> Finally it’s probably a good idea to add an instruction for converting an >> immutable reference to a mutable reference:: >> >> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : >> $COWBuffer as owned >> >> which is basically just a simpler representation of the following pattern:: >> >> bb0: >> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2 >> bb1(%1 : $COWBuffer): >> … // main control flow continues here >> bb2: >> unreachable >> >> An optimizations, which eliminate uniqueness checks, would replace a >> ``is_unique_br`` by a ``start_unique``. >> >> Built-ins >> ^^^^^^^^^ >> >> A COW type implementor can generate the new instructions by using a set of >> built-ins:: >> >> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType? >> func endUnique<BufferType>(_ buffer: inout BufferType) >> >> For example:: >> >> struct COWType { >> copy_on_write var b : COWBuffer >> >> mutating func makeMutable() -> COWBuffer { >> if let uniqueBuffer = isUnique(&self.b) { >> return uniqueBuffer >> } >> let copiedBuffer = copyBuffer(self.b) >> self.b = copiedBuffer >> return copiedBuffer >> } >> >> mutating func setSomeData(x: Int) { >> let uniqueBuffer = makeMutable() >> uniqueBuffer.someData = x >> endUnique(&self.b) >> } >> } > > This seems reasonable, but it also looks like the compiler could do the > `endUnique` dance for us based, e.g., on the mutability of methods. I agree, that would be ideal, e.g. the compiler could insert the endUnique at the end of an inout scope. > >> The ``isUnique`` built-in returns an optional unique buffer reference. >> Physically this is the COW buffer which is passed as the inout argument. >> The result is nil if the buffer is not uniquely referenced. >> In this case usually the original buffer is copied and the reference to the >> copy is written back to the original buffer reference location >> (``self.b = copiedBuffer``). >> Starting at the point of the write-back, the reference to the copy also >> becomes >> a unique buffer reference. >> >> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern >> which >> constructs the Optional in the successor blocks. Using ``isUnique`` in an >> if-let (as shown above) will end up in two consecutive CFG "diamonds". >> Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond. >> >> .. note:: >> This makes the definition of the unique buffer location lifetime a little >> bit >> problematic, because the false-branch of ``isUnique`` is not equivalent to >> the false-branch of the ``is_unique_addr_br`` instruction (before >> SimplifyCFG >> can do its job). > > I don't know what the implications of these diamonds and the problem > described above might be, FWIW. > >> The rules for using ``copy_on_write`` and the built-ins are: >> >> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in the >> same function. >> >> 2. The COW buffer may only be mutated by using the unique buffer reference. >> >> 3. The COW buffer must not be mutated outside the ``isUnique`` - >> ``endUnique`` >> pair. >> >> 4. During the lifetime of the unique buffer reference, the original COW >> buffer >> reference must not be used in any way, e.g. for reading from the buffer. >> >> Note that the lifetime of the unique buffer reference does not include the >> part between the begin of the ``isUnique`` false-branch and the write-back >> of the copy. This means is okay to read from the buffer (using ``self.b``) >> for the purpose of copying. >> >> Examples:: >> >> mutating func setSomeData(x: Int) { >> let uniqueBuffer = makeMutable() >> uniqueBuffer.someData = x >> // violates rule 1 >> } >> >> mutating func setSomeData(x: Int) { >> makeMutable() >> self.b.someData = x // violates rule 2 >> endUnique(&self.b) >> } >> >> mutating func setSomeData(x: Int) { >> let uniqueBuffer = makeMutable() >> uniqueBuffer.someData = x >> endUnique(&self.b) >> uniqueBuffer.someData = 27 // violates rule 3 >> } >> >> mutating func incrementSomeData() { >> let uniqueBuffer = makeMutable() >> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4 >> endUnique(&self.b) >> } > > It would be instructive to write down the *correct* code for these > operations. added to my todo list. > >> The intention of the rules is to ensure that there is no overlap of a >> "read-only" life-range with a "mutable" life-range of the buffer reference. >> It’s the responsibility of the implementor to follow the rules. >> But the compiler (a mandatory diagnostics pass and the SIL verifier) can >> statically detect rule violations in obvious cases (with inter-procedural >> analysis maybe even in most cases). >> >> This approach would require to change some of the internals of our >> current COW data structures in the stdlib (Array, Dictionary, etc.). >> For example, the Array make_mutable semantic functions currently do not >> return >> the unique buffer. > > No big deal. > >> Scoping Alternative 2: Implicit Inout Scopes >> -------------------------------------------- >> >> There is an idea (proposal?) to change the representation of inout variables >> in SIL. This is independent of this proposal, but can be helpful for the >> purpose of defining the scope of a COW mutation. >> >> The basic idea is that SILGen inserts scoping instructions for *all* inout >> variables. And those scoping instructions can be used to define the mutating >> scope of a COW buffer. >> >> The scoping instructions which are inserted by SILGen for an inout scope >> are:: >> >> begin_exclusive >> end_exclusive >> >> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those instructions >> take the >> address of the inout variable as argument. For the optimizer those >> instructions >> look like potential writes to the inout variable. >> >> The implementor of a COW type has to follow the rule that the COW buffer may >> only be modified in mutating functions of the COW type. But this is the case >> anyway because any modification needs a uniqueness check and this can only be >> done in mutating functions. >> >> Example:: >> >> // > mutating func setSomeData(x: Int) { >> // Accepts a unique reference to the array value (avoiding refcount >> operations) >> sil @setSomeData : $(Int, @inout Array) -> () { >> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0 >> >> // > makeMutable() (inlined) >> // Forward the unique reference to the `self` array value, still avoiding >> refcount operations. >> // Begin the inlined exclusive scope (could be trivially removed). >> begin_exclusive %arrayref : $*Array<T> // Begin scope #1 >> >> // > if !isUnique(&self._storage) { >> // Extract a unique inout reference to the class reference to the array >> storage. >> // This begins the isUnique() argument's exclusive scope. The memory is >> already exclusive >> // but the scope helps ensure this is the only alias to _storage. >> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, >> #Array._storage >> >> // Uniqueness checking requires an inout reference to the class reference. >> // The is_unique instruction does not need to create a new storage >> reference. >> // It's only purpose is to check the RC count, ensure that the checked >> reference >> // is inout, and prevent the inout scope from being optimized away. >> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T> >> >> // End the isUnique argument's exclusive scope (can also be trivially >> removed). >> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T> >> >> br %isuniq, bb_continue, bb_slow >> >> bb_slow: >> // > self._storage = copyBuffer(self._storage) >> // Produce a new class reference to storage with verifiably unique RC >> semantics. >> %copied_storage_class = alloc_ref ... >> // A begin/end exclusive scope is implicit in store [assign]. >> store [assign] %copied_storage_class to %arrayref._storageref >> br bb_continue >> >> bb_continue: >> >> // This marks the end of makeMutable's inout `self` scope. Because Array >> // contains a "copy_on_write" property, the SIL verifier needs to >> // prove that %arrayref.#_storage has not escaped at this point. This >> // is equivalent to checking that %arrayref itself is not copied, and >> // checking each projection of the "copy_on_write" storage property >> // (%arrayref._storageref) is not copied. Or, if any copies are present, >> // they must be consumed within this scope. >> end_exclusive %arrayref : $*Array<T> // End scope #1 >> >> // > self._storage.someData = x >> // An _addr instruction with one load/store use doesn't really need its >> own scope. >> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage >> >> // ARC optimization can promote this to a borrow, replacing >> strong_release with end_borrow. >> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow >> ArrayStorage >> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow >> ArrayStorage >> >> // Write some data into the CoW buffer. >> // (For simplicity, pretend ArrayStorage has a "someData" field). >> // A single-use _addr instruction, so no scope. >> %somedata_addr = ref_element_addr %arrayref._storage, #someData >> // A store with an implicit [exclusive] scope. >> store [assign] %x to %somedata_addr >> >> strong_release %arrayref._storage : $*ArrayStorage<T> >> >> // End the isUnique argument's exclusive scope. >> // The same verification is needed here, but the inner scope would be >> eliminated. >> end_exclusive %arrayref : $*Array<T> // End scope #0 >> >> In general this approach looks more "user-friendly" than the first >> alternative. > > Well, I can't really tell, because you haven't shown the Swift code that > generates this SIL. > >> But it depends on implementing the general feature to insert the inout >> scoping instructions. Also, we still have to think through all the >> details of this approach. > > FWIW, I am convinced we will need (and get) a stricter inout model that > would be conducive to inserting the scoping instructions. > > >> Dependency between a buffer reference to the scope-begin >> -------------------------------------------------------- > > You can only have a dependency between two things, but as phrased “a > buffer reference to the scope-begin” sounds like one thing. s/to/and/ > would fix it. > >> With both alternatives there is no explicit dependency from a buffer >> reference >> to a scope-begin instruction:: >> >> %b_cow = load %baddr >> %b = cow_to_ref %b_cow >> %x = load %b // No dependency between this... >> ... >> begin_exclusive %baddr // ... and this instruction. >> ... >> >> So in theory the optimizer is free to reschedule the instructions:: >> >> %b_cow = load %baddr >> %b = cow_to_ref %b_cow >> ... >> begin_exclusive %baddr >> %x = load %b // Wrong! Buffer could be modified here >> ... >> >> We still have to figure out how to cope with this. >> >> - We could add an end-of-lifetime instruction for a COW buffer reference, >> which >> the optimizer may not move over a begin-of-scope instruction. >> >> - Or we just define the implicit rule for the optimizer that any use of a COW >> reference may not be moved over a begin-of-scope instruction. >> >> Preconditions >> ============= >> >> To benefit from COW optimizations in the stdlib Array, Set and Dictionary >> data >> structures we first need eager bridging, meaning getting rid of the bridged >> buffer. > > As you know I'm very much in favor of eager bridging, but I don't see > why this would be dependent on it. We could use copy_on_write with eager bridging, but I don’t think it will give any benefits to the optimizer. For example, the SIL code to get from an Array to a native ContiguousArrayStorage reference is pretty hard to understand for the optimizer (involves low level bit operations, etc.). > >> At least for Array this is implemented as low-level bit operations and >> optimizations cannot reason about it (e.g. finding a reasonable >> RC-root for the buffer reference). >> >> Another thing is that we currently cannot use ``copy_on_write`` for Array >> because of pinning. Array pins it’s buffer when passing an element address to >> an inout parameter. This allows the array buffer to be modified even if its >> reference count is > 1. With ``copy_on_write``, a programmer could break >> memory >> safety when violating the inout rule. Example:: >> >> var arr = [MyClass()] // a global array >> >> foo(&arr[0]) // Pins the buffer of arr during the call >> >> func foo(_ x: inout MyClass) -> Int { >> let b = arr // The ref-count of the buffer is not incremented, >> because it is pinned! >> let r = b[0] // optimizer removes the retain of r because it >> thinks the following code cannot modify b >> arr.removeAll() // does not copy the array buffer and thus >> de-allocates r >> return r.i // use-after-free! >> } > > I only know of one way to resolve inout and pinning: > > * Semantically, references are replaced with a trap value when entering > an inout context so that all inout values are provably unique > references in the absence of unsafe code. We drop pinning and provide > explicit operations that provide simultaneous lvalue accesses to > distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices. > > If there are other ideas out there, I'd like to hear them. If not, we > should probably decide that this is what we're doing so that we can move > forward without this looming uncertainty. > > -- > -Dave > > _______________________________________________ > swift-dev mailing list > swift-dev@swift.org > https://lists.swift.org/mailman/listinfo/swift-dev
_______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev