Re: Using transducers in a new transducing context

Alexander Gunnarson Mon, 10 Apr 2017 09:48:56 -0700

Léo, I definitely agree that you can use unsynchronized mutable stateful 
transducers *as long as you can guarantee they'll be used only in 
single-threaded contexts.* We were discussing above which kind of 
synchronization is appropriate for which context. With core.async, if 
you're using a transducer on a `chan` or `pipeline` or the like, it is 
guaranteed that only one thread will use it at a time (thus `atom`s 
aren't needed), *but* a different thread might come in and reuse that same 
stateful transducer, in which case the result of that mutation will need to 
propagate to that thread via a `volatile`. With reducers' `fold`, stateful 
transducers don't necessarily uphold their contract (e.g. with 
`map-indexed` as we discussed above) even if you use an `atom` or the like. 
But in truly single-threaded contexts, even within a `go` block or a 
`thread` or the like (as long as the transducer is not reused e.g. on a 
`chan` etc. where the necessity for a `volatile` applies), it's certainly 
fine to use unsynchronized mutable stateful transducers.
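
To make the `volatile` case concrete, here is a minimal sketch of a stateful 
transducer that keeps its counter in a volatile box. The name `indexed` is 
hypothetical (it is not in `clojure.core`); it is only loosely modelled on 
the way `map-indexed` tracks its index, and is meant as an illustration 
rather than the core implementation.

```clojure
;; Hypothetical example: `indexed` is an illustrative name, not a core function.
(defn indexed
  "Returns a stateful transducer that pairs each input with a running index."
  []
  (fn [rf]
    ;; Fresh state is created each time the transducer is applied to `rf`.
    ;; A volatile gives visibility across threads but no atomicity, which
    ;; matches the 'one thread at a time, but the thread may change' case.
    (let [i (volatile! -1)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         ;; vswap! returns the new value, so the first index is 0.
         (rf result [(vswap! i inc) input]))))))

;; Single-threaded use with `into`:
(into [] (indexed) [:a :b :c])
;; => [[0 :a] [1 :b] [2 :c]]
```

Because the `volatile!` is created inside `(fn [rf] ...)`, every transducing 
process gets its own counter. Swapping it for an `atom` would add atomic 
updates, which as far as I can tell this case doesn't need.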


On Monday, April 10, 2017 at 9:37:29 AM UTC-4, Léo Noel wrote:
>
> This topic is of high interest to me as it is at the core of my current 
> work. I asked a similar question a while ago 
> <https://groups.google.com/forum/#!topic/clojure/2WtfyLG2Jls> and I have 
> to say I'm even more confused by this:
>
>> While transducing processes may provide locking to cover the visibility of 
>> state updates in a stateful transducer, transducers should still use 
>> stateful constructs that ensure visibility (by using volatile, atoms, etc).
>>
>
> I actually tried pretty hard to find a use case that would make 
> partition-all fail because of its unsynchronized local state, and did not 
> manage to find one that did not break any contract. I arrived at the 
> conclusion that it is always safe to use unsynchronized constructs in 
> stateful transducers. The reason is that you need to ensure that the result 
> of each step is given to the next, and doing so you will necessarily set a 
> memory barrier of some sort between each step. Each step happens-before the 
> next, and therefore mutations performed by the thread at step n are always 
> visible to the thread performing step n+1. This is really brilliant: 
> when designing a transducer, you can be confident that calls to your 
> reducing function will be sequential and stop worrying about concurrency. 
> You just have to ensure that mutable state stays local. True encapsulation, 
> the broken promise of object-oriented programming.
>
> My point is that the transducer contract "always feed the result of step n 
> as the first argument of step n+1" is strong enough to safely use local 
> unsynchronized state. For this reason, switching partition-* transducers to 
> volatile constructs really sounds like a step backwards to me. However, 
> after re-reading the documentation on transducers, I found that this 
> contract is not explicitly stated. It is just *natural* to think this way, 
> because transducers are all about reducing processes. Is there a plan to 
> reconsider this principle? I would be very interested to know what Rich 
> has in mind that could lead him to advise overprotecting the local state of 
> transducers.
>
>
>
> On Monday, April 10, 2017 at 4:44:00 AM UTC+2, Alexander Gunnarson wrote:
>>
>> Thanks so much for your input Alex! It was a very helpful confirmation of 
>> the key conclusions arrived at in this thread, and I appreciate the 
>> additional elaborations you gave, especially the insight you passed on 
>> about the stateful transducers using `ArrayList`. I'm glad that I wasn't 
>> the only one wondering about the apparent lack of parity between its 
>> unsynchronized mutability and the volatile boxes used for e.g. 
>> `map-indexed` and others.
>>
>> As an aside about the stateful `take` transducer, Tesser uses the 
>> equivalent of one but skirts the issue by not guaranteeing that the first n 
>> items of the collection will be returned, but rather, n items of the 
>> collection in no particular order and starting at no particular index. This 
>> is achievable without Tesser by simply replacing the `volatile` in the 
>> `core/take` transducer with an `atom` and using it with `fold`. But yes, 
>> `take`'s contract is broken with this and so still follows the rule of 
>> thumb you established that `fold` can't use stateful transducers (at least, 
>> not without weird things like reordering of the indices in `map-indexed` 
>> and so on).
>>
>> That's interesting that `fold` can use transducers directly! I haven't 
>> tried that yet — I've just been wrapping them in an `r/folder`.
>>
>> On Sunday, April 9, 2017 at 10:22:13 PM UTC-4, Alex Miller wrote:
>>>
>>> Hey all, just catching up on this thread after the weekend. Rich and I 
>>> discussed the thread safety aspects of transducers last fall and the 
>>> intention is that transducers are expected to only be used in a single 
>>> thread at a time, but that thread can change throughout the life of the 
>>> transducing process (for example when a go block is passed over threads in 
>>> a pool in core.async). While transducing processes may provide locking to 
>>> cover the visibility of state updates in a stateful transducer, transducers 
>>> should still use stateful constructs that ensure visibility (by using 
>>> volatile, atoms, etc).
>>>
>>> The major transducing processes provided in core are transduce, into, 
>>> sequence, eduction, and core.async. All but core.async are single-threaded. 
>>> core.async channel transducers may occur on many threads due to interaction 
>>> with the go processing threads, but never happen on more than one thread at 
>>> a time. These operations are covered by the channel lock which should 
>>> guarantee visibility. Transducers used within a go block (via something 
>>> like transduce or into) occur eagerly and don't incur any switch in threads 
>>> so just fall back to the same old expectations of single-threaded use and 
>>> visibility.
>>>
>>> Note that there are a couple of stateful transducers that use ArrayList 
>>> (partition-by and partition-all). From my last conversation with Rich, he 
>>> said those should really be changed to protect themselves better with 
>>> volatile or something else. I thought I wrote up a ticket for this but 
>>> looks like maybe I didn't, so I will take care of that. 
>>>
>>> Reducer fold is interesting in that each "bucket" is reduced via its 
>>> reduce function, which can actually use a transducer (since that produces a 
>>> reduce function), however, it can't be a stateful transducer (something 
>>> like take, etc).
>>>
>>> Hope that helps with respect to intent.
>>>
>>>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
