Hi Steffen
> The short answer is that the compact notation turned out to work much
> better for me in my code, especially if multiple transducers are
> involved. But that's my personal taste. You can choose which suits you
> better. In fact,
>
>     1000 take.
>
> just sits on top and simply calls
>
>     Take number: 1000.

To me this is much, much better.

> If the need arises, we could of course factor the compact notation out
> into a separate package.

Good idea.

> Btw, would you prefer (Take n: 1000) over (Take number: 1000)?

I tend to prefer the explicit selector :)

> Damien, you're right, I experimented with additional styles. Right now,
> we already have in the basic Transducer package:
>
>     collection transduce: #squared map * 1000 take.
>     "which is equal to"
>     (collection transduce: #squared map) transduce: 1000 take.
>
> Basically, one can split #transduce:reduce:init: into single calls of
> #transduce:, #reduce:, and #init:, depending on the needs.
> I also have an (unfinished) extension that allows one to write:
>
>     (collection transduce map: #squared) take: 1000.

To me this is much more readable. I cannot and do not want to use the
other forms.

> This feels familiar, but becomes a bit hard to read if more than two
> steps are needed.
>
>     collection transduce
>         map: #squared;
>         take: 1000.

Why would this be hard to read? We do that all the time, everywhere.

> I think this alternative would read nicely. But as the message chain
> has to modify the underlying object (an eduction), very sneaky side
> effects may occur. E.g., consider
>
>     eduction := collection transduce.
>     squared := eduction map: #squared.
>     take := squared take: 1000.
>
> Now, all three variables hold onto the same object, which first squares
> all elements and then takes the first 1000.

This is because the programmer did not understand what he did. No?
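For what it's worth, the sharing is easy to check directly (a sketch
based on your description of the cascade extension; collection stands
for any reducible source):

    eduction := collection transduce.
    squared := eduction map: #squared.
    take := squared take: 1000.

    eduction == squared.    "true - #map: answers the mutated receiver"
    squared == take.        "true - so does #take:"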
Stef

PS: I played with infinite streams and iteration back in 1993 in CLOS.
Now I do not like to mix things because it breaks my flow of thinking.

> Best,
> Steffen
>
> On .06.2017 at 21:28, Damien Pollet
> <damien.pollet+ph...@gmail.com> wrote:
>
>> If I recall correctly, there is an alternate protocol that looks more
>> like xtreams or the traditional select/collect iterations.
>>
>> On 2 June 2017 at 21:12, Stephane Ducasse <stepharo.s...@gmail.com>
>> wrote:
>>
>>> I have a design question.
>>>
>>> Why is the library implemented in functional style vs. messages?
>>> I do not see why this is needed. To my eyes the compact notation
>>> goes against readability of code and it feels ad hoc in Smalltalk.
>>>
>>> I really prefer
>>>
>>>     square := Map function: #squared.
>>>     take := Take number: 1000.
>>>
>>> because I know that I can read it and understand it.
>>> From that perspective I prefer Xtreams.
>>>
>>> Stef
>>>
>>> On Wed, May 31, 2017 at 2:23 PM, Steffen Märcker <merk...@web.de>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am the developer of the library 'Transducers' for VisualWorks. It
>>>> was formerly known as 'Reducers', but this name was a poor choice.
>>>> I'd like to port it to Pharo, if there is any interest on your
>>>> side. I hope to learn more about Pharo in this process, since I am
>>>> mainly a VW guy. And most likely, I will come up with a bunch of
>>>> questions. :-)
>>>>
>>>> Meanwhile, I'll cross-post the introduction from VWnc below. I'd be
>>>> very happy to hear your opinions and questions, and I hope we can
>>>> start a fruitful discussion - even if there is no Pharo port yet.
>>>>
>>>> Best, Steffen
>>>>
>>>>
>>>> Transducers are building blocks that encapsulate how to process
>>>> elements of a data sequence independently of the underlying input
>>>> and output source.
>>>>
>>>>
>>>> # Overview
>>>>
>>>> ## Encapsulate
>>>> Implementations of enumeration methods, such as #collect:, have the
>>>> logic of how to process a single element in common.
>>>> However, that logic is reimplemented each and every time.
>>>> Transducers make it explicit and facilitate re-use and coherent
>>>> behavior. For example:
>>>> - #collect: requires mapping: (aBlock1 map)
>>>> - #select: requires filtering: (aBlock2 filter)
>>>>
>>>> ## Compose
>>>> In practice, algorithms often require multiple processing steps,
>>>> e.g., mapping only a filtered set of elements.
>>>> Transducers are inherently composable and thereby allow making the
>>>> combination of steps explicit.
>>>> Since transducers do not build intermediate collections, their
>>>> composition is memory-efficient. For example:
>>>> - (aBlock1 filter) * (aBlock2 map)  "(1.) filter and (2.) map elements"
>>>>
>>>> ## Re-Use
>>>> Transducers are decoupled from the input and output sources, and
>>>> hence they can be reused in different contexts, for example:
>>>> - enumeration of collections
>>>> - processing of streams
>>>> - communicating via channels
>>>>
>>>>
>>>> # Usage by Example
>>>>
>>>> We build a coin-flipping experiment and count the occurrences of
>>>> heads and tails.
>>>>
>>>> First, we associate random numbers with the sides of a coin.
>>>>
>>>>     scale := [:x | (x * 2 + 1) floor] map.
>>>>     sides := #(heads tails) replace.
>>>>
>>>> Scale is a transducer that maps numbers x between 0 and 1 to 1
>>>> and 2.
>>>> Sides is a transducer that replaces the numbers with heads and
>>>> tails by lookup in an array.
>>>> Next, we choose a number of samples.
>>>>
>>>>     count := 1000 take.
>>>>
>>>> Count is a transducer that takes 1000 elements from a source.
>>>> We keep track of the occurrences of heads and tails using a bag.
>>>>
>>>>     collect := [:bag :c | bag add: c; yourself].
>>>>
>>>> Collect is a binary block (reducing function) that collects events
>>>> in a bag.
>>>> We assemble the experiment by transforming the block using the
>>>> transducers.
>>>>
>>>>     experiment := (scale * sides * count) transform: collect.
>>>>
>>>> From left to right we see the steps involved: scale, sides, count,
>>>> and collect.
>>>> Transforming assembles these steps into a binary block (reducing
>>>> function) we can use to run the experiment.
>>>>
>>>>     samples := Random new
>>>>                   reduce: experiment
>>>>                   init: Bag new.
>>>>
>>>> Here, we use #reduce:init:, which is mostly similar to
>>>> #inject:into:.
>>>> To execute a transformation and a reduction together, we can use
>>>> #transduce:reduce:init:.
>>>>
>>>>     samples := Random new
>>>>                   transduce: scale * sides * count
>>>>                   reduce: collect
>>>>                   init: Bag new.
>>>>
>>>> We can also express the experiment as a data flow using #<~.
>>>> This enables us to build objects that can be re-used in other
>>>> experiments.
>>>>
>>>>     coin := sides <~ scale <~ Random new.
>>>>     flip := Bag <~ count.
>>>>
>>>> Coin is an eduction, i.e., it binds transducers to a source and
>>>> understands #reduce:init:, among others.
>>>> Flip is a transformed reduction, i.e., it binds transducers to a
>>>> reducing function and an initial value.
>>>> By sending #<~, we draw further samples from flipping the coin.
>>>>
>>>>     samples := flip <~ coin.
>>>>
>>>> This yields a new Bag with another 1000 samples.
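As a quick sanity check, the resulting bag can be queried with the
standard Bag protocol (a sketch; the exact counts vary per run):

    samples occurrencesOf: #heads.  "roughly 500"
    samples occurrencesOf: #tails.  "roughly 500"
    samples size.                   "1000"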
>>>>
>>>> # Basic Concepts
>>>>
>>>> ## Reducing Functions
>>>>
>>>> A reducing function represents a single step in processing a data
>>>> sequence.
>>>> It takes an accumulated result and a value, and returns a new
>>>> accumulated result. For example:
>>>>
>>>>     collect := [:col :e | col add: e; yourself].
>>>>     sum := #+.
>>>>
>>>> A reducing function can also be ternary, i.e., it takes an
>>>> accumulated result, a key, and a value. For example:
>>>>
>>>>     collect := [:dict :k :v | dict at: k put: v; yourself].
>>>>
>>>> Reducing functions may be equipped with an optional completing
>>>> action. After processing finishes, it is invoked exactly once,
>>>> e.g., to free resources.
>>>>
>>>>     stream := [:str :e | str nextPut: e; yourself] completing: #close.
>>>>     absSum := #+ completing: #abs.
>>>>
>>>> A reducing function can end processing early by signaling Reduced
>>>> with a result.
>>>> This mechanism also enables the treatment of infinite sources.
>>>>
>>>>     nonNil := [:res :e |
>>>>         e ifNil: [Reduced signalWith: res] ifNotNil: [res]].
>>>>
>>>> The primary approach to processing a data sequence is the reducing
>>>> protocol, with the messages #reduce:init: and, if transducers are
>>>> involved, #transduce:reduce:init:.
>>>> The behavior is similar to #inject:into:, but in addition it takes
>>>> care of:
>>>> - handling binary and ternary reducing functions,
>>>> - invoking the completing action after finishing, and
>>>> - stopping the reduction if Reduced is signaled.
>>>> The message #transduce:reduce:init: just combines the
>>>> transformation and the reducing step.
>>>>
>>>> However, as reducing functions are step-wise in nature, an
>>>> application may choose other means to process its data.
>>>>
>>>>
>>>> ## Reducibles
>>>>
>>>> A data source is called reducible if it implements the reducing
>>>> protocol.
>>>> Default implementations are provided for collections and streams.
>>>> Additionally, blocks without an argument are reducible, too.
>>>> This makes it possible to adapt custom data sources without
>>>> additional effort. For example:
>>>>
>>>>     "XStreams adaptor"
>>>>     xstream := filename reading.
>>>>     reducible := [[xstream get] on: Incomplete do: [Reduced signal]].
>>>>
>>>>     "natural numbers"
>>>>     n := 0.
>>>>     reducible := [n := n + 1].
>>>>
>>>>
>>>> ## Transducers
>>>>
>>>> A transducer is an object that transforms a reducing function into
>>>> another.
>>>> Transducers encapsulate common steps in processing data sequences,
>>>> such as map, filter, concatenate, and flatten.
>>>> A transducer transforms a reducing function into another via
>>>> #transform: in order to add those steps.
>>>> They can be composed using #*, which yields a new transducer that
>>>> does both transformations.
>>>> Most transducers require an argument, typically blocks, symbols, or
>>>> numbers:
>>>>
>>>>     square := Map function: #squared.
>>>>     take := Take number: 1000.
>>>>
>>>> To facilitate compact notation, the argument types implement
>>>> corresponding methods:
>>>>
>>>>     squareAndTake := #squared map * 1000 take.
>>>>
>>>> Transducers requiring no argument are singletons and can be
>>>> accessed by their class name.
>>>>
>>>>     flattenAndDedupe := Flatten * Dedupe.
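Putting these pieces together: a composed transducer transforms a plain
reducing function via #transform:, and the result runs with
#reduce:init: just like the experiment above (a sketch; the variable
name is made up):

    sumOfSquaredOdds := (#odd filter * #squared map) transform: #+.
    #(1 2 3 4 5) reduce: sumOfSquaredOdds init: 0.  "1 + 9 + 25 = 35"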
>>>>
>>>> # Advanced Concepts
>>>>
>>>> ## Data Flows
>>>>
>>>> Processing a sequence of data can often be regarded as a data flow.
>>>> The operator #<~ allows defining a flow from a data source through
>>>> processing steps to a drain. For example:
>>>>
>>>>     squares := Set <~ 1000 take <~ #squared map <~ (1 to: 1000).
>>>>     fileOut writeStream <~ #isSeparator filter <~ fileIn readStream.
>>>>
>>>> In both examples #<~ is only used to set up the data flow using
>>>> reducing functions and transducers.
>>>> In contrast to streams, transducers are completely independent of
>>>> input and output sources.
>>>> Hence, we have a clear separation of reading data, writing data,
>>>> and processing elements:
>>>> - Sources know how to iterate over data with a reducing function,
>>>>   e.g., via #reduce:init:.
>>>> - Drains know how to collect data using a reducing function.
>>>> - Transducers know how to process single elements.
>>>>
>>>>
>>>> ## Reductions
>>>>
>>>> A reduction binds an initial value, or a block yielding an initial
>>>> value, to a reducing function.
>>>> The idea is to define a ready-to-use process that can be applied in
>>>> different contexts.
>>>> Reducibles handle reductions via #reduce: and #transduce:reduce:.
>>>> For example:
>>>>
>>>>     sum := #+ init: 0.
>>>>     sum1 := #(1 1 1) reduce: sum.
>>>>     sum2 := (1 to: 1000) transduce: #odd filter reduce: sum.
>>>>
>>>>     asSet := [:set :e | set add: e; yourself] initializer: [Set new].
>>>>     set1 := #(1 1 1) reduce: asSet.
>>>>     set2 := (1 to: 1000) transduce: #odd filter reduce: asSet.
>>>>
>>>> By combining a transducer with a reduction, a process can be
>>>> further modified.
>>>>
>>>>     sumOdds := sum <~ #odd filter.
>>>>     setOdds := asSet <~ #odd filter.
>>>>
>>>>
>>>> ## Eductions
>>>>
>>>> An eduction combines a reducible data source with transducers.
>>>> The idea is to define a transformed (virtual) data source that need
>>>> not be stored in memory.
>>>>
>>>>     odds1 := #odd filter <~ #(1 2 3) readStream.
>>>>     odds2 := #odd filter <~ (1 to: 1000).
>>>>
>>>> Depending on the underlying source, eductions can be processed once
>>>> (streams, e.g., odds1) or multiple times (collections, e.g.,
>>>> odds2).
>>>> Since no intermediate data is stored, the transducers' actions are
>>>> lazy, i.e., they are invoked each time the eduction is processed.
>>>>
>>>>
>>>> # Origins
>>>>
>>>> Transducers is based on the same-named Clojure library and its
>>>> ideas. Please see:
>>>> http://clojure.org/transducers