Coupling this with Olek's work on the DataFrame could really come in handy.

Phil
On Mon, Jun 5, 2017 at 9:14 AM, Stephane Ducasse <stepharo.s...@gmail.com> wrote:

> Hi Steffen
>
> > The short answer is that the compact notation turned out to work much
> > better for me in my code, especially if multiple transducers are
> > involved. But that's my personal taste. You can choose which suits you
> > better. In fact,
> >
> >     1000 take.
> >
> > just sits on top and simply calls
> >
> >     Take number: 1000.
>
> To me this is much much better.
>
> > If the need arises, we could of course factor the compact notation out
> > into a separate package.
>
> Good idea
>
> > Btw, would you prefer (Take n: 1000) over (Take number: 1000)?
>
> I tend to prefer explicit selectors. :)
>
> > Damien, you're right, I experimented with additional styles. Right
> > now, we already have in the basic Transducer package:
> >
> >     collection transduce: #squared map * 1000 take. "which is equal to"
> >     (collection transduce: #squared map) transduce: 1000 take.
> >
> > Basically, one can split #transduce:reduce:init: into single calls of
> > #transduce:, #reduce:, and #init:, depending on one's needs.
> > I also have an (unfinished) extension that allows one to write:
> >
> >     (collection transduce map: #squared) take: 1000.
>
> To me this is much more readable.
> I cannot and do not want to use the other forms.
>
> > This feels familiar, but becomes a bit hard to read if more than two
> > steps are needed.
> >
> >     collection transduce
> >         map: #squared;
> >         take: 1000.
>
> Why would this be hard to read? We do that all the time everywhere.
>
> > I think this alternative would read nicely. But as the message chain
> > has to modify the underlying object (an eduction), very sneaky side
> > effects may occur. E.g., consider
> >
> >     eduction := collection transduce.
> >     squared := eduction map: #squared.
> >     take := squared take: 1000.
> >
> > Now, all three variables hold onto the same object, which first
> > squares all elements and then takes the first 1000.
>
> This is because the programmer did not understand what he did. No?
>
> Stef
>
> PS: I played with infinite streams and iteration back in 1993 in CLOS.
> Now I do not like to mix things because it breaks my flow of thinking.
>
> > Best,
> > Steffen
> >
> > On .06.2017 at 21:28, Damien Pollet
> > <damien.pollet+ph...@gmail.com> wrote:
> >
> >> If I recall correctly, there is an alternate protocol that looks more
> >> like Xtreams or the traditional select/collect iterations.
> >>
> >> On 2 June 2017 at 21:12, Stephane Ducasse <stepharo.s...@gmail.com>
> >> wrote:
> >>
> >>> I have a design question.
> >>>
> >>> Why is the library implemented in functional style vs. messages?
> >>> I do not see why this is needed. To my eyes the compact notation
> >>> goes against readability of code and it feels ad hoc in Smalltalk.
> >>>
> >>> I really prefer
> >>>
> >>>     square := Map function: #squared.
> >>>     take := Take number: 1000.
> >>>
> >>> because I know that I can read it and understand it.
> >>> From that perspective I prefer Xtreams.
> >>>
> >>> Stef
> >>>
> >>> On Wed, May 31, 2017 at 2:23 PM, Steffen Märcker <merk...@web.de>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I am the developer of the library 'Transducers' for VisualWorks. It
> >>>> was formerly known as 'Reducers', but this name was a poor choice.
> >>>> I'd like to port it to Pharo, if there is any interest on your side.
> >>>> I hope to learn more about Pharo in this process, since I am mainly
> >>>> a VW guy. And most likely, I will come up with a bunch of
> >>>> questions. :-)
> >>>>
> >>>> Meanwhile, I'll cross-post the introduction from VWnc below. I'd be
> >>>> very happy to hear your opinions and questions, and I hope we can
> >>>> start a fruitful discussion - even if there is no Pharo port yet.
> >>>>
> >>>> Best, Steffen
> >>>>
> >>>> Transducers are building blocks that encapsulate how to process
> >>>> elements of a data sequence independently of the underlying input
> >>>> and output source.
> >>>>
> >>>> # Overview
> >>>>
> >>>> ## Encapsulate
> >>>> Implementations of enumeration methods, such as #collect:, have in
> >>>> common the logic of how to process a single element. However, that
> >>>> logic is reimplemented each and every time. Transducers make it
> >>>> explicit and facilitate re-use and coherent behavior.
> >>>> For example:
> >>>> - #collect: requires mapping: (aBlock1 map)
> >>>> - #select: requires filtering: (aBlock2 filter)
> >>>>
> >>>> ## Compose
> >>>> In practice, algorithms often require multiple processing steps,
> >>>> e.g., mapping only a filtered set of elements. Transducers are
> >>>> inherently composable and hence allow making the combination of
> >>>> steps explicit. Since transducers do not build intermediate
> >>>> collections, their composition is memory-efficient.
> >>>> For example:
> >>>> - (aBlock1 filter) * (aBlock2 map) "(1.) filter and (2.) map elements"
> >>>>
> >>>> ## Re-Use
> >>>> Transducers are decoupled from the input and output sources, and
> >>>> hence they can be reused in different contexts.
> >>>> For example:
> >>>> - enumeration of collections
> >>>> - processing of streams
> >>>> - communicating via channels
> >>>>
> >>>> # Usage by Example
> >>>>
> >>>> We build a coin-flipping experiment and count the occurrences of
> >>>> heads and tails.
> >>>>
> >>>> First, we associate random numbers with the sides of a coin.
> >>>>
> >>>>     scale := [:x | (x * 2 + 1) floor] map.
> >>>>     sides := #(heads tails) replace.
> >>>>
> >>>> Scale is a transducer that maps numbers x between 0 and 1 to 1 and 2.
> >>>> Sides is a transducer that replaces the numbers 1 and 2 with heads
> >>>> and tails by lookup in an array.
> >>>> Next, we choose a number of samples.
> >>>>
> >>>>     count := 1000 take.
> >>>>
> >>>> Count is a transducer that takes 1000 elements from a source.
> >>>> We keep track of the occurrences of heads and tails using a bag.
> >>>>
> >>>>     collect := [:bag :c | bag add: c; yourself].
> >>>>
> >>>> Collect is a binary block (reducing function) that collects events
> >>>> in a bag. We assemble the experiment by transforming the block using
> >>>> the transducers.
> >>>>
> >>>>     experiment := (scale * sides * count) transform: collect.
> >>>>
> >>>> From left to right we see the steps involved: scale, sides, count,
> >>>> and collect. Transforming assembles these steps into a binary block
> >>>> (reducing function) we can use to run the experiment.
> >>>>
> >>>>     samples := Random new
> >>>>                    reduce: experiment
> >>>>                    init: Bag new.
> >>>>
> >>>> Here, we use #reduce:init:, which is mostly similar to #inject:into:.
> >>>> To execute a transformation and a reduction together, we can use
> >>>> #transduce:reduce:init:.
> >>>>
> >>>>     samples := Random new
> >>>>                    transduce: scale * sides * count
> >>>>                    reduce: collect
> >>>>                    init: Bag new.
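> >>>>
> >>>> Assuming a fair random source, both counts should then be close to
> >>>> 500. We can check the outcome with the standard Bag protocol:
> >>>>
> >>>>     samples occurrencesOf: #heads.
> >>>>     samples occurrencesOf: #tails.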
> >>>>
> >>>> We can also express the experiment as a data flow using #<~.
> >>>> This enables us to build objects that can be reused in other
> >>>> experiments.
> >>>>
> >>>>     coin := sides <~ scale <~ Random new.
> >>>>     flip := Bag <~ count.
> >>>>
> >>>> Coin is an eduction, i.e., it binds transducers to a source and
> >>>> understands #reduce:init:, among others. Flip is a transformed
> >>>> reduction, i.e., it binds transducers to a reducing function and an
> >>>> initial value. By sending #<~, we draw further samples from flipping
> >>>> the coin.
> >>>>
> >>>>     samples := flip <~ coin.
> >>>>
> >>>> This yields a new Bag with another 1000 samples.
> >>>>
> >>>> # Basic Concepts
> >>>>
> >>>> ## Reducing Functions
> >>>>
> >>>> A reducing function represents a single step in processing a data
> >>>> sequence. It takes an accumulated result and a value, and returns a
> >>>> new accumulated result. For example:
> >>>>
> >>>>     collect := [:col :e | col add: e; yourself].
> >>>>     sum := #+.
> >>>>
> >>>> A reducing function can also be ternary, i.e., it takes an
> >>>> accumulated result, a key, and a value. For example:
> >>>>
> >>>>     collect := [:dict :k :v | dict at: k put: v; yourself].
> >>>>
> >>>> Reducing functions may be equipped with an optional completing
> >>>> action. After processing finishes, it is invoked exactly once, e.g.,
> >>>> to free resources.
> >>>>
> >>>>     stream := [:str :e | str nextPut: e; yourself] completing: #close.
> >>>>     absSum := #+ completing: #abs.
> >>>>
> >>>> A reducing function can end processing early by signaling Reduced
> >>>> with a result. This mechanism also enables the treatment of infinite
> >>>> sources.
> >>>>
> >>>>     nonNil := [:res :e | e ifNil: [Reduced signalWith: res] ifNotNil: [res]].
> >>>>
> >>>> The primary approach to processing a data sequence is the reducing
> >>>> protocol, with the messages #reduce:init: and, if transducers are
> >>>> involved, #transduce:reduce:init:. The behavior is similar to
> >>>> #inject:into:, but in addition it takes care of:
> >>>> - handling binary and ternary reducing functions,
> >>>> - invoking the completing action after finishing, and
> >>>> - stopping the reduction if Reduced is signaled.
> >>>> The message #transduce:reduce:init: just combines the transformation
> >>>> and the reducing step.
> >>>>
> >>>> However, as reducing functions are step-wise in nature, an
> >>>> application may choose other means to process its data.
> >>>>
> >>>> ## Reducibles
> >>>>
> >>>> A data source is called reducible if it implements the reducing
> >>>> protocol. Default implementations are provided for collections and
> >>>> streams. Additionally, blocks without arguments are reducible, too.
> >>>> This allows adapting to custom data sources without additional
> >>>> effort. For example:
> >>>>
> >>>>     "Xtreams adaptor"
> >>>>     xstream := filename reading.
> >>>>     reducible := [[xstream get] on: Incomplete do: [Reduced signal]].
> >>>>
> >>>>     "natural numbers"
> >>>>     n := 0.
> >>>>     reducible := [n := n + 1].
> >>>>
> >>>> ## Transducers
> >>>>
> >>>> A transducer is an object that transforms a reducing function into
> >>>> another. Transducers encapsulate common steps in processing data
> >>>> sequences, such as map, filter, concatenate, and flatten. A
> >>>> transducer transforms a reducing function into another via
> >>>> #transform: in order to add those steps.
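> >>>>
> >>>> For instance, transforming #+ with a map transducer yields a new
> >>>> reducing function that increments each element before adding it
> >>>> (a minimal sketch; the names inc and sumInc are illustrative):
> >>>>
> >>>>     inc := [:x | x + 1] map.
> >>>>     sumInc := inc transform: #+.
> >>>>     #(1 2 3) reduce: sumInc init: 0.   "(1+1) + (2+1) + (3+1) = 9"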
> >>>> They can be composed using #*, which yields a new transducer that
> >>>> does both transformations. Most transducers require an argument,
> >>>> typically blocks, symbols, or numbers:
> >>>>
> >>>>     square := Map function: #squared.
> >>>>     take := Take number: 1000.
> >>>>
> >>>> To facilitate compact notation, the argument types implement
> >>>> corresponding methods:
> >>>>
> >>>>     squareAndTake := #squared map * 1000 take.
> >>>>
> >>>> Transducers requiring no argument are singletons and can be accessed
> >>>> by their class name.
> >>>>
> >>>>     flattenAndDedupe := Flatten * Dedupe.
> >>>>
> >>>> # Advanced Concepts
> >>>>
> >>>> ## Data flows
> >>>>
> >>>> Processing a sequence of data can often be regarded as a data flow.
> >>>> The operator #<~ allows defining a flow from a data source through
> >>>> processing steps to a drain. For example:
> >>>>
> >>>>     squares := Set <~ 1000 take <~ #squared map <~ (1 to: 1000).
> >>>>     fileOut writeStream <~ #isSeparator filter <~ fileIn readStream.
> >>>>
> >>>> In both examples, #<~ is only used to set up the data flow using
> >>>> reducing functions and transducers. In contrast to streams,
> >>>> transducers are completely independent of input and output sources.
> >>>> Hence, we have a clear separation of reading data, writing data, and
> >>>> processing elements:
> >>>> - Sources know how to iterate over data with a reducing function,
> >>>>   e.g., via #reduce:init:.
> >>>> - Drains know how to collect data using a reducing function.
> >>>> - Transducers know how to process single elements.
> >>>>
> >>>> ## Reductions
> >>>>
> >>>> A reduction binds an initial value, or a block yielding an initial
> >>>> value, to a reducing function. The idea is to define a ready-to-use
> >>>> process that can be applied in different contexts. Reducibles handle
> >>>> reductions via #reduce: and #transduce:reduce:. For example:
> >>>>
> >>>>     sum := #+ init: 0.
> >>>>     sum1 := #(1 1 1) reduce: sum.
> >>>>     sum2 := (1 to: 1000) transduce: #odd filter reduce: sum.
> >>>>
> >>>>     asSet := [:set :e | set add: e; yourself] initializer: [Set new].
> >>>>     set1 := #(1 1 1) reduce: asSet.
> >>>>     set2 := (1 to: 1000) transduce: #odd filter reduce: asSet.
> >>>>
> >>>> By combining a transducer with a reduction, a process can be further
> >>>> modified.
> >>>>
> >>>>     sumOdds := sum <~ #odd filter.
> >>>>     setOdds := asSet <~ #odd filter.
> >>>>
> >>>> ## Eductions
> >>>>
> >>>> An eduction combines a reducible data source with a transducer. The
> >>>> idea is to define a transformed (virtual) data source that need not
> >>>> be stored in memory.
> >>>>
> >>>>     odds1 := #odd filter <~ #(1 2 3) readStream.
> >>>>     odds2 := #odd filter <~ (1 to: 1000).
> >>>>
> >>>> Depending on the underlying source, eductions can be processed once
> >>>> (streams, e.g., odds1) or multiple times (collections, e.g., odds2).
> >>>> Since no intermediate data is stored, transducer actions are lazy,
> >>>> i.e., they are invoked each time the eduction is processed.
> >>>>
> >>>> # Origins
> >>>>
> >>>> Transducers is based on the same-named Clojure library and its
> >>>> ideas. Please see:
> >>>> http://clojure.org/transducers
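> >>>>
> >>>> P.S.: As a quick sketch of how these pieces compose, the modified
> >>>> reductions from the Reductions section can be applied to sources
> >>>> with #<~, by analogy with the flip example above:
> >>>>
> >>>>     sumOdds <~ (1 to: 10).            "1 + 3 + 5 + 7 + 9 = 25"
> >>>>     setOdds <~ #(1 1 3) readStream.   "a Set with 1 and 3"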