Let
    initial :: a
    combine :: a -> b -> a
    finish  :: a -> c
then
    (finish . foldl combine initial) :: Foldable t => t b -> c
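For instance, computing a mean is a fold followed by a completion step,
with no machinery beyond composition (a minimal sketch; the names are mine):

    -- Fold to a (sum, count) pair; 'finish' divides at the very end.
    mean :: [Double] -> Double
    mean = finish . foldl combine initial
      where
        initial          = (0, 0 :: Int)
        combine (s, n) x = (s + x, n + 1)
        finish (s, n)    = s / fromIntegral n

    main :: IO ()
    main = print (mean [1, 2, 3, 4])   -- prints 2.5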
This appears to be analogous to your 'reduce with completion'.
What *can* be done with composition generally *should* be done with
composition; I really don't see any advantage in defining
    reduce finish combine initial = finish . foldl combine initial

As for growing a collection and then finally trimming it to its desired
size, I've never known anything like that to be a good idea. There's a
reason why my library includes
    Set          BoundedSet
    IdentitySet  BoundedIdentitySet
    Deque        BoundedDeque
    Heap         BoundedHeap
I use BoundedHeap a lot. If I want to find the k best things out of n,
using BoundedHeap lets me do it in O(n.log k) time and O(k) space instead
of O(n.log n) time and O(n) space. (There is a toy sketch of this at the
end of this message.)

Your example showing how much better #reduceLeft: is when expressed using
transducers is not well chosen, because the implementation of #reduceLeft:
in Pharo is (how to say this politely?) not up to the standards we expect
from Pharo. Here is how it stands in my compatibility library:

    Enumerable
      methods for: 'summarising'

    reduceLeft: aBlock
        <compatibility: #pharo>
        |n|
        ^(n := aBlock argumentCount) > 2
            ifTrue: [
                |i a|
                a := Array new: n.
                i := 0.
                self do: [:each |
                    a at: (i := i + 1) put: each.
                    i = n ifTrue: [
                        a at: (i := 1) put: (aBlock valueWithArguments: a)]].
                i = 1
                    ifTrue:  [a at: i]
                    ifFalse: [self error: 'collection/block size mismatch']]
            ifFalse: [
                "If n = 0 or 1 and the receiver has two or more elements,
                 this will raise an exception passing two arguments to
                 aBlock. That's a good way to complain about it."
                |r f|
                r := f := nil.
                self do: [:each |
                    r := f
                        ifNil:    [f := self. each]
                        ifNotNil: [aBlock value: r value: each]].
                f
                    ifNil:    [CollectionTooSmall collection: self]
                    ifNotNil: [r]]

No OrderedCollection. And exactly one traversal of the collection.
(Enumerables can only be traversed once. Collections can be traversed
multiple times.) The only method the receiver needs to provide is #do:,
so this method doesn't really need to be in Enumerable. It could, for
example, be in Block.

#reduceRight: I've placed in AbstractSequence because it needs
#reverseDo:, but it equally makes sense as a method of Block (as it knows
more about what Blocks can do than it does about what sequences can do).
For example, I have SortedSet and SortedBag, which can be traversed
forwards and backwards, but only #reduceLeft: is available to them;
#reduceRight: is not, even though they sensibly support #reverseDo:.
For that matter, Deques support #reverseDo: but are not sequences...

To the limited extent that I understand the Transducers package, this
view that
    (s reduce{Left,Right}: b)
should really be
    (b reduce{Left,Right}) applyTo: s
seems 100% in the spirit of Transducers.

In fact, now that I understand a bit about Transducers,
    <collection> reduce: <transducer>
(a) seems back to front compared with
    <reducer> applyTo: <collection>
(b) seems like a misnomer, because in general much may be generated,
transformed, and retained, with nothing getting smaller, so why 'reduce'?
It seems like a special case of applying a function-like object (the
transducer) to an argument (the collection), so why not
    transducer applyTo: dataSource
    transducer applyTo: dataSource initially: initial

Did you look at Richard Waters' "obviously synchronizable series
expressions"? (MIT AI Memos 958A and 959A) Or his later SERIES package?
(Appendix A of Common Lisp the Language, 2nd edition)
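Here, to make the k-best point concrete, is a toy version in Haskell:
a little pairing heap, not my BoundedHeap (which does rather more), but
the same shape. Keep a min-heap of at most k elements and evict the
minimum when a better element arrives:

    -- A toy bounded min-heap: k largest of n in O(n.log k) time, O(k) space.
    data Heap a = Empty | Node a [Heap a]

    merge :: Ord a => Heap a -> Heap a -> Heap a
    merge Empty h = h
    merge h Empty = h
    merge h1@(Node x hs1) h2@(Node y hs2)
      | x <= y    = Node x (h2 : hs1)
      | otherwise = Node y (h1 : hs2)

    insert :: Ord a => a -> Heap a -> Heap a
    insert x = merge (Node x [])

    deleteMin :: Ord a => Heap a -> Heap a
    deleteMin Empty       = Empty
    deleteMin (Node _ hs) = mergePairs hs
      where
        mergePairs (a : b : rest) = merge (merge a b) (mergePairs rest)
        mergePairs [h]            = h
        mergePairs []             = Empty

    -- When the heap is full, replace the minimum (the worst of the
    -- current best) whenever the new element beats it.
    kLargest :: Ord a => Int -> [a] -> [a]
    kLargest k = toAscList . snd . foldl step (0 :: Int, Empty)
      where
        step (n, h) x
          | n < k                = (n + 1, insert x h)
          | Node m _ <- h, x > m = (n, insert x (deleteMin h))
          | otherwise            = (n, h)
        toAscList Empty        = []
        toAscList h@(Node x _) = x : toAscList (deleteMin h)

    main :: IO ()
    main = print (kLargest 3 [5, 1, 9, 2, 8, 7, 3])   -- prints [7,8,9]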
On Fri, 14 Apr 2023 at 22:32, Steffen Märcker <merk...@web.de> wrote:

> Hi Richard!
>
> Thanks for sharing your thoughts.
>
> There's a reason why #inject:into: puts the block argument
> last. It works better to have "heavy" constituents on the
> right in an English sentence, and it's easier to indent
> blocks when they come last.
>
> Nice, I never thought of it this way. I always appreciate a historical
> background.
>
> Let me try to respond to the rest with a focus on the ideas. First of
> all, the point of the Transducers framework is to cleanly separate
> iteration over sequences, processing of the elements, and accumulation
> of a result. It enables easy reuse of the concepts common to different
> data structures.
>
> 1. What do I mean by completion? If we iterate over a sequence of
> objects, we sometimes want to do a final step after all elements have
> been seen to complete the computation. For instance, after copying
> elements to a new collection, we may want to trim it to its actual size:
>
>     distinct := col inject: Set new into: #add.
>     distinct trim.
>
> For some cases it turns out to be useful to have an object that knows
> how to do both:
>
>     distinct := col reduce: (#add completing: #trim) init: Set new.
>
> #reduce:init: knows how to deal with both these new objects and ordinary
> blocks. For now I will call both variants a "reducing function". Note,
> completion is completely optional, and the implementation is literally
> #inject:into: plus completion if required.
>
> 2. What is a reduction? In some cases it turns out to be useful to pair
> up a reducing function with an (initial) value. You called it a magma,
> and often it is indeed the neutral element of a mathematical operator,
> e.g., + and 0. But we can use a block that yields the initial value,
> too. For instance:
>
>     sum := #+ init: 0.
>     result := numbers reduce: sum.
>
>     toSet := (#add completing: #trim) initializer: [Set new].
>     distinct := col reduce: toSet.
>
> #reduce: is just a shorthand that passes the function and the value to
> #reduce:init:. Maybe #reduceMagma: is a reasonable name?
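> In Haskell terms, such a reducing function is roughly a triple of step,
> initial value, and completion. This is only a sketch with names of my
> own choosing, not the package's API:
>
>     -- Bundles what (#add completing: #trim) initializer: [Set new]
>     -- bundles: a step block, an initial value, and a completion.
>     data Reduction a b c = Reduction
>       { initial :: a            -- like #initializer:
>       , step    :: a -> b -> a  -- the ordinary inject:into: block
>       , finish  :: a -> c       -- like #completing:
>       }
>
>     runReduction :: Foldable t => Reduction a b c -> t b -> c
>     runReduction r = finish r . foldl (step r) (initial r)
>
>     sumR :: Reduction Int Int Int
>     sumR = Reduction { initial = 0, step = (+), finish = id }
>
>     main :: IO ()
>     main = print (runReduction sumR [1 .. 10])   -- prints 55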
> 3. The reason I implemented #reduce:init: directly on collections,
> streams, etc. is that these objects know best how to efficiently iterate
> over their elements. And if a data structure knows how to #reduce:init:
> we can use it with all the transducers functions, e.g., for
> partitioning, filtering, etc. Other useful methods could then be added
> to the behaviour with a trait, e.g., #transduce:reduce:init:, which
> first applies a transducer and then reduces. As traits are not available
> in plain VW 8.3, I did not try this approach, though.
>
> 4. Let's take #reduceLeft: as an example and reimplement the method
> using transducers, shall we? The following code works for each
> sequence/collection/stream that supports #transduce:reduce:init:
>
>     reduceLeft: aBlock
>         | head rest arity |
>         head := self transduce: (Take number: 1)
>                      reduce: [:r :e | e]
>                      init: nil.
>         rest := Drop number: 1.
>         arity := aBlock arity.
>         ^arity = 2
>             ifTrue: [self transduce: rest reduce: aBlock init: head]
>             ifFalse: [
>                 | size arguments |
>                 size := arity - 1.
>                 rest := rest * (Partition length: size)
>                              * (Remove predicate: [:part | part size < size]).
>                 arguments := Array new: arity.
>                 arguments at: 1 put: head.
>                 self
>                     transduce: rest
>                     reduce: ([:args :part |
>                             args
>                                 replaceFrom: 2 to: arity with: part;
>                                 at: 1 put: (aBlock valueWithArguments: args);
>                                 yourself] completing: [:args | args first])
>                     init: arguments]
>
> This code is both more general and faster: it does not create an
> intermediate OrderedCollection, and it treats the common case of binary
> blocks efficiently. Note that the implementation can be more compact and
> optimized if it is specialized in a certain class. For instance,
> SequenceableCollection allows accessing elements by index, which turns
> the first line into a simple "self first".
>
> Thanks for staying with me for this long reply. I hope I did not miss a
> point. I do not insist on the existing names but will appreciate any
> ideas.
>
> Best, Steffen
>
> Richard O'Keefe schrieb am Freitag, 14. April 2023 09:43:32 (+02:00):
>
> #reduce: aReduction
> Are you saying that aReduction is an object from which
> a dyadic block and an initial value can be derived?
> That's going to confuse the heck out of Dolphin and Pharo
> users (like me, for example). And in my copy of Pharo,
> #reduce: calls #reduceLeft:, not #foldLeft:.
> The sad thing about #reduceLeft: in Pharo is that in order
> to provide extra generality I have no use for, it fails to
> provide a fast path for the common case of a dyadic block.
>
>     reduceLeft: aBlock
>         aBlock argumentCount = 2 ifTrue: [
>             |r|
>             r := self first.
>             self from: 2 to: self size do: [:each |
>                 r := aBlock value: r value: each].
>             ^r].
>         ... everything else as before ...
>
> Adding up a million floats takes half the time using the
> fast path (67 msec vs 137 msec). Does your #reduce:
> also perform "a completion action"? If so, it definitely
> should not be named after #inject:into:.
> At any rate, if it does something different, it should have
> a different name, so #reduce: is no good.
>
> #reduce:init:
> There's a reason why #inject:into: puts the block argument
> last. It works better to have "heavy" constituents on the
> right in an English sentence, and it's easier to indent
> blocks when they come last.
>
> Which of the arguments here specifies the 'completion action'?
> What does the 'completion action' do? (I can't tell from the name.)
>
> I think the answer is clear:
> * choose new intention-revealing names that do not clash.
>
> If I have understood your reduce: aReduction correctly,
> a Reduction specifies
> - a binary operation (not necessarily associative)
> - a value which can be passed to that binary operation
> which suggests that it represents a magma with identity.
> By the way, it is not clear whether
>     {x} reduce: <<ident. binop>>
> answers x or binop value: ident value: x.
> It's only when ident is an identity for binop that you
> can say 'it doesn't matter'.
> I don't suppose you could bring yourself to call
> aReduction aMagmaWithIdentity?
>
> Had you considered
>     aMagmaWithIdentity reduce: aCollection
> where the #reduce: method is now in your class so
> can't *technically* clash with anything else?
> All you really need from aCollection is #do: so
> it could even be a stream.
>
>     MagmaWithIdentity
>       >> identity
>       >> combine:with:
>       >> reduce: anEnumerable
>             |r|
>             r := self identity.
>             anEnumerable do: [:each | r := self combine: r with: each].
>             ^r
>
>     MagmaSansIdentity
>       >> combine:with:
>       >> reduce: anEnumerable
>             |r f|
>             f := r := nil.
>             anEnumerable do: [:each |
>                 r := f
>                     ifNil:    [f := self. each]
>                     ifNotNil: [self combine: r with: each]].
>             f ifNil: [anEnumerable error: 'is empty'].
>             ^r
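> Compare Haskell, where the operation likewise travels with the 'magma':
> Monoid plays MagmaWithIdentity and Semigroup plays MagmaSansIdentity,
> give or take the associativity laws a bare magma need not satisfy.
> A rough sketch:
>
>     import Data.List.NonEmpty (NonEmpty (..))
>     import Data.Semigroup (Max (..), sconcat)
>
>     -- 'MagmaWithIdentity reduce:' is the monoidal fold; all it
>     -- needs from the container is Foldable (in effect, #do:).
>     reduceWithIdentity :: (Foldable t, Monoid m) => t m -> m
>     reduceWithIdentity = foldr (<>) mempty
>
>     -- 'MagmaSansIdentity reduce:' demands a non-empty source, which
>     -- Haskell encodes in the type instead of a run-time error.
>     reduceSansIdentity :: Semigroup a => NonEmpty a -> a
>     reduceSansIdentity = sconcat
>
>     main :: IO ()
>     main = do
>       print (reduceWithIdentity [[1], [2], [3 :: Int]])   -- [1,2,3]
>       print (getMax (reduceSansIdentity
>                       (Max 1 :| [Max 5, Max 3 :: Max Int])))   -- 5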
On Fri, 14 Apr 2023 at 05:02, Steffen Märcker <merk...@web.de> wrote:

>> The reason I came up with the naming question in the first place is
>> that I (finally!) finished my port of Transducers to Pharo. But
>> currently, I am running into a name clash. Maybe you have some good
>> ideas how to resolve the following situation in a pleasant way.
>>
>> - #fold: exists in Pharo and is an alias of #reduce:
>> - #reduce: exists in Pharo and calls #foldLeft:, which also deals with
>>   more than two block arguments
>>
>> Both of which are not present in VW. Hence, I used the following
>> messages in VW with no name clash:
>>
>> - #reduce: aReduction "= block + initial value"
>> - #reduce:init: is similar to #inject:into: but executes an additional
>>   completion action
>>
>> Some obvious ways to avoid a clash in Pharo are:
>>
>> 1) Make #reduce: distinguish between a reduction and a simple block
>>    (e.g., by double dispatch)
>> 2) Rename the transducers #reduce: to #injectInto: and adapt
>>    #inject:into: to optionally do the completion
>> 3) Find another selector that is not too counter-intuitive
>>
>> All three approaches have some downsides in my opinion:
>> 1) Though straightforward to implement, the two flavors behave quite
>>    differently, especially with respect to the number of block
>>    arguments. The existing one creates a SequenceableCollection and
>>    partitions it according to the required number of args. Transducers'
>>    #reduce: considers binary blocks as the binary fold case but ternary
>>    blocks as fold with indexed elements.
>> 2) This is a real extension of #inject:into: but requires touching
>>    multiple implementations of that message, something I consider
>>    undesirable.
>> 3) Currently, I cannot think of a good name that is not too far away
>>    from what we're familiar with.
>>
>> Do you have some constructive comments and ideas?
>>
>> Kind regards,
>> Steffen
>>
>> Steffen Märcker schrieb am Donnerstag, 13. April 2023 17:11:15 (+02:00):
>>
>> :-D I don't know how compress made it onto that site. There is not even
>> an example in the list of language examples where fold/reduce is named
>> compress.
>>
>> Richard O'Keefe schrieb am Donnerstag, 13. April 2023 16:34:29 (+02:00):
>>
>> OUCH. Wikipedia is as reliable as ever, I see.
>> compress and reduce aren't even close to the same thing.
>> Since the rank of the result of compression is the same
>> as the rank of the right operand, and the rank of the
>> result of reducing is one lower, they are really quite
>> different. compress is Fortran's PACK.
>> https://gcc.gnu.org/onlinedocs/gfortran/PACK.html
>>
>> On Fri, 14 Apr 2023 at 01:34, Steffen Märcker <merk...@web.de> wrote:
>>
>>> Hi Richard and Sebastian!
>>>
>>> Interesting read. I obviously was not aware of the variety of meanings
>>> for fold/reduce. Thanks for pointing this out. Also, in some languages
>>> it seems the same name is used for both reductions with and without an
>>> initial value. There's even a list on WP on the matter:
>>> https://en.wikipedia.org/wiki/Fold_%28higher-order_function%29#In_various_languages
>>>
>>> Kind regards,
>>> Steffen
>>>
>>> Richard O'Keefe schrieb am Donnerstag, 13. April 2023 13:16:28 (+02:00):
>>>
>>> The standard prelude in Haskell does not define anything
>>> called "fold". It defines fold{l,r}{,1}, which can be
>>> applied to any Foldable data (see Data.Foldable). For
>>> technical reasons having to do with Haskell's
>>> non-strict evaluation, foldl' and foldr' also exist.
>>> But NOT "fold".
>>>
>>> https://hackage.haskell.org/package/base-4.18.0.0/docs/Data-Foldable.html#laws
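>>> A toy example of the difference, since the names keep getting
>>> mixed up:
>>>
>>>     -- fold-with-initial-value vs reduce-from-first-element
>>>     main :: IO ()
>>>     main = do
>>>       print (foldl  (+) 0 [1, 2, 3 :: Int])   -- 6; the 0 seeds the fold
>>>       print (foldl1 (+)   [1, 2, 3 :: Int])   -- 6; seeded by the head
>>>       print (foldl  (+) 0 ([] :: [Int]))      -- 0; empty input is fine
>>>       -- foldl1 (+) ([] :: [Int]) raises an exception:
>>>       -- with no initial value there is nothing to answer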
>>>
>>> On Thu, 13 Apr 2023 at 21:17, Sebastian Jordan Montano
>>> <sebastian.jor...@inria.fr> wrote:
>>>
>>>> Hello Steffen,
>>>>
>>>> Let's take the Kotlin documentation
>>>> (https://kotlinlang.org/docs/collection-aggregate.html#fold-and-reduce):
>>>>
>>>> > The difference between the two functions is that fold() takes an
>>>> > initial value and uses it as the accumulated value on the first
>>>> > step, whereas the first step of reduce() uses the first and the
>>>> > second elements as operation arguments on the first step.
>>>>
>>>> Naming is not so consistent across programming languages; they mix up
>>>> the names "reduce" and "fold". For example, in Haskell "fold" does not
>>>> take an initial value, so it is like a "reduce" in Kotlin. In Kotlin,
>>>> Java, Scala, and other OO languages "reduce" does not take an initial
>>>> value while "fold" does. Pharo aligns with those languages (except
>>>> that our fold is called #inject:into:).
>>>>
>>>> So for me the Pharo methods #reduce: and #inject:into: represent well
>>>> what they are doing and they are well named.
>>>>
>>>> Cheers,
>>>> Sebastian
>>>>
>>>> ----- Original Message -----
>>>> > From: "Steffen Märcker" <merk...@web.de>
>>>> > To: "Any question about pharo is welcome" <pharo-users@lists.pharo.org>
>>>> > Sent: Wednesday, 12 April 2023 19:03:01
>>>> > Subject: [Pharo-users] Collection>>reduce naming
>>>>
>>>> > Hi!
>>>> >
>>>> > I wonder whether there was a specific reason to name this method
>>>> > #reduce:? I would have expected #fold:, as this is the more common
>>>> > term for what it does. And in fact, even the comment reads "Fold the
>>>> > result of the receiver into aBlock." Whereas #reduce: is the common
>>>> > term for what we do with #inject:into:.
>>>> >
>>>> > I am asking not to annoy anyone but out of curiosity. I figured this
>>>> > out only by some weird behaviour after porting some code that
>>>> > (re)defines #reduce:.
>>>> >
>>>> > Ciao!
>>>> > Steffen