Dear Richard,

sorry for the delayed reply and thanks for your thoughts. I wrapped my head 
around your suggestions and tried a few things to see how they play out in 
code. I tend towards the following names.


- Reducing function (a role) is a binary block / object that responds to 
#value:value: and #numArgs; commonly used with #inject:into:
- Completing function (a role) is a unary block / object that responds to 
#value:.


- Reducer is a pair (rfn, cfn) of a reducing function and a completing function.
- Reduction is a pair (red, val) of a reducer and an initial value.


I like your idea to invert the message flow and implement #reduce: in 
Reduction. It reads well in actual code, in my eyes. I do not think that this 
is a misnomer, since it indeed reduces a sequence of elements to a single 
object. A reducer knows how to first apply its rfn to a sequence and then its 
cfn to the result.
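
Roughly, I picture it like the following sketch (flattening the reducer's 
rfn/cfn pair directly into the Reduction for brevity); the variables rfn, cfn 
and the #initialValue accessor are just my shorthand here, not fixed API:

        Reduction >> reduce: aSequence
                "Fold aSequence with the reducing function, then apply the completing function."
                | acc |
                acc := self initialValue.
                aSequence do: [:each | acc := rfn value: acc value: each].
                ^cfn value: acc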


negatedSum := (#+ completing: #negate) init: 0.
result := negatedSum reduce: (1 to: 10).


asSet := (#add completing: #trim) initializer: [Set new].
distinct := asSet reduce: aCollection


I understand your concern that we usually either do not need cfn or apply it 
manually to the result of the reduction. Yes, most often cfn = identity, and I 
took this into account. However, in some cases I actually need to pass both 
around as a single object.


I also attempted to find names closer to #inject:into: but did not come up with 
any good ideas yet.


Does this make sense so far?


I deliberately did not mention transducers above in order to focus on the 
naming first. I had heard about "obviously synchronizable series expressions" 
before and had a look at the paper just now. In short, transducers are in the 
same spirit and offer similarly nice properties. But maybe we should save the 
details for another thread?


Regarding the examples: yes, they are simple, stupid, and the reimplementation 
of #reduce: is far from optimal. The goal was to show how the objects/messages 
are used rather than to come up with real-world examples or show off. ;-)


Kind regards,
Steffen




Richard O'Keefe wrote on Saturday, 15 April 2023 16:28:09 (+02:00):


Let
  initial :: a
  combine :: a -> b -> a

  finish  :: a -> c
then
  (finish . foldl combine initial) :: Foldable t => t b -> c


This appears to be analogous to your 'reduce with completion'.
What *can* be done with composition generally *should* be done
with composition; I really don't see any advantage in
defining
  reduce finish combine initial = finish . foldl combine initial


As for growing a collection and then finally trimming it to its
desired size, I've never known anything like that to be a good
idea.  There's a reason why my library includes
  Set                  BoundedSet
  IdentitySet          BoundedIdentitySet
  Deque                BoundedDeque
  Heap                 BoundedHeap
I use BoundedHeap a lot.  If I want to find the k best things
out of n, using BoundedHeap lets me do it in O(n.log k) time
and O(k) space instead of O(n.log n) time and O(n) space.
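
(A rough sketch of the same bounded-size idea in plain Pharo, where there is 
no BoundedHeap; a SortedCollection stand-in makes insertion O(k) rather than 
O(log k), so this only illustrates the bounded-space part. 'items' and 'k' are 
placeholders.)

        kBest := SortedCollection sortBlock: [:a :b | a > b].
        items do: [:each |
                kBest add: each.
                kBest size > k ifTrue: [kBest removeLast]].
        "kBest now holds the k largest elements, best first, in O(k) space."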


Your example showing how much better #reduceLeft: is when
expressed using transducers is not well chosen, because the
implementation of #reduceLeft: in Pharo is (how to say this
politely?) not up to the standards we expect from Pharo.
Here is how it stands in my compatibility library:


Enumerable
  methods for: 'summarising'

    reduceLeft: aBlock
      <compatibility: #pharo>
      |n|
      ^(n := aBlock argumentCount) > 2
         ifTrue: [
           |i a|
           a := Array new: n.
           i := 0.
           self do: [:each |
             a at: (i := i + 1) put: each.
             i = n ifTrue: [
               a at: (i := 1) put: (aBlock valueWithArguments: a)]].
           i = 1 ifTrue: [a at: i] ifFalse: [
             self error: 'collection/block size mismatch']]
         ifFalse: [
           "If n = 0 or 1 and the receiver has two or more elements,
            this will raise an exception passing two arguments to aBlock.
            That's a good way to complain about it."
           |r f|
           r := f := nil.
           self do: [:each |
             r := f ifNil: [f := self. each]
                    ifNotNil: [aBlock value: r value: each]].
           f ifNil: [CollectionTooSmall collection: self]
             ifNotNil: [r]]


No OrderedCollection.  And exactly one traversal of the collection.
(Enumerables can only be traversed once.  Collections can be traversed
multiple times.)  The only method the receiver needs to provide
is #do:, so this method doesn't really need to be in Enumerable.
It could, for example, be in Block.


#reduceRight: I've placed in AbstractSequence because it needs
#reverseDo:, but it equally makes sense as a method of Block
(as it knows more about what Blocks can do than it does about
what sequences can do).  For example, I have SortedSet and
SortedBag which can be traversed forwards and backwards but only
#reduceLeft: is available to them; #reduceRight: is not, despite
them sensibly supporting #reverseDo:.  For that matter, Deques
support #reverseDo: but are not sequences... 

To the limited extent that I understand the Transducers package,
this view that (s reduce{Left,Right}: b) should really be
(b reduce{Left,Right}) applyTo: s
seems 100% in the spirit of Transducers.


In fact, now that I understand a bit about Transducers,
<collection> reduce: <transducer>
(a) seems back to front compared with
    <reducer> applyTo: <collection>
(b) seems like a misnomer because in general much may be
    generated, transformed, and retained, with nothing
    getting smaller, so why 'reduce'?
    It seems like a special case of applying a function-like
    object (the transducer) to an argument (the collection),
    so why not
    transducer applyTo: dataSource
    transducer applyTo: dataSource initially: initial


Did you look at Richard Waters' "obviously synchronizable series
expressions"?  (MIT AI Memo 958A and 959A)  Or his later SERIES
package?  (Appendix A of Common Lisp the Language, 2nd edition)


        




On Fri, 14 Apr 2023 at 22:32, Steffen Märcker <merk...@web.de> wrote:

Hi Richard!


Thanks for sharing your thoughts. 


  There's a reason why #inject:into: puts the block argument
  last.  It works better to have "heavy" constituents on the
  right in an English sentence, and it's easier to indent
  blocks when they come last.


Nice, I never thought of it this way. I always appreciate a historical 
background.


Let me try to respond to the rest with a focus on the ideas. First of all, the 
point of the transducers framework is to cleanly separate iteration over 
sequences, processing of the elements, and accumulation of a result. It 
enables easy reuse of the concepts common to different data structures.


1. What do I mean by completion? If we iterate over a sequence of objects, we 
sometimes want to do a final step after all elements have been seen, to 
complete the computation. For instance, after copying elements to a new 
collection, we may want to trim it to its actual size:


        distinct := col inject: Set new into: #add.
        distinct trim.


For some cases it turns out to be useful to have an object that knows how to do 
both:


        distinct := col reduce: (#add completing: #trim) init: Set new.


#reduce:init: knows how to deal with both these new objects and ordinary 
blocks. For now I will call both variants a "reducing function". Note that 
completion is completely optional and the implementation is literally 
#inject:into: plus completion if required.
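
Roughly, it is little more than the following sketch (glossing over how plain 
blocks and reducing functions are distinguished; the #complete: selector below 
is my illustration, not the actual Transducers code):

        Collection >> reduce: aReducingFunction init: initialValue
                "Fold exactly as #inject:into: does, then let the function finish the result."
                ^aReducingFunction complete: (self inject: initialValue into: aReducingFunction)

        BlockClosure >> complete: anObject
                "Plain blocks have no completion step."
                ^anObject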




2. What is a reduction? In some cases it turns out to be useful to pair up a 
reducing function with an (initial) value. You called it a magma, and often it 
is indeed the neutral element of a mathematical operator, e.g., + and 0. But 
we can use a block that yields the initial value, too. For instance:


        sum := #+ init: 0.
        result := numbers reduce: sum.


        toSet := (#add completing: #trim) initializer: [Set new].
        distinct := col reduce: toSet.


#reduce: is just a shorthand that passes the function and the value to 
#reduce:init:. Maybe #reduceMagma: would be a reasonable name?
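
In code it would be little more than this (the accessor names are assumed):

        Collection >> reduce: aReduction
                "Unpack the pair and delegate."
                ^self reduce: aReduction reducingFunction init: aReduction initialValue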




3. The reason I implemented #reduce:init: directly on collections, streams, 
etc. is that these objects know best how to iterate efficiently over their 
elements. And if a data structure knows how to #reduce:init:, we can use it 
with all the transducer functions, e.g., for partitioning, filtering, etc. 
Other useful methods could then be added to the behaviour with a trait, e.g., 
#transduce:reduce:init:, which first applies a transducer and then reduces. As 
traits are not available in plain VW 8.3, I did not try this approach, though.
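
Such a trait method could look roughly like this; the #transform: selector for 
applying a transducer to a reducing function is an assumption on my part:

        transduce: aTransducer reduce: aReducingFunction init: initialValue
                "Wrap the reducing function with the transducer, then reduce as usual."
                ^self reduce: (aTransducer transform: aReducingFunction) init: initialValue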




4. Let's take #reduceLeft: as an example and reimplement the method using 
transducers, shall we? The following code works for every 
sequence/collection/stream that supports #transduce:reduce:init:


        reduceLeft: aBlock

        | head rest arity |
        head := self transduce: (Take number: 1) reduce: [:r :e | e] init: nil.
        rest := Drop number: 1.

        arity := aBlock arity.
        ^arity = 2
                ifTrue: [self transduce: rest reduce: aBlock init: head]
                ifFalse: [
                        | size arguments |
                        size := arity - 1.
                        rest := rest
                                * (Partition length: size)
                                * (Remove predicate: [:part | part size < size]).
                        arguments := Array new: arity.
                        arguments at: 1 put: head.
                        self
                                transduce: rest
                                reduce: ([:args :part |
                                        args
                                                replaceFrom: 2 to: arity with: part;
                                                at: 1 put: (aBlock valueWithArguments: args);
                                                yourself] completing: [:args | args first])
                                init: arguments]


This code is both more general and faster: it does not create an intermediate 
OrderedCollection, and it treats the common case of binary blocks efficiently. 
Note that the implementation could be more compact and optimized if it were 
specialized in a particular class. For instance, SequenceableCollection allows 
accessing elements by index, which turns the first line into a simple "self 
first".


Thanks for staying with me through this long reply. I hope I did not miss a 
point. I do not insist on the existing names and will appreciate any ideas.


Best, Steffen







Richard O'Keefe wrote on Friday, 14 April 2023 09:43:32 (+02:00):


#reduce: aReduction
   Are you saying that aReduction is an object from which
   a dyadic block and an initial value can be derived?
   That's going to confuse the heck out of Dolphin and Pharo
   users (like me, for example).  And in my copy of Pharo,
   #reduce: calls #reduceLeft:, not #foldLeft:.
   The sad thing about #reduceLeft: in Pharo is that in order
   to provide extra generality I have no use for, it fails to
   provide a fast path for the common case of a dyadic block.


reduceLeft: aBlock
  aBlock argumentCount = 2 ifTrue: [
    |r|
    r := self first.
    self from: 2 to: self size do: [:each |
      r := aBlock value: r value: each].
    ^r].
    ... everything else as before ...



Adding up a million floats takes half the time using the
fast path (67 msec vs 137 msec).  Does your #reduce:
also perform "a completion action"?  If so, it definitely
should not be named after #inject:into:.






At any rate, if it does something different, it should have
a different name, so #reduce: is no good.


#reduce:init:
  There's a reason why #inject:into: puts the block argument
  last.  It works better to have "heavy" constituents on the
  right in an English sentence, and it's easier to indent
  blocks when they come last.


  Which of the arguments here specifies the 'completion action'?
  What does the 'completion action' do?  (I can't tell from the name.)


I think the answer is clear:
* choose new intention-revealing names that do not clash.


If I have understood your reduce: aReduction correctly,
a Reduction specifies
 - a binary operation (not necessarily associative)
 - a value which can be passed to that binary operation
which suggests that it represents a magma with identity.
By the way, it is not clear whether
 {x} reduce: <<ident. binop>>
answers x or binop value: ident value: x.
It's only when ident is an identity for binop that you
can say 'it doesn't matter'. 

I don't suppose you could bring yourself to call
aReduction aMagmaWithIdentity?


Had you considered
  aMagmaWithIdentity reduce: aCollection
where the #reduce: method is now in your class so
can't *technically* clash with anything else?
All you really need from aCollection is #do: so
it could even be a stream.


MagmaWithIdentity
>> identity
>> combine:with:
>> reduce: anEnumerable
     |r|
     r := self identity.
     anEnumerable do: [:each | r := self combine: r with: each].
     ^r


MagmaSansIdentity
>> combine:with:
>> reduce: anEnumerable
     |r f|
     f := r := nil.
     anEnumerable do: [:each |
       r := f ifNil: [f := self. each] ifNotNil: [self combine: r with: each]].
     f ifNil: [anEnumerable error: 'is empty'].
     ^r




 



On Fri, 14 Apr 2023 at 05:02, Steffen Märcker <merk...@web.de> wrote:

The reason I came up with the naming question in the first place is that I 
(finally!) finished my port of Transducers to Pharo. But currently I am running 
into a name clash. Maybe you have some good ideas on how to resolve the 
following situation in a pleasant way.


- #fold: exists in Pharo and is an alias of #reduce:
- #reduce: exists in Pharo and calls #foldLeft: which also deals with more than 
two block arguments



Neither of which is present in VW. Hence, I used the following messages in VW 
with no name clash:


- #reduce: aReduction "= block + initial value"
- #reduce:init: is similar to #inject:into: but executes an additional 
completion action


Some obvious ways to avoid a clash in Pharo are:


1) Make #reduce: distinguish between a reduction and a simple block (e.g. by 
double dispatch)
2) Rename the transducers #reduce: to #injectInto: and adapt #inject:into: to 
optionally do the completion
3) Find another selector that is not too counter-intuitive


All three approaches have some downsides in my opinion:
1) Though straightforward to implement, both flavors behave quite differently, 
especially with respect to the number of block arguments. The existing one 
creates a SequenceableCollection and partitions it according to the required 
number of args. Transducers' #reduce: treats binary blocks as the binary fold 
case but ternary blocks as a fold over indexed elements. (A rough sketch of the 
double-dispatch variant follows after this list.)
2) This is a real extension of #inject:into: but requires touching multiple 
implementations of that message, something I consider undesirable.
3) Currently, I cannot think of a good name that is not too far away from what 
we're familiar with.
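
For illustration, option 1 could look roughly like this; the class name 
Reduction and the selectors #reduceCollection:, #reducingFunction and 
#initialValue are placeholders of mine, not the actual Transducers API:

        Collection >> reduce: aBlockOrReduction
                "Double dispatch: let the argument decide how to reduce the receiver."
                ^aBlockOrReduction reduceCollection: self

        BlockClosure >> reduceCollection: aCollection
                "Plain blocks keep the existing Pharo behaviour."
                ^aCollection reduceLeft: self

        Reduction >> reduceCollection: aCollection
                "Reductions fold from their initial value and then complete the result."
                ^aCollection reduce: self reducingFunction init: self initialValue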


Do you have some constructive comments and ideas?


Kind regards,
Steffen







Steffen Märcker wrote on Thursday, 13 April 2023 17:11:15 (+02:00):


:-D I don't know how compress made it onto that site. There is not even an 
example in the list of language examples where fold/reduce is named compress.



Richard O'Keefe wrote on Thursday, 13 April 2023 16:34:29 (+02:00):


OUCH.  Wikipedia is as reliable as ever, I see.
compress and reduce aren't even close to the same thing.
Since the rank of the result of compression is the same
as the rank of the right operand, and the rank of the
result of reducing is one lower, they are really quite
different.  compress is Fortran's PACK.
https://gcc.gnu.org/onlinedocs/gfortran/PACK.html


On Fri, 14 Apr 2023 at 01:34, Steffen Märcker <merk...@web.de> wrote:

Hi Richard and Sebastian!


Interesting read. I obviously was not aware of the variety of meanings for 
fold/reduce. Thanks for pointing this out. Also, in some languages it seems the 
same name is used for both reductions with and without an initial value. 
There's even a list on WP on the matter: 
https://en.wikipedia.org/wiki/Fold_%28higher-order_function%29#In_various_languages


Kind regards,
Steffen

Richard O'Keefe wrote on Thursday, 13 April 2023 13:16:28 (+02:00):


The standard prelude in Haskell does not define anything
called "fold".  It defines fold{l,r}{,1} which can be
applied to any Foldable data (see Data.Foldable).  For
technical reasons having to do with Haskell's
non-strict evaluation, foldl' and foldr' also exist.
But NOT "fold".


https://hackage.haskell.org/package/base-4.18.0.0/docs/Data-Foldable.html#laws




On Thu, 13 Apr 2023 at 21:17, Sebastian Jordan Montano 
<sebastian.jor...@inria.fr> wrote:

Hello Steffen,

Let's take Kotlin documentation 
(https://kotlinlang.org/docs/collection-aggregate.html#fold-and-reduce)

> The difference between the two functions is that fold() takes an initial 
> value and uses it as the accumulated value on the first step, whereas the 
> first step of reduce() uses the first and the second elements as operation 
> arguments on the first step.

Naming is not so consistent across programming languages; they mix up the names 
"reduce" and "fold". For example, in Haskell "fold" does not take an initial 
value, so it is like a "reduce" in Kotlin. In Kotlin, Java, Scala and other OO 
languages "reduce" does not take an initial value while "fold" does. Pharo 
aligns with those languages (except that our fold is called #inject:into:).

So for me the Pharo methods #reduce: and #inject:into: represent well what they 
are doing, and they are well named.
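
Concretely, in Pharo:

        #(1 2 3) inject: 10 into: [:acc :each | acc + each].   "16 -- the seed 10 starts the fold"
        #(1 2 3) reduce: [:a :b | a + b].                       "6 -- the first element is the seed"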

Cheers,
Sebastian

----- Original Message -----
> From: "Steffen Märcker" <merk...@web.de>
> To: "Any question about pharo is welcome" <pharo-users@lists.pharo.org>
> Sent: Wednesday, 12 April 2023 19:03:01
> Subject: [Pharo-users] Collection>>reduce naming

> Hi!
>
> I wonder whether there was a specific reason to name this method #reduce:?
> I would have expected #fold: as this is the more common term for what it
> does. And in fact, even the comment reads "Fold the result of the receiver
> into aBlock." Whereas #reduce: is the common term for what we call with
> #inject:into: .
>
> I am asking not to annoy anyone but out of curiosity. I figured this out
> only by some weird behaviour after porting some code that (re)defines
> #reduce.
>
> Ciao!
> Steffen


