Re: [Rd] Incorrect vector recycling in rare circumstance

Gabriel Becker Wed, 28 Jan 2026 01:32:25 -0800

Geoffrey,

I think the key issue here is that you are expecting f(a,b,c) and mapply(f,
a, b, c) to do similar things in the first place when, in fact, they don't.


f(a,b,c) calls f (and thus evaluates your arithmetic expression) *one time*,
with arguments of length 4, 3, and 7. Here recycling occurs within the
arithmetic expression in order to determine the length of the result. As
such, the recycling is done piecemeal within the complex arithmetic
expression as you pointed out.

mapply(f, a, b, c), on the other hand, calls f *max(length(a), length(b),
length(c))* times, each time with arguments all of length 1. Here, the
recycling occurs *to determine how many times to call f and what
arguments each call should receive*. As such, it must occur before any call
to f can happen at all, and no recycling occurs within f (and thus none
occurs within the arithmetic expression).
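
The difference in how often f is called, and with what, can be sketched as
follows. This is only an illustration: `f_counted` and the environment
`calls` are hypothetical names, added to record the argument lengths each
call sees; the vectors are the ones from the original report.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# Hypothetical instrumentation: log the argument lengths seen by each call.
calls <- new.env()
calls$lengths <- list()
f_counted <- function(a, b, c) {
  calls$lengths[[length(calls$lengths) + 1L]] <- lengths(list(a, b, c))
  (a + b) / c
}

# Direct call: f runs once and sees the full vectors (lengths 4, 3, 7).
direct <- suppressWarnings(f_counted(a, b, c))
n_direct <- length(calls$lengths)
direct_lengths <- calls$lengths[[1]]

# mapply: recycling happens first, then f runs once per element.
calls$lengths <- list()
via_mapply <- suppressWarnings(mapply(f_counted, a, b, c))
n_mapply <- length(calls$lengths)
mapply_lengths <- calls$lengths[[1]]
```

Here `n_direct` is 1 and `n_mapply` is 7; every call made by mapply sees
three arguments of length 1, so there is nothing left to recycle inside f.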

Recall also that mapply can take *any* function, not just ones whose body
is made up of a single complex arithmetic expression. It is trivial to
construct a function f' such that f'(a, b, c) and mapply(f', a, b, c) would
give results of the same length but whose values we would not even expect
to be the same.
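
One such function, as a minimal sketch (the name `g` and the input values
are made up for illustration): anything that summarises one of its
arguments behaves differently when it is handed one element at a time.

```r
# Hypothetical example function: divides by the total of its third argument.
g <- function(a, b, c) (a + b) / sum(c)

a <- c(1, 2, 3)
b <- c(10, 20, 30)
c <- c(100, 200, 300)

# One call: every element is divided by sum(c), which is 600.
direct <- g(a, b, c)

# Three calls: in call i, sum(c) is just c[i].
via_mapply <- mapply(g, a, b, c)
```

Both results have length 3 and no recycling is involved at all, yet the
values differ, because `sum(c)` means something different in each case.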

On the other side of the coin, if you're arguing that (a+b)/c should act
like mapply(f, a, b, c), remember that

(a+b)/c

is really

`/`(`+`(a, b), c)

And here the `+` operator would need information about c, which is not one
of its arguments, in order to do the recycling that way. Also, remember
that due to lazy evaluation, a, b, and c aren't even evaluated until their
values are needed. In your example, c is not needed until the `/` call
(specifically its second argument), which is after a + b is completely
done. But in the proposed new mapply-like order, c would need to be
evaluated before a + b in order to get its length to know how to recycle a
and b, which it does not need to do now and which violates lazy evaluation.
Combine that with the fact that a, b, and c could be replaced with function
calls, and that functions are not *supposed* to have side effects but *are*
allowed to have them, and I think we're looking at a can of worms that I
personally would not want to try to open.
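
Both points can be checked with a small sketch. The helper `noisy()` and
the environment `ev` are hypothetical names introduced only for this
illustration: the first part inspects the call structure of the expression,
the second records the order in which the arguments of f are evaluated.

```r
# Part 1: the parsed expression is a call to `/` whose first argument is
# the parenthesised call to `+`; c is not among the arguments of `+`.
e <- quote((a + b) / c)
outer_fn   <- as.character(e[[1]])      # "/"
inner      <- e[[2]][[2]]               # drop the `(` wrapper: a + b
inner_fn   <- as.character(inner[[1]])  # "+"
inner_vars <- all.vars(inner)           # "a" "b"

# Part 2: log the order in which the arguments are actually evaluated.
ev <- new.env()
ev$order <- character()
noisy <- function(name, value) {
  ev$order <- c(ev$order, name)  # side effect: note that this argument was forced
  value
}

f <- function(a, b, c) (a + b) / c
res <- f(noisy("a", 1:4), noisy("b", 1:2), noisy("c", 1:8))
ev$order  # "a" "b" "c"
```

The log shows c being evaluated last, after a + b has already been
computed. A scheme that recycled across the whole expression would have to
force c before `+` could run, and with side-effecting arguments like
`noisy()` that change in order would be observable.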

Best,
~G


On Wed, Jan 28, 2026 at 12:25 AM Serguei Sokol via R-devel <
[email protected]> wrote:

> On 27/01/2026 at 22:51, Duncan Murdoch wrote:
> > My first reaction was that you shouldn't use the Introduction document
> > as a reference, you should be using the Language Definition or the man
> > pages.
> >
> > The Language Definition gives an example of adding two vectors, and
> > describes the result there.  It doesn't talk about recycling rules for
> > more complex expressions.
> >
> > The man page `?Arithmetic` gives a more complete description, also in
> > terms of binary operations, not complex expressions.
> >
> > So I think things are behaving as designed, and the Introduction
> > document describes it ambiguously, but not incorrectly strictly
> > speaking, since it doesn't say exactly how the recycling will occur.
> > Maybe this would be a clearer description:
> >
> > "Vectors occurring in the same expression need not all be of the same
> > length.  If they are not, the value of the expression is a vector with
> > the same length as the longest vector which occurs in the expression.
> > Recycling occurs in each binary operation:  the shorter vector is
> > recycled as often as need be (perhaps fractionally) until it matches
> > the length of the longer vector."
> This would eliminate ambiguity in the description but leave the
> "problem" as is. For me the problem is that the actual behavior is easy to
> program but difficult to use in practice.
> I mean that for a programmer it is much easier to conceive of a common
> alignment for all components in an expression and a complex action on
> it than to mentally parse an expression to follow what alignment occurs
> at which moment.
>
> I understand that programming a common alignment through a complex
> expression could lead to some severe backward-compatibility issues, but a
> new option or something similar could be proposed to give a user the
> opportunity to rely on a common alignment in complex expressions.
>
> Best,
> Serguei.
>
> >
> > Duncan Murdoch
> >
> > On 2026-01-27 3:58 p.m., Poole, Geoffrey via R-devel wrote:
> >> Synopsis:  In multistep expressions, e.g.:
> >>
> >> fun <- function(a, b, c) (a + b) / c
> >>
> >> `fun` returns an unexpected and non-intuitive result when:
> >>   - a, b, and c are vectors
> >>   - c is the longest vector
> >>   - the lengths of a, b, and c are not even multiples of one another.
> >>
> >> In this case, because of the way vectors are being recycled:
> >>
> >>> fun(a, b, c)
> >>
> >> returns a different result from:
> >>
> >>> mapply(fun, a, b, c)
> >>
> >> Description:
> >>
> >> The R documentation in "An Introduction to R" Section 2.2 states:
> >>
> >>    "Vectors occurring in the same expression need not all be of the same
> >>    length. If they are not, the value of the expression is a vector with
> >>    the same length as the longest vector which occurs in the expression.
> >>    Shorter vectors in the expression are recycled as often as need be
> >>    (perhaps fractionally) until they match the length of the longest
> >>    vector."
> >>
> >> Based on this documentation, I would expect that all vectors in an
> >> expression are recycled to match the length of the longest vector before
> >> element-wise operations are performed. However, R appears to perform
> >> recycling independently at each operation, which produces different
> >> results than the documented behavior would suggest.
> >>
> >> Minimal reproducible example:
> >>
> >> ```r
> >> # Simple function demonstrating the issue
> >> f <- function(a, b, c) {
> >>    (a + b) / c
> >> }
> >>
> >> # Vectors of different lengths (not multiples of each other)
> >> a <- c(1, 2, 3, 4)
> >> b <- c(10, 20, 30)
> >> c <- c(100, 200, 300, 400, 500, 600, 700)
> >>
> >> # Direct call
> >> direct_result <- f(a, b, c)
> >>
> >> # mapply (recycles all inputs to length 7 first, then applies element-wise)
> >> mapply_result <- mapply(f, a, b, c)
> >>
> >> # Compare results
> >> cat("Direct call result:\n")
> >> print(direct_result)
> >>
> >> cat("\nmapply result:\n")
> >> print(mapply_result)
> >>
> >> cat("\nResults are identical:", identical(direct_result,
> >> mapply_result), "\n")
> >>
> >> sessionInfo()
> >> ```
> >>
> >> Output:
> >>
> >>> f <- function(a, b, c) {
> >> +   (a + b) / c
> >> + }
> >>>
> >>> a <- c(1, 2, 3, 4)
> >>> b <- c(10, 20, 30)
> >>> c <- c(100, 200, 300, 400, 500, 600, 700)
> >>>
> >>> direct_result <- f(a, b, c)
> >> Warning messages:
> >> 1: In a + b :
> >>    longer object length is not a multiple of shorter object length
> >> 2: In (a + b)/c :
> >>    longer object length is not a multiple of shorter object length
> >>>
> >>> mapply_result <- mapply(f, a, b, c)
> >> Warning messages:
> >> 1: In mapply(f, a, b, c) :
> >>    longer argument not a multiple of length of shorter
> >> 2: In mapply(f, a, b, c) :
> >>    longer argument not a multiple of length of shorter
> >>>
> >>> print(direct_result)
> >> [1] 0.11000000 0.11000000 0.11000000 0.03500000 0.02200000 0.03666667
> >> 0.04714286
> >>>
> >>> print(mapply_result)
> >> [1] 0.11000000 0.11000000 0.11000000 0.03500000 0.04200000 0.05333333
> >> 0.01857143
> >>
> >>> cat("\nResults are identical:", identical(direct_result,
> >>> mapply_result), "\n")
> >> Results are identical: FALSE
> >>>
> >>> sessionInfo()
> >> R version 4.3.3 (2024-02-29)
> >> Platform: x86_64-pc-linux-gnu (64-bit)
> >> Running under: Linux Mint 22.2
> >>
> >> Matrix products: default
> >> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
> >> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
> >>
> >> locale:
> >>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
> >>  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
> >> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>
> >> time zone: America/Denver
> >> tzcode source: system (glibc)
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods base
> >>
> >> loaded via a namespace (and not attached):
> >> [1] compiler_4.3.3 tools_4.3.3
> >>
> >> Explanation of what is happening:
> >>
> >> In the direct call, recycling occurs independently at each binary
> >> operation:
> >>
> >> 1. `a + b` is evaluated first: `a` (length 4) and `b` (length 3) are
> >>    recycled to length 4, producing `[11, 22, 33, 14]`
> >>
> >> 2. The length-4 result is then divided by `c` (length 7): the length-4
> >>    result is recycled to length 7 as `[11, 22, 33, 14, 11, 22, 33]`, then
> >>    divided by `c`
> >>
> >> 3. Final result: `[0.11, 0.11, 0.11, 0.035, 0.022, 0.0367, 0.0471]`
> >>
> >> However, based on my reading of the documentation, I would expect all
> >> three vectors (`a`, `b`, and `c`) to be recycled to length 7 (the length
> >> of the longest vector in the expression) before any operations are
> >> performed, which is what `mapply` does:
> >>
> >> - `a` recycled to length 7: `[1, 2, 3, 4, 1, 2, 3]`
> >> - `b` recycled to length 7: `[10, 20, 30, 10, 20, 30, 10]`
> >> - `c` unchanged: `[100, 200, 300, 400, 500, 600, 700]`
> >> - Then `(a + b) / c` computed element-wise:
> >>   `[0.11, 0.11, 0.11, 0.035, 0.042, 0.0533, 0.0186]`
> >>
> >> The key difference is at positions 5, 6, and 7. In the direct call, the
> >> intermediate result `(a + b)` has length 4 and is recycled independently
> >> of the original vectors when dividing by `c`.
> >>
> >> Question:
> >>
> >> Does this example represent a bug in R's recycling behavior, or is the
> >> documentation in Section 2.2 not intended to describe how recycling
> >> works in expressions with multiple binary operations? If the current
> >> behavior is intentional, could the documentation be clarified to explain
> >> that recycling occurs at each binary operation rather than globally
> >> across the expression?
> >>
> >>
> >>     [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> [email protected] mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>
> --
> Serguei Sokol
> Ingenieur de recherche INRAE
>
> Cellule Mathématiques
> TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
> 135 Avenue de Rangueil
> 31077 Toulouse Cedex 04
>
> tel: +33 5 61 55 98 49
> email: [email protected]
>
> https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/
>
>
