My first reaction was that you shouldn't use the Introduction document as a reference; you should be using the Language Definition or the man pages.

The Language Definition gives an example of adding two vectors, and describes the result there. It doesn't talk about recycling rules for more complex expressions.

The man page `?Arithmetic` gives a more complete description, also in terms of binary operations, not complex expressions.

So I think things are behaving as designed. The Introduction document describes the behaviour ambiguously but, strictly speaking, not incorrectly, since it doesn't say exactly how the recycling will occur. Maybe this would be a clearer description:

"Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Recycling occurs in each binary operation: the shorter vector is recycled as often as need be (perhaps fractionally) until it matches the length of the longer vector."
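
As a minimal sketch of what "recycled fractionally" means for a single binary operation (the vectors `x` and `y` are made up for illustration and do not come from the original report; `rep_len()` is used only to write the recycling out by hand):

```r
# Illustrative vectors; lengths 5 and 2 are not multiples of one another
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20)

# R warns about the length mismatch; suppressWarnings() only quiets it here
z <- suppressWarnings(x + y)

# The same computation with the recycling of y written out explicitly
z_explicit <- x + rep_len(y, length(x))

print(z)
print(identical(z, z_explicit))
```

Here `y` is used two and a half times, which is the "fractional" case that triggers the warning.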

Duncan Murdoch

On 2026-01-27 3:58 p.m., Poole, Geoffrey via R-devel wrote:
Synopsis:  In multistep expressions, e.g.:

fun <- function(a, b, c) (a + b) / c

`fun` returns an unexpected and non-intuitive result when:
  - a, b, and c are vectors
  - c is the longest vector
  - the lengths of a, b, and c are not even multiples of one another.

In this case, because of the way vectors are being recycled:

fun(a, b, c)

returns a different result from:

mapply(fun, a, b, c)

Description:

The R documentation in "An Introduction to R" Section 2.2 states:

   "Vectors occurring in the same expression need not all be of the same length.
   If they are not, the value of the expression is a vector with the same length
   as the longest vector which occurs in the expression. Shorter vectors in the
   expression are recycled as often as need be (perhaps fractionally) until they
   match the length of the longest vector."

Based on this documentation, I would expect that all vectors in an expression
are recycled to match the length of the longest vector before element-wise
operations are performed. However, R appears to perform recycling independently
at each operation, which produces different results than the documented
behavior would suggest.

Minimal reproducible example:

```r
# Simple function demonstrating the issue
f <- function(a, b, c) {
   (a + b) / c
}

# Vectors of different lengths (not multiples of each other)
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# Direct call
direct_result <- f(a, b, c)

# mapply (recycles all inputs to length 7 first, then applies element-wise)
mapply_result <- mapply(f, a, b, c)

# Compare results
cat("Direct call result:\n")
print(direct_result)

cat("\nmapply result:\n")
print(mapply_result)

cat("\nResults are identical:", identical(direct_result, mapply_result), "\n")

sessionInfo()
```

Output:

f <- function(a, b, c) {
+   (a + b) / c
+ }

a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

direct_result <- f(a, b, c)
Warning messages:
1: In a + b :
   longer object length is not a multiple of shorter object length
2: In (a + b)/c :
   longer object length is not a multiple of shorter object length

mapply_result <- mapply(f, a, b, c)
Warning messages:
1: In mapply(f, a, b, c) :
   longer argument not a multiple of length of shorter
2: In mapply(f, a, b, c) :
   longer argument not a multiple of length of shorter

print(direct_result)
[1] 0.11000000 0.11000000 0.11000000 0.03500000 0.02200000 0.03666667 0.04714286

print(mapply_result)
[1] 0.11000000 0.11000000 0.11000000 0.03500000 0.04200000 0.05333333 0.01857143

cat("\nResults are identical:", identical(direct_result, mapply_result), "\n")
Results are identical: FALSE

sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 22.2

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Denver
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.3 tools_4.3.3

Explanation of what is happening:

In the direct call, recycling occurs independently at each binary operation:

1. `a + b` is evaluated first: `a` (length 4) and `b` (length 3) are recycled
    to length 4, producing `[11, 22, 33, 14]`

2. The length-4 result is then divided by `c` (length 7): the length-4 result
    is recycled to length 7 as `[11, 22, 33, 14, 11, 22, 33]`, then divided by `c`

3. Final result: `[0.11, 0.11, 0.11, 0.035, 0.022, 0.0367, 0.0471]`
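
The three steps above can be sketched by writing each recycling out by hand with `rep_len()`. The names `step1`, `step2`, and `direct` are illustrative only, and this is a reconstruction of the observed behaviour, not a description of R's internals:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# Step 1: a + b, with b recycled to length(a) = 4
step1 <- a + rep_len(b, length(a))

# Step 2: the length-4 intermediate is recycled to length(c) = 7, then divided
step2 <- rep_len(step1, length(c)) / c

# Step 3: compare with the direct expression (warnings silenced for the check)
direct <- suppressWarnings((a + b) / c)
print(step1)
print(isTRUE(all.equal(step2, direct)))
```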

However, based on my reading of the documentation, I would expect all three
vectors (`a`, `b`, and `c`) to be recycled to length 7 (the length of the
longest vector in the expression) before any operations are performed, which
is what `mapply` does:

- `a` recycled to length 7: `[1, 2, 3, 4, 1, 2, 3]`
- `b` recycled to length 7: `[10, 20, 30, 10, 20, 30, 10]`
- `c` unchanged: `[100, 200, 300, 400, 500, 600, 700]`
- Then `(a + b) / c` computed element-wise: `[0.11, 0.11, 0.11, 0.035, 0.042, 0.0533, 0.0186]`

The key difference is at positions 5, 6, and 7. In the direct call, the
intermediate result `(a + b)` has length 4 and is recycled independently of
the original vectors when dividing by `c`.
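
A rough way to check which positions differ is to recycle every input to the longest length up front and compare against the direct call. In this sketch `n`, `global`, `direct`, and `differs` are hypothetical names, and the `1e-12` tolerance is an arbitrary choice for the floating-point comparison:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)
c <- c(100, 200, 300, 400, 500, 600, 700)

# Recycle all three inputs to the longest length before any arithmetic
n <- max(length(a), length(b), length(c))
global <- (rep_len(a, n) + rep_len(b, n)) / rep_len(c, n)

# Direct evaluation, where recycling happens at each binary operation
direct <- suppressWarnings((a + b) / c)

# Positions where the two approaches disagree
differs <- which(abs(direct - global) > 1e-12)
print(differs)
```

Anyone who wants the all-inputs-first behaviour can apply `rep_len()` explicitly like this, or use `mapply()` as in the example above.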

Question:

Does this example represent a bug in R's recycling behavior, or is the
documentation in Section 2.2 not intended to describe how recycling works in
expressions with multiple binary operations? If the current behavior is
intentional, could the documentation be clarified to explain that recycling
occurs at each binary operation rather than globally across the expression?



______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
