Dear Melissa,

Normally, in evaluating a formula an R modeling function follows the scoping rules in ?formula; that is,

"A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.

"Formulas created with the ~ operator use the environment in which they were created. Formulas created with as.formula will use the env argument for their environment."

So, for example, if the variables in the formula live in the environment of the calling function and if the data argument isn't used, then the variables should be found. Modifying your example a bit and calling lme() works, for example:

------- snip -------

> analyze_this <- function(df) {
+
+   mean.x <- mean(df$age)
+   mean.y <- mean(df$height)
+   sd.x <- sd(df$age)
+   sd.y <- sd(df$height)
+
+   x <- (df$age - mean.x) / sd.x
+   y <- (df$height - mean.y) / sd.y
+   X <- model.matrix(~ x * male * black, data = df)
+   dummyID <- rep(1:2, times=c(floor(nrow(X)/2), ceiling(nrow(X)/2)))
+
+   lme(y ~ X[, -1], random= ~ 1 | dummyID)
+
+   # groupedData(y ~ X[, -1] | dummyID)
+
+ }

> analyze_this(growthIndiana)
Linear mixed-effects model fit by REML
  Data: NULL
  Log-restricted-likelihood: -2546.266
  Fixed: y ~ X[, -1]
(Intercept) X[, -1]x X[, -1]male X[, -1]black -0.16086555 0.73401464 0.26303561 -0.04761425 X[, -1]x:male X[, -1]x:black X[, -1]male:black X[, -1]x:male:black 0.27517924 -0.10318100 0.21899350 0.03048160

Random effects:
 Formula: ~1 | dummyID
         (Intercept)  Residual
StdDev: 3.688461e-05 0.4462984

Number of Observations: 4123
Number of Groups: 2

------- snip -------

Note that model.matrix() finds x and y in the environment of analyze_this(), and male and black in df.

But if you unquote the line groupedData(y ~ X[, -1] | dummyID), the function fails:

------- snip -------

> analyze_this(growthIndiana)
 Error in data.frame(y = y, X = X, dummyID = dummyID) :
  object 'X' not found

------- snip -------

This suggests that groupedData() is doing something unusual (which I don't have the inclination to figure out).

I'm not sure why one needs to manipulate the model matrix directly like this, but I assume that there is some coherent reason or you wouldn't be asking. Also isn't the formula for groupedData() supposed to have a *single* covariate on the right, like y ~ x | g (where y, x, and g are individual variables)?

Best,
 John

On 2022-01-07 5:29 p.m., Key, Melissa wrote:
John,

Thanks for your response.  I agree that the definition of the data frame is 
poor (in my defense it came directly from the demo code, but I should have 
checked it more thoroughly).  The good news is that your comments caused me to 
take a closer look at where X was defined, and I found the reason I wasn't 
getting the same results on my Mac and PC - that error was between keyboard and 
chair.

There is still something funny going on though (at least relative to my 
previous experience with how R searches environments:

If X is defined in the global environment, groupedData can find it there and 
use it.  (this is what I'm used to)
If X is defined within a function, groupedData cannot find it, even if 
groupedData is called within the same function. (this seems strange to me - 
usually parent.frame() captures information within the function environment, or 
so I thought)

My solution at the bottom still works - and unlike groupedData, nlme allows a 
list as input to the data argument (or at least, doesn't check to make sure 
it's a data frame), so I have a working (albeit hacky) solution that actually 
makes more sense to me than using groupedData, but it still seems strange that 
the function cannot find X in its search path.

Thanks again!
Melissa

-----Original Message-----
From: John Fox <j...@mcmaster.ca>
Sent: Friday, January 7, 2022 4:35 PM
To: Key, Melissa <m...@infoscitex.com>
Cc: r-help@r-project.org
Subject: [EXTERNAL] Re: [R] bug in Windows implementation of nlme::groupedData

Dear Melissa,

It seems strange to me that your code would work on any platform (it doesn't on my Mac) 
because the data frame you create shouldn't contain a matrix named "X" but 
rather columns including those originating from X.
To illustrate:

  > X <- matrix(1:12, 4, 3)
  > colnames(X) <- c("a", "b", "c")
  > X
       a b  c
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

  > y <- 1:4

  > (D <- data.frame(y, X))
    y a b  c
1 1 1 5  9
2 2 2 6 10
3 3 3 7 11
4 4 4 8 12

  > str(D)
'data.frame':   4 obs. of  4 variables:
   $ y: int  1 2 3 4
   $ a: int  1 2 3 4
   $ b: int  5 6 7 8
   $ c: int  9 10 11 12

My session info:

  > sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Monterey 12.1

Matrix products: default
LAPACK:
/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] nlme_3.1-153 HRW_1.0-5

loaded via a namespace (and not attached):
[1] compiler_4.1.2     tools_4.1.2        KernSmooth_2.23-20
splines_4.1.2
[5] grid_4.1.2         lattice_0.20-45

I hope this helps,
   John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: 
https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fsocialsciences.mcmaster.ca%2Fjfox%2F&amp;data=04%7C01%7Cmkey%40infoscitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=nTXsf6vbxVIstMup5AUlmABv9CdQa3hw44Fdb2lej7A%3D&amp;reserved=0

On 2022-01-07 11:23 a.m., Key, Melissa wrote:
I am trying to replicate a semi-parametric analysis described in Harezlak, Jaroslaw, David 
Ruppert, and Matt P. Wand. Semiparametric regression with R. New York, NY: Springer, 2018. 
(https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Flink.springer.com%2Fbook%2F10.1007%252F978-1-4939-8853-2&amp;data=04%7C01%7Cmkey%40infoscitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ESCv%2Bp%2FPo7pyKm055Y4nYAK8Cj1RTkorP9ZvahZTPmE%3D&amp;reserved=0).

I can successfully run the analysis, but now I'm trying to move it into my 
workflow, which requires that the analysis be conducted within a function 
(using targets package), and the `groupedData` function now fails with an error 
that it cannot find the `X` matrix (see reprex below).  I've tried the reprex 
on both my personal Mac (where it works??) and on windows machines (where it 
does not) - so the problem is likely specific to Windows computers (yes, this 
seems weird to me too).
All packages have been updated, and I'm running the latest version of R on all 
machines.

Reprex:

library(HRW) # contains example data and ZOSull function
library(nlme)

data(growthIndiana)


analyze_this <- function(df) {

    mean.x <- mean(df$age)
    mean.y <- mean(df$height)
    sd.x <- sd(df$age)
    sd.y <- sd(df$height)

    df$x <- (df$age - mean.x) / sd.x
    df$y <- (df$height - mean.y) / sd.y

    X <- model.matrix(~ x * male * black, data = df)
    dummyID <- rep(1, length(nrow(X)))

    grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)),
data = data.frame(y = df$y, X, dummyID))

}


# doesn't work on Windows machine, does work on the Mac
analyze_this(growthIndiana)
#> Error in eval(aux[[2]], object): object 'X' not found

# does work

df <- growthIndiana

mean.x <- mean(df$age)
mean.y <- mean(df$height)
sd.x <- sd(df$age)
sd.y <- sd(df$height)

df$x <- (df$age - mean.x) / sd.x
df$y <- (df$height - mean.y) / sd.y

X <- model.matrix(~ x * male * black, data = df) dummyID <- rep(1,
length(nrow(X)))

grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)), data
= data.frame(y = df$y, X, dummyID))


# attempted work-around.

analyze_this2 <- function(df) {
    num.global.knots = 20
    num.subject.knots = 10

    mean.x <- mean(df$age)
    mean.y <- mean(df$height)
    sd.x <- sd(df$age)
    sd.y <- sd(df$height)

    df$x <- (df$age - mean.x) / sd.x
    df$y <- (df$height - mean.y) / sd.y

    X <- model.matrix(~ x * male * black, data = df)
    dummyID <- rep(1, length(nrow(X)))

    # grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)),
data = data.frame(y = df$y, X, dummyID))

    global.knots = quantile(unique(df$x), seq(0, 1, length = num.global.knots + 
2)[-c(1, num.global.knots + 2)])
    subject.knots = quantile(unique(df$x), seq(0, 1, length =
num.subject.knots + 2)[-c(1, num.subject.knots + 2)])

    Z.global <- ZOSull(df$x, range.x = range(df$x), global.knots)
    Z.group <- df$black * Z.global
    Z.subject <- ZOSull(df$x, range.x = range(df$x), subject.knots)

    Zblock <- list(
      dummyID = pdIdent(~ 0 + Z.global),
      dummyID = pdIdent(~ 0 + Z.group),
      idnum = pdSymm(~ x),
      idnum = pdIdent(~ 0 + Z.subject)
    )

    df$dummyID <- dummyID
    tmp_data <- c(
      df,
      X = list(X),
      Z.global = list(Z.global),
      Z.group = list(Z.global),
      Z.subject = list(Z.subject)
    )

    fit <- lme(y ~ 0 + X,
      data = tmp_data,
      random = Zblock
    )

}

# this works (warning - lme takes awhile to fit)
analyze_this2(growthIndiana)

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows
10 x64 (build 22000) #> #> Matrix products: default #> #> locale:
#> [1] LC_COLLATE=English_United States.1252 #> [2]
LC_CTYPE=English_United States.1252 #> [3] LC_MONETARY=English_United
States.1252 #> [4] LC_NUMERIC=C #> [5] LC_TIME=English_United
States.1252 #> #> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] nlme_3.1-153 HRW_1.0-5
#>
#> loaded via a namespace (and not attached):
#>  [1] lattice_0.20-45    digest_0.6.29      withr_2.4.3        grid_4.1.2
#>  [5] magrittr_2.0.1     reprex_2.0.1       evaluate_0.14      
KernSmooth_2.23-20
#>  [9] highr_0.9          stringi_1.7.6      rlang_0.4.12       cli_3.1.0
#> [13] rstudioapi_0.13    fs_1.5.2           rmarkdown_2.11     splines_4.1.2
#> [17] tools_4.1.2        stringr_1.4.0      glue_1.6.0         xfun_0.29
#> [21] yaml_2.2.1         fastmap_1.1.0      compiler_4.1.2     htmltools_0.5.2
#> [25] knitr_1.37

Created on 2022-01-07 by the [reprex
package](https://usg02.safelinks.protection.office365.us/?url=https%3A
%2F%2Freprex.tidyverse.org%2F&amp;data=04%7C01%7Cmkey%40infoscitex.com
%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58
%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=hH
5ncnSVZGQ4KZsvLpvjgM%2Fgf9Zu5QnoeSntTrCZWjk%3D&amp;reserved=0)
(v2.0.1)

   Here's the session Info for the Mac:

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS
Monterey 12.1

Matrix products: default
LAPACK:
/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRl
apack.dylib

locale:[1]
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] nlme_3.1-153  targets_0.8.1 HRW_1.0-5

loaded via a namespace (and not attached):
   [1] compiler_4.1.2     pillar_1.6.4       tools_4.1.2        digest_0.6.28   
   lattice_0.20-45    jsonlite_1.7.2     evaluate_0.14
   [8] lifecycle_1.0.1    tibble_3.1.6       pkgconfig_2.0.3    rlang_0.4.12    
   igraph_1.2.9       cli_3.1.0          yaml_2.2.1
[15] xfun_0.28          fastmap_1.1.0      withr_2.4.2        knitr_1.36        
 htmlwidgets_1.5.4  vctrs_0.3.8        grid_4.1.2
[22] tidyselect_1.1.1   glue_1.5.0         data.table_1.14.2  R6_2.5.1          
 processx_3.5.2     fansi_0.5.0        rmarkdown_2.11
[29] callr_3.7.0        purrr_0.3.4        magrittr_2.0.1     scales_1.1.1      
 codetools_0.2-18   ps_1.6.0           htmltools_0.5.2
[36] ellipsis_0.3.2     splines_4.1.2      colorspace_2.0-2   
KernSmooth_2.23-20 utf8_1.2.2         visNetwork_2.1.0   munsell_0.5.0
[43] crayon_1.4.2


Melissa Key, Ph.D.
Statistician
Non-Invasive Brain Stimulation Team
Infoscitex

Cell: 937-550-4981
Email: m...@infoscitex.com<mailto:m...@infoscitex.com>
Base email:
melissa.key....@us.af.mil<mailto:melissa.key....@us.af.mil>


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fsta
t.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=04%7C01%7Cmkey%40info
scitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690
e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIj
oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&am
p;sdata=4c%2Bnu06ZfIKpwowJCTz2mIl7kAEQR5vbZSw8pTuqumk%3D&amp;reserved=
0 PLEASE do read the posting guide
https://usg02.safelinks.protection.office365.us/?url=http%3A%2F%2Fwww.
r-project.org%2Fposting-guide.html&amp;data=04%7C01%7Cmkey%40infoscite
x.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e9850
38b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4
wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sda
ta=iT8fu9JX5i7gUfo0T2gQMVdHEm%2FSThakp2yaOJ2pOBQ%3D&amp;reserved=0
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to