Dear R-SIG-Mac,

I bought a new MacBook Air with the M3 chip, which has 8 CPU cores, 10 GPU cores, and 16 GB of unified memory. My R `torch` apps are crashing on it. I have assembled an MWE that runs without problems on other Macs, including a MacBook Air M1 and a Mac mini. The OS is the same on all of them (Sonoma 14.5). The MWE follows:

```{r}
# ==== MWE
# Download the training samples
rds_file <- "https://raw.githubusercontent.com/e-sensing/sitsdata/master/inst/extdata/torch/train_samples.rds?raw=true"
dest_file <- paste0(tempdir(), "/train_samples.rds")
download.file(rds_file, destfile = dest_file, method = "curl")
train_samples <- readRDS(dest_file)

# Sample labels
labels <- c("Cerrado", "Forest", "Pasture", "Soy_Corn")
# Create numeric labels vector
code_labels <- seq_along(labels)
names(code_labels) <- labels

# Split the data into training and validation data sets
# Create partitions for different splits of the input data
frac <- 0.2
train_samples <- dplyr::group_by(train_samples, .data[["label"]])
test_samples <- train_samples |>
    dplyr::slice_sample(prop = frac) |>
    dplyr::ungroup()

# Remove the lines used for validation
sel <- !train_samples[["sample_id"]] %in% test_samples[["sample_id"]]
train_samples <- train_samples[sel, ]

# Shuffle the data
train_samples <- train_samples[sample(nrow(train_samples), nrow(train_samples)), ]
test_samples <- test_samples[sample(nrow(test_samples), nrow(test_samples)), ]

# Organize data for model training
# (-2:0 expands to c(-2, -1, 0), so the first two columns are dropped)
train_x <- as.matrix(train_samples[, -2:0])
train_y <- unname(code_labels[train_samples[["label"]]])

# Create the test data
test_x <- as.matrix(test_samples[, -2:0])
test_y <- unname(code_labels[test_samples[["label"]]])

# Set torch seed
torch::torch_manual_seed(sample.int(10^5, 1))

# Avoid a global variable for 'self'
self <- NULL

# Function to create a simple sequential NN module
.torch_linear_relu_dropout <- torch::nn_module(
    classname = "torch_linear_batch_norm_relu_dropout",
    initialize = function(input_dim, output_dim, dropout_rate) {
        self$block <- torch::nn_sequential(
            torch::nn_linear(input_dim, output_dim),
            torch::nn_relu(),
            torch::nn_dropout(dropout_rate)
        )
    },
    forward = function(x) {
        self$block(x)
    }
)

# Define the MLP architecture
mlp_model <- torch::nn_module(
    initialize = function(num_pred, layers, dropout_rates, y_dim) {
        tensors <- list()
        # input layer
        tensors[[1]] <- .torch_linear_relu_dropout(
            input_dim = num_pred,
            output_dim = 512,
            dropout_rate = 0.40
        )
        # output layer
        tensors[[length(tensors) + 1]] <-
            torch::nn_linear(layers[length(layers)], y_dim)
        # add softmax tensor
        tensors[[length(tensors) + 1]] <- torch::nn_softmax(dim = 2)
        # create a sequential module that calls the layers in the same order
        self$model <- torch::nn_sequential(!!!tensors)
    },
    forward = function(x) {
        self$model(x)
    }
)

# Train the model using luz
torch_model <- luz::setup(
    module = mlp_model,
    loss = torch::nn_cross_entropy_loss(),
    metrics = list(luz::luz_metric_accuracy()),
    optimizer = torch::optim_adamw
)
torch_model <- luz::set_hparams(
    torch_model,
    num_pred = ncol(train_x),
    layers = 512,
    dropout_rates = 0.3,
    y_dim = length(code_labels)
)
torch_model <- luz::set_opt_hparams(
    torch_model,
    lr = 0.001,
    eps = 1e-08,
    weight_decay = 1.0e-06
)
torch_model <- luz::fit(
    torch_model,
    data = list(train_x, train_y),
    epochs = 100,
    valid_data = list(test_x, test_y),
    callbacks = list(luz::luz_callback_early_stopping(
        patience = 20,
        min_delta = 0.01
    )),
    verbose = TRUE
)
```

The error occurs in the `luz::fit` function. Inside RStudio, the code gets stuck, and then RStudio asks to restart R.

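
One possible way to narrow this down is sketched below. It rests on an unconfirmed assumption: that the crash is tied to the Apple GPU (MPS) backend, which `luz` may select when it is available. The helpers `report_torch_devices()` and `fit_on_cpu()` are hypothetical names introduced only for this check; they are not part of `torch` or `luz`. The first reports which backends `torch` can see, the second repeats the failing `luz::fit` call with the CPU forced through `luz::accelerator(cpu = TRUE)`.

```{r}
# Hypothetical diagnostic helper (not part of torch or luz):
# report whether torch is usable and whether the MPS backend is visible.
report_torch_devices <- function() {
    if (!requireNamespace("torch", quietly = TRUE)) {
        return(list(
            torch_installed = FALSE,
            backend_ready = NA,
            mps_available = NA
        ))
    }
    # torch_is_installed() checks that the libtorch binaries are present
    backend_ready <- torch::torch_is_installed()
    list(
        torch_installed = TRUE,
        backend_ready = backend_ready,
        mps_available = if (backend_ready) {
            torch::backends_mps_is_available()
        } else {
            NA
        }
    )
}

# Hypothetical wrapper: same fit call as in the MWE, but forced onto the CPU.
# Assumption (unconfirmed): the crash is related to the MPS device.
fit_on_cpu <- function(model, train, valid, epochs = 100) {
    luz::fit(
        model,
        data = train,
        epochs = epochs,
        valid_data = valid,
        accelerator = luz::accelerator(cpu = TRUE),
        verbose = TRUE
    )
}

devices <- report_torch_devices()
str(devices)
```

With the objects from the MWE in place, the CPU-only run would be `fit_on_cpu(torch_model, list(train_x, train_y), list(test_x, test_y))`, called on the object returned by `luz::set_opt_hparams()`. If that run completes on the M3 while the default run crashes, it would point towards the MPS code path; if both crash, the cause likely lies elsewhere.
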
When running R from the terminal, the output is:

```
 *** caught bus error ***
address 0x16daa0000, cause 'invalid alignment'

 *** caught segfault ***
address 0x9, cause 'invalid permissions'
zsh: segmentation fault  R
```

The `sessionInfo()` output is as follows:

```
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Sao_Paulo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] crayon_1.5.2      vctrs_0.6.5       cli_3.6.2         zeallot_0.1.0
 [5] rlang_1.1.3       processx_3.8.4    generics_0.1.3    torch_0.12.0.9000
 [9] coro_1.0.4        glue_1.7.0        bit_4.0.5         prettyunits_1.2.0
[13] luz_0.4.0         ps_1.7.6          hms_1.1.3         fansi_1.0.6
[17] tibble_3.2.1      progress_1.2.3    lifecycle_1.0.4   compiler_4.4.0
[21] dplyr_1.1.4       fs_1.6.4          Rcpp_1.0.12       pkgconfig_2.0.3
[25] rstudioapi_0.16.0 R6_2.5.1          tidyselect_1.2.1  utf8_1.2.4
[29] pillar_1.9.0      callr_3.7.6       magrittr_2.0.3    tools_4.4.0
[33] bit64_4.0.5
```

Any clues will be most appreciated.

Thanks
Gilberto

============================
Prof Dr Gilberto Camara
Senior Researcher
National Institute for Space Research (INPE), Brazil
https://gilbertocamara.org/
=============================

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac