[ https://issues.apache.org/jira/browse/ARROW-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630753#comment-17630753 ]
Carl Boettiger commented on ARROW-16680:
----------------------------------------

Hi folks, not to nag but this issue is still killing us. It seems only to occur when accessing relatively large remote S3 data, and even then isn't 100% repeatable, but I can't avoid it by setting CURLOPT_NOSIGNAL. This prevents us from using arrow in large automated workflows...

We can avoid it by running with littler instead of R / Rscript, as littler can accept the SIGPIPE, but that is of no use in other tools that control how R is called, such as quarto notebooks. We reported this to the quarto team ([https://github.com/quarto-dev/quarto-cli/issues/1667#issuecomment-1204554958]), but after trying some mechanisms to avoid it, JJ suggested it really needs to be resolved upstream instead...

> [R] Weird R error: Error in
> fs___FileSystem__GetTargetInfos_FileSelector(self, x) : ignoring SIGPIPE
> signal
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16680
>                 URL: https://issues.apache.org/jira/browse/ARROW-16680
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 8.0.0
>            Reporter: Carl Boettiger
>            Priority: Major
>
> Okay apologies, this is a bit of a weird error but is annoying the heck out
> of me.
> The following block of R code, when run with Rscript (or embedded
> into any form of Rmd, quarto, knitr doc) produces the error below (at least
> most of the time):
>
> {code:java}
> library(arrow)
> library(dplyr){code}
> {code:java}
> Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
> Sys.unsetenv("AWS_ACCESS_KEY_ID")
> Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
> Sys.unsetenv("AWS_DEFAULT_REGION")
> Sys.unsetenv("AWS_S3_ENDPOINT")
> s3 <- arrow::s3_bucket(bucket = "scores/parquet",
>                        endpoint_override = "data.ecoforecast.org")
> ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
> ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()
> {code}
> Gives the error
>
> {code:java}
> Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) :
>   ignoring SIGPIPE signal
> Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector
> {code}
> But only when run as a script! When run interactively in an R console, this
> code runs just fine. Even as a script the code seems to run fine, but it
> erroneously seems to raise this SIGPIPE, which I don't understand.
> If the script is executed with littler
> ([https://dirk.eddelbuettel.com/code/littler.html]) then it runs fine, since
> littler handles SIGPIPE but Rscript doesn't. But I have no idea why the above
> code raises a SIGPIPE in the first place. Worse, if I choose a different filter
> for the above, like "aquatics", it (usually) works without the error.
> I have no idea why `fs___FileSystem__GetTargetInfos_FileSelector` results in
> this, but would really appreciate any hints on how to avoid this as it makes
> it very hard to use arrow in workflows right now!
>
> thanks for all you do!
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)