[ 
https://issues.apache.org/jira/browse/ARROW-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630753#comment-17630753
 ] 

Carl Boettiger commented on ARROW-16680:
----------------------------------------

Hi folks, not to nag but this issue is still killing us.  It seems only to 
occur when accessing relatively large remote S3 data, and even then isn't 100% 
repeatable, but I can't avoid it by setting CURLOPT_NOSIGNAL.  This prevents us 
from using arrow in large automated workflows...

We can avoid it by running with littler instead of R / Rscript, as littler can 
accept the sigpipe, but that is of no use in other tools that control how R is 
called, such as quarto notebooks.  We reported this to the quarto team 
(https://github.com/quarto-dev/quarto-cli/issues/1667#issuecomment-1204554958)
but after trying a few mechanisms to avoid it, JJ suggested it really needs to be 
resolved upstream instead...

> [R] Weird R error: Error in 
> fs___FileSystem__GetTargetInfos_FileSelector(self, x) :    ignoring SIGPIPE 
> signal
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16680
>                 URL: https://issues.apache.org/jira/browse/ARROW-16680
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 8.0.0
>            Reporter: Carl Boettiger
>            Priority: Major
>
> Okay apologies, this is a bit of a weird error but is annoying the heck out 
> of me.  The following block of all R code, when run with Rscript (or embedded 
> into any form of Rmd, quarto, knitr doc) produces the error below (at least 
> most of the time):
>  
> {code:java}
> library(arrow)
> library(dplyr)
> {code}
> {code:java}
> Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
> Sys.unsetenv("AWS_ACCESS_KEY_ID")
> Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
> Sys.unsetenv("AWS_DEFAULT_REGION")
> Sys.unsetenv("AWS_S3_ENDPOINT")
> s3 <- arrow::s3_bucket(bucket = "scores/parquet",
>                        endpoint_override = "data.ecoforecast.org")
> ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
> ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()
> {code}
> Gives the error
>  
>  
> {code:java}
> Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) : 
>   ignoring SIGPIPE signal
> Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector 
> {code}
> But only when run as a script! When run interactively in an R console, this 
> code runs just fine.  Even as a script the code itself seems to run fine, but 
> it erroneously seems to raise this sigpipe, which I don't understand.  
> If the script is executed with littler 
> (https://dirk.eddelbuettel.com/code/littler.html) then it runs fine, since 
> littler handles sigpipe but Rscript doesn't.  But I have no idea why the above 
> code throws a pipe in the first place.  Worse, if I choose a different filter 
> for the above, like "aquatics", it (usually) works without the error.  
> I have no idea why `fs___FileSystem__GetTargetInfos_FileSelector` results in 
> this, but would really appreciate any hints on how to avoid this as it makes 
> it very hard to use arrow in workflows right now! 
>  
> thanks for all you do!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
