[ https://issues.apache.org/jira/browse/ARROW-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Claymore Marshall updated ARROW-11579:
--------------------------------------
    Description:
On Windows 10, reading large feather objects in R seems to lead to hanging on a repeat read.

This issue has been reproduced on 3 different Windows machines, all running Windows 10 and R 4.0.0 (or later).
read_feather does not hang if using version = 1, or using uncompressed with version 2.
This issue does not happen in tests on Linux (Ubuntu 20.04 at least).

Example:

library(arrow)
m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))
write_feather(m, "test.feather4", version = 2, compression = "lz4") # does not hang with uncompressed, but does with lz4 and zstd
for (j in 1:50) {
  y <- read_feather("test.feather4") # hangs after an unpredictable number of reads, just on Windows though
  print(paste0("feather read ", j, "..."))
}

Interestingly, a workaround is to use read_feather but read just one column at a time. This has not hung so far.

e.g. the following reads every column of the data frame (y is a list with one column per element), and this doesn't hang on repeated reads:

y <- lapply(cols, function(col) {
  read_feather("test.feather4", col_select = all_of(col))
})

  was:
On Windows 10, reading large feather objects in R seems to lead to hanging on a repeat read.

This issue has been reproduced on 3 different Windows machines, all running Windows 10 and R 4.0.0 (or later).
read_feather does not hang if using version = 1, or using uncompressed with version 2.
This issue does not happen in tests on Linux (Ubuntu 20.04 at least).

Example:

_library(arrow)_
_m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))_
_write_feather(m, "test.feather4", version = 2, compression = "lz4") # does not hang with uncompressed, but does with lz4 and zstd_
_for (j in 1:50)_
_{
y <- read_feather("test.feather4") # hangs after an unpredictable number of reads, just on Windows though
print(paste0("feather read ", j, "..."))
}_

Interestingly, a workaround is to use read_feather but read just one column at a time. This has not hung so far.

e.g. the following reads every column of the data frame (y is a list with one column per element), and this doesn't hang on repeated reads:

_y <- lapply(cols, function(col) {_
_read_feather("test.feather4", col_select = all_of(col))_
_})_

> R's arrow::read_feather hanging on repeat reads of large objects
> ----------------------------------------------------------------
>
>                 Key: ARROW-11579
>                 URL: https://issues.apache.org/jira/browse/ARROW-11579
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 3.0.0
>         Environment: windows 10, R 4.0.0, arrow 3.0.0
>            Reporter: Claymore Marshall
>            Priority: Major
>
> On Windows 10, reading large feather objects in R seems to lead to hanging on
> a repeat read.
>
> This issue has been reproduced on 3 different Windows machines, all running
> Windows 10 and R 4.0.0 (or later).
> read_feather does not hang if using version = 1, or using uncompressed with
> version 2.
> This issue does not happen in tests on Linux (Ubuntu 20.04 at least).
>
> Example:
>
> library(arrow)
> m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))
> write_feather(m, "test.feather4", version = 2, compression = "lz4") # does not hang with uncompressed, but does with lz4 and zstd
> for (j in 1:50) {
>   y <- read_feather("test.feather4") # hangs after an unpredictable number of reads, just on Windows though
>   print(paste0("feather read ", j, "..."))
> }
>
> Interestingly, a workaround is to use read_feather but read just one column
> at a time. This has not hung so far.
>
> e.g. the following reads every column of the data frame (y is a list with one
> column per element), and this doesn't hang on repeated reads:
>
> y <- lapply(cols, function(col) {
>   read_feather("test.feather4", col_select = all_of(col))
> })



--
This message was sent by Atlassian Jira
(v8.3.4#803005)