Javier Luraschi created ARROW-4565:
--------------------------------------

             Summary: [R] Reading records with all non-null decimals SEGFAULTs
                 Key: ARROW-4565
                 URL: https://issues.apache.org/jira/browse/ARROW-4565
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Javier Luraschi
Repro:

{code:java}
library(sparklyr)
library(arrow)

sc <- spark_connect(master = "local")
sdf_len(sc, 10^5) %>% dplyr::mutate(batch = id %% 10)
{code}

This produces the following crash with Arrow 0.12. It does not reproduce under Arrow 0.11.

{code:java}
 *** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
 1: RecordBatch__to_dataframe(x, use_threads = use_threads)
 2: `as_tibble.arrow::RecordBatch`(record_entry)
 3: tibble::as_tibble(record_entry)
 4: arrow_read_stream(.)
 5: function_list[[i]](value)
 6: freduce(value, `_function_list`)
 7: `_fseq`(`_lhs`)
 8: eval(quote(`_fseq`(`_lhs`)), env, env)
 9: eval(quote(`_fseq`(`_lhs`)), env, env)
10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
11: invoke_static(sc, "sparklyr.ArrowConverters", "toArrowBatchRdd", sdf, session, time_zone) %>% arrow_read_stream() %>% dplyr::bind_rows()
12: arrow_collect(object, ...)
{code}

Note that the following cast is unsupported. I can add a test if someone can come up with a way of creating a decimal type:

{code:java}
batch <- table(tibble::tibble(x = 1:10))
batch$cast(schema(x = decimal()))
{code}

{code:java}
Error in Decimal128Type__initialize(precision, scale) :
  argument "precision" is missing, with no default
{code}

I'll send a PR with a fix.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)