Javier Luraschi created ARROW-4565:
--------------------------------------

             Summary: [R] Reading records with all non-null decimals SEGFAULTs
                 Key: ARROW-4565
                 URL: https://issues.apache.org/jira/browse/ARROW-4565
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Javier Luraschi


Repro:

 
{code:java}
library(sparklyr)
library(arrow)
sc <- spark_connect(master = "local")
sdf_len(sc, 10^5) %>% dplyr::mutate(batch = id %% 10)
{code}
 

With Arrow 0.12 this produces the following segfault; it does not reproduce under Arrow 0.11.

 
{code:java}
 *** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
 1: RecordBatch__to_dataframe(x, use_threads = use_threads)
 2: `as_tibble.arrow::RecordBatch`(record_entry)
 3: tibble::as_tibble(record_entry)
 4: arrow_read_stream(.)
 5: function_list[[i]](value)
 6: freduce(value, `_function_list`)
 7: `_fseq`(`_lhs`)
 8: eval(quote(`_fseq`(`_lhs`)), env, env)
 9: eval(quote(`_fseq`(`_lhs`)), env, env)
10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
11: invoke_static(sc, "sparklyr.ArrowConverters", "toArrowBatchRdd",     sdf, 
session, time_zone) %>% arrow_read_stream() %>% dplyr::bind_rows()
12: arrow_collect(object, ...)
{code}
Note that the following cast is unsupported. I can add a test if someone can
come up with a way of creating a decimal type.

 

 
{code:java}
batch <- table(tibble::tibble(x = 1:10))
batch$cast(schema(x = decimal()))
{code}
 
{code:java}
Error in Decimal128Type__initialize(precision, scale) : argument "precision" is 
missing, with no default
{code}
I'll send a PR with a fix...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
