Re: Question on R arrow package

2024-11-06 Thread Weston Pace
I wonder why your workaround is also slow:

```r
DS <- arrow::open_dataset(sources = sourcePath)
listFiles <- DS$files[!grepl("$folder$", DS$files, fixed = TRUE)]
DS2 <- arrow::open_dataset(sources = listFiles)
```

That was going to be my suggestion. Do you know which of the three statements takes a l
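One way to answer that question is to time each statement separately. A minimal sketch, assuming `sourcePath` is the `s3://` prefix from the original question (not shown in full in this thread):

```r
library(arrow)

# Time each step independently to locate the bottleneck.
# `sourcePath` is assumed to be the S3 prefix from the original question.
system.time(DS <- open_dataset(sources = sourcePath))
system.time(listFiles <- DS$files[!grepl("$folder$", DS$files, fixed = TRUE)])
system.time(DS2 <- open_dataset(sources = listFiles))
```

Each `system.time()` call prints the elapsed seconds for that statement alone, which separates listing cost from the cost of re-opening the dataset.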

Arrow-r equivalent to dplyr's separate() or separate_wider_delim()?

2024-11-06 Thread Schwing, Adam via user
Hello! I would like to take a comma-separated string and put each element in its own row. This is easy to do in the tidyverse using tidyr's separate() or separate_wider_delim() plus pivot_longer() functions. However, my dataset is very large because each string has thousands of elements and the dataset con

Re: Question on R arrow package

2024-11-06 Thread Neal Richardson
Looking at https://arrow.apache.org/docs/r/reference/open_dataset.html#arg-factory-options, it seems that `exclude_invalid_files` is slow on remote file systems because of the cost of accessing each file up front to determine if it is valid. And there is `selector_ignore_prefixes`, but it looks lik
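The two options mentioned are passed through `factory_options`; a hedged sketch of what that looks like (the path is a placeholder, and as noted, `exclude_invalid_files` is slow on S3 precisely because it opens every file up front):

```r
library(arrow)

# `exclude_invalid_files` validates every file before building the dataset,
# which is the expensive part on remote filesystems; shown here only to
# illustrate the option under discussion.
DS <- open_dataset(
  sources = "s3://some_path/",   # placeholder path
  factory_options = list(
    exclude_invalid_files = TRUE,
    # `selector_ignore_prefixes` skips files by basename *prefix*,
    # so it may not match S3 console "_$folder$" marker objects,
    # whose distinguishing text is a suffix.
    selector_ignore_prefixes = c(".")
  )
)
```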

Question on R arrow package

2024-11-06 Thread Huschto, Tony
Dear all, I'm using the arrow package to access partitioned parquet data on an AWS S3 bucket. The structure is the typical s3://some_path/entity=ABC/syncDate=mm-dd-/country=US/part***.snappy.parquet Reading the files works very well using DS <- arrow::open_dataset(sources = "s3://some_path/

Re: Question on R arrow package

2024-11-06 Thread Huschto, Tony
It's the last step that takes a lot of time.

```r
DS <- arrow::open_dataset(sources = sourcePath)
listFiles <- DS$files[!grepl("$folder$", DS$files, fixed = TRUE)]
```

runs very fast, but as DS$files does not contain the "s3://" prefix, I have to add it to listFiles in order to make the following work, and th
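The prefix fix being described might look like the following sketch (`sourcePath` and the bucket layout are assumptions carried over from earlier in the thread):

```r
library(arrow)

DS <- open_dataset(sources = sourcePath)

# Drop the S3 console's "$folder$" marker objects from the file list.
listFiles <- DS$files[!grepl("$folder$", DS$files, fixed = TRUE)]

# DS$files returns paths without the scheme, so re-add "s3://"
# before reopening the dataset from the explicit file list.
DS2 <- open_dataset(sources = paste0("s3://", listFiles))
```

Opening a dataset from an explicit vector of file paths skips directory discovery, but each file's metadata is still read, which is the suspected slow step on S3.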