Hello all,

I'm developing an R package that downloads, imports, cleans, and merges
nine xlsx files that a public institution updates monthly.
The problem is that importing files in xlsx format is time-consuming. My
initial idea was to parallelize the readxl::read_excel() calls across the
cores of the user's processor, but apparently it makes little difference:
the run time only went from 185.89 seconds to 184.12 seconds.

# not parallelized code
y <- purrr::map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                    readxl::read_excel,
                    sheet = 1, skip = 4, col_types = rep('text', 30))

# parallelized code
future::plan(future::multicore, workers = 4)
y <- furrr::future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                           readxl::read_excel,
                           sheet = 1, skip = 4, col_types = rep('text', 30))

Any suggestions to reduce the import processing time?

Thanks in advance!

--
Igor Laltuf Marques
Economist (UFF)
Master in Urban and Regional Planning (IPPUR-UFRJ)
Researcher at ETTERN and CiDMob
https://igorlaltuf.github.io/
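
P.S. A minimal sketch of what I mean by sizing the worker pool to the
user's processor (not my final code): it leans on future's
availableCores() and supportsMulticore() helpers, with multisession as a
fallback on platforms such as Windows, where multicore futures are not
supported.

library(future)
library(furrr)

# dir.temp and lista.arquivos.locais come from earlier package code.
# Size the worker pool to the user's machine instead of hard-coding 4.
n.workers <- availableCores()

# multicore futures are unavailable on Windows (and inside RStudio);
# multisession works everywhere, at the cost of heavier worker startup.
if (supportsMulticore()) {
  plan(multicore, workers = n.workers)
} else {
  plan(multisession, workers = n.workers)
}

y <- future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                    readxl::read_excel,
                    sheet = 1, skip = 4, col_types = rep('text', 30))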