Hello all,

I'm developing an R package that downloads, imports, cleans, and merges
nine xlsx files that a public institution updates monthly.
The problem is that importing files in xlsx format is time-consuming. My
initial idea was to parallelize the readxl::read_excel() calls across the
cores of the user's processor, but apparently it makes little difference:
the run time only went from 185.89 seconds to 184.12 seconds.

# not parallelized code
y <- purrr::map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                    readxl::read_excel,
                    sheet = 1, skip = 4, col_types = rep('text', 30))

# parallelized code
future::plan(future::multicore, workers = 4)
y <- furrr::future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                           readxl::read_excel,
                           sheet = 1, skip = 4, col_types = rep('text', 30))

Any suggestions to reduce the import processing time?

Thanks in advance!

--
Igor Laltuf Marques
Economist (UFF)
Master in Urban and Regional Planning (IPPUR-UFRJ)
Researcher at ETTERN and CiDMob
https://igorlaltuf.github.io/
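
P.S. A minimal sketch of what I mean by sizing the worker pool to the
user's processor (not my final code): it leans on future's
availableCores() and supportsMulticore() helpers, with multisession as a
fallback on platforms such as Windows, where multicore futures are not
supported.

library(future)
library(furrr)

# dir.temp and lista.arquivos.locais come from earlier package code.
# Size the worker pool to the user's machine instead of hard-coding 4.
n.workers <- availableCores()

# multicore futures are unavailable on Windows (and inside RStudio);
# multisession works everywhere, at the cost of heavier worker startup.
if (supportsMulticore()) {
  plan(multicore, workers = n.workers)
} else {
  plan(multisession, workers = n.workers)
}

y <- future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                    readxl::read_excel,
                    sheet = 1, skip = 4, col_types = rep('text', 30))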