I am writing a package that imports most of the Brazilian socio-economic microdata sets (microdadosBrasil, <https://github.com/lucasmation/microdadosBrasil>). The idea of the package is to make data import very simple, so that even users with very little R programming knowledge can work with the data easily. Although I would like decent performance, the first concern is usability.
The package currently imports data into an in-memory data.table object. I am now trying to implement support for out-of-memory datasets using MonetDBLite. Is there a (non-OS-dependent) way to predict whether a dataset will fit into memory? Ideally the package would ask the computer for the maximum amount of RAM that R can use, and would default to MonetDBLite if the available RAM were smaller than 3x the in-memory size of the dataset. There will also be an argument for the user to choose whether to work in RAM or out of RAM, but if that argument is not provided the package would choose for him.

In any case, does that seem reasonable? Or should I force the user to be aware of this choice? Another option would be to default to MonetDB unless the user explicitly asks for in-memory data. Is MonetDB performance so good that it would not make much of a difference? A disadvantage of the MonetDB default is that the user would not be able to run base-R data manipulation commands, so he would have to use dplyr (which is great and simple) or SQL queries (which few people will know).

regards,
Lucas
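P.S. To make the question concrete, this is roughly the dispatch I have in mind. It is an untested sketch, not working package code: the use of ps::ps_system_memory()$avail to query available RAM, the file-size proxy for in-memory size, the 3x overhead multiplier, the monetdb.read.csv() bulk loader, and the function name read_dataset() are all placeholders/assumptions on my part.

library(data.table)

read_dataset <- function(file, in_memory = NULL, overhead = 3) {
  est_size <- file.size(file)               # crude proxy for the in-memory size of the data
  avail    <- ps::ps_system_memory()$avail  # available RAM in bytes (assumes the 'ps' package)

  if (is.null(in_memory)) {
    ## user made no explicit choice: stay in RAM only if ~overhead x the data fits
    in_memory <- overhead * est_size < avail
    message("No 'in_memory' choice given; using ",
            if (in_memory) "data.table in RAM" else "MonetDBLite on disk")
  }

  if (in_memory) {
    return(fread(file))                     # plain in-memory data.table
  }

  ## out-of-memory path: load the file into an embedded MonetDBLite database
  dbdir <- file.path(tempdir(), "microdados_db")
  dir.create(dbdir, showWarnings = FALSE)
  con <- DBI::dbConnect(MonetDBLite::MonetDBLite(), dbdir)
  MonetDBLite::monetdb.read.csv(con, file, "dataset")  # bulk CSV loader (as per MonetDBLite docs)
  dplyr::tbl(con, "dataset")                # lazy table the user can query with dplyr verbs
}

The last line is also the crux of my second question: with the MonetDBLite default the user gets back a lazy table rather than a data.table, which is why base-R manipulation would no longer work and dplyr (or SQL) would become the required interface.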