Hi All,

I am currently loading 3B events (about 20 GB) into my algorithm for processing. I am reading this data from a Postgres-XL DB cluster (1 coordinator + 4 datanodes, each machine with 8 CPUs, 61 GB RAM, and 200 GB of disk), about 1 TB of space in total.
Loading the whole dataset takes almost 5 days before I can even start running my algorithms, so the DB is clearly the bottleneck right now. Can you please help me choose the right technology for getting the data in? Should I move away from Postgres-XL? Which option is most suitable for loading data efficiently into R: a database, a flat file, or a Parquet file?

Looking forward to your responses.

Thanks,
Prerna
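P.S. For context, here is a minimal sketch of the two file-based routes I am weighing, assuming the events can be exported out of Postgres-XL once (e.g. with COPY ... TO). The file names and column names below are placeholders, not my real schema:

    library(data.table)  # fast, multi-threaded fread() for delimited text
    library(arrow)       # columnar Parquet reader

    ## Route 1: one-off bulk export to CSV, then load with fread().
    ## fread() reads tens of GB in minutes rather than days, provided
    ## the result fits in RAM.
    events_dt <- fread("events_export.csv")  # placeholder path

    ## Route 2: convert once to Parquet, then read only the columns
    ## needed. Parquet is columnar, so col_select skips unused columns
    ## entirely.
    events_tbl <- read_parquet(
      "events_export.parquet",                             # placeholder path
      col_select = c("event_id", "user_id", "event_time")  # placeholder columns
    )

    ## For data larger than RAM, open_dataset() scans a directory of
    ## Parquet files lazily and only materialises rows after filtering.
    ds <- open_dataset("events_parquet/")  # placeholder directory

If neither route is workable, my fallback idea would be chunked reads straight from the database via DBI/RPostgres (dbSendQuery() followed by repeated dbFetch(res, n = ...)), but I suspect that keeps the DB as the bottleneck.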