On Tue, 14 Apr 2020 at 12:42, Rahul Gupta <rahulgupta100...@gmail.com> wrote:
>
> Hello all, I have a CSV of 1 GB which consists of 25000 columns and 20000
> rows. I want to apply PCA, and I have seen that scikit-learn has built-in
> functionality for that. But I have seen that to do so you have to load the
> data into a data frame. My machine is an i5 with 8 GB of RAM, which fails
> to load all this data into a data frame and shows a memory error. Is there
> any alternative way that I could still apply PCA to the same data set on
> the same machine?
Do you know how to compute a covariance matrix "manually"? If so, then it
can be done while reading the data line by line, without reading all of the
data into memory at once. The problem, though, is that your 25000 columns
mean that the covariance matrix itself will fill most of your memory
(25000**2 * 8 bytes == 5 GB using double precision floating point).

You can make life much easier for yourself by choosing a subset of the
columns that you are likely to be interested in and reducing the size of
your dataset before you begin.

--
Oscar
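For what it's worth, here is a rough sketch of that streaming approach in
plain numpy (the function name `streaming_covariance`, the file name
"data.csv" and the `usecols` subset are placeholders for illustration, not
anything from scikit-learn): accumulate the column sums and the sum of outer
products while reading one row at a time, then form the covariance matrix
and diagonalise it.

    import csv
    import numpy as np

    def streaming_covariance(path, usecols=None):
        """Accumulate the mean and covariance of a CSV one row at a time.

        `usecols` restricts the computation to a subset of columns so the
        p x p covariance matrix stays small enough to hold in memory.
        """
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)        # assume the first row is a header
            if usecols is None:
                usecols = range(len(header))
            usecols = list(usecols)
            p = len(usecols)

            n = 0
            total = np.zeros(p)          # running sum of each column
            cross = np.zeros((p, p))     # running sum of outer products x x^T
            for row in reader:
                x = np.array([float(row[i]) for i in usecols])
                n += 1
                total += x
                cross += np.outer(x, x)

        mean = total / n
        # sum((x - mean)(x - mean)^T) == sum(x x^T) - n * mean mean^T,
        # divided by n - 1 for the usual sample covariance.
        cov = (cross - n * np.outer(mean, mean)) / (n - 1)
        return mean, cov

    # PCA then falls out of the eigendecomposition of the covariance matrix:
    # mean, cov = streaming_covariance("data.csv", usecols=range(1000))
    # eigvals, eigvecs = np.linalg.eigh(cov)
    # order = np.argsort(eigvals)[::-1]   # largest variance first
    # components = eigvecs[:, order]      # columns are the principal axes
    # scores = (x - mean) @ components[:, :k]   # project a row onto k PCs

Only one row is ever held in memory, so the memory cost is dominated by the
p x p `cross` array, which is why cutting the number of columns first makes
such a big difference.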