Heli writes:

> Hi all,
>
> Let me update my question, I have an ascii file(7G) which has around
> 100M lines. I read this file using :
>
> f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)
>
> x=f[:,1]
> y=f[:,2]
> z=f[:,3]
> id=f[:,0]
>
> I will need the x,y,z and id arrays later for interpolations. The
> problem is reading the file takes around 80 min while the
> interpolation only takes 15 mins.
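A side note on the loading code itself: loadtxt can hand the columns
back directly through its usecols and unpack parameters, which saves
the slicing step (it will not make the parsing itself any faster,
though). A small sketch, reusing the dir and myfile names from your
snippet:

import os
import numpy as np

# usecols picks the wanted columns; unpack=True transposes the result
# so each selected column comes back as its own 1-D array.
id, x, y, z = np.loadtxt(os.path.join(dir, myfile),
                         usecols=(0, 1, 2, 3),
                         unpack=True)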
(Are there only those four columns in the file? I guess yes.)

> The following line which reads the entire 7.4 GB file increments the
> memory usage by 3206.898 MiB (3.36 GB). First question is Why it does
> not increment the memory usage by 7.4 GB?
>
> f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)

In general, doubles take more space as text than as, well, doubles,
which (in those arrays) take eight bytes (64 bits) each:

>>> len("0.1411200080598672 -0.9899924966004454 -0.1425465430742778 20.085536923187668 ")
78
>>> 4*8
32

> Finally I still would appreciate if you could recommend me what is the
> most optimized way to read/write to files in python? are numpy
> np.loadtxt and np.savetxt the best?

The documentation says "This function aims to be a fast reader for
simply formatted files", so as long as you want to keep the numbers as
text, this is probably meant to be the best way:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html

Perhaps there are binary load and save functions? They could be
faster. The binary data file would be opaque, but probably you are
not editing it by hand anyway.
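NumPy does in fact have such binary counterparts: numpy.save and
numpy.load, which store an array in the .npy format. A rough sketch of
a convert-once-then-reload workflow (dir and myfile as in your
snippet; points.npy is a made-up name):

import os
import numpy as np

# One-time conversion: pay the slow text parse once, then keep the
# parsed array around in NumPy's binary .npy format.
f = np.loadtxt(os.path.join(dir, myfile), delimiter=None, skiprows=0)
np.save('points.npy', f)

# Later runs: reload the binary file instead of re-parsing the text.
f = np.load('points.npy')
id, x, y, z = f[:, 0], f[:, 1], f[:, 2], f[:, 3]

The .npy file holds the raw eight-byte doubles plus a small header, so
reloading it is limited mostly by disk speed rather than by parsing.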