On 29/11/2016 14:17, Heli wrote:
Hi all,

Let me update my question. I have an ASCII file (7 GB) with around 100M
lines. I read this file using:

import os
import numpy as np

f = np.loadtxt(os.path.join(dir, myfile), delimiter=None, skiprows=0)

x = f[:, 1]
y = f[:, 2]
z = f[:, 3]
id = f[:, 0]

I will need the x, y, z and id arrays later for interpolation. The problem is
that reading the file takes around 80 minutes, while the interpolation only takes 15 minutes.

I tried to measure the memory increment of each line of the script using the
Python memory_profiler module.

The following line, which reads the entire 7.4 GB file, increments the memory
usage by 3206.898 MiB (3.36 GB). My first question is: why does it not increment
the memory usage by 7.4 GB?

Is there enough total RAM capacity for another 4.2 GB?

But if the file is text and is being read into binary data in memory, it will be different: binary data usually takes less space. I assume the loader doesn't load the entire text file first, convert it to binary, and then unload the file, as that would require 10.6 GB during the process!
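The "binary takes less space" point can be checked with a little arithmetic. This sketch assumes the file has exactly four numeric columns (id, x, y, z) and about 100 million rows, neither of which is confirmed in the thread; np.loadtxt returns float64 by default, i.e. 8 bytes per value no matter how many characters the number occupies in the text. The helper estimate_array_bytes is hypothetical.

```python
import numpy as np

def estimate_array_bytes(n_rows, n_cols, dtype=np.float64):
    """Bytes for the data buffer of an n_rows x n_cols array (ignores small object overhead)."""
    return n_rows * n_cols * np.dtype(dtype).itemsize

# Assumed shape: ~100M rows, 4 columns; float64 is np.loadtxt's default dtype.
nbytes = estimate_array_bytes(100_000_000, 4)
print(f"{nbytes / 1e9:.2f} GB")     # 3.20 GB
print(f"{nbytes / 2**20:.1f} MiB")  # 3051.8 MiB (memory_profiler reports MiB)
```

Under those assumptions the parsed array alone accounts for roughly the measured 3206.898 MiB (which would correspond to about 105 million rows of four float64 values); the rest of the 7.4 GB is digits, spaces and newlines that no longer exist once the numbers are stored in binary.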

f = np.loadtxt(os.path.join(dir, myfile), delimiter=None, skiprows=0)

The following 4 lines do not increment the memory at all.
x = f[:, 1]
y = f[:, 2]
z = f[:, 3]
id = f[:, 0]

That's surprising, because if those are slices they would normally create a copy, as slicing a Python list does (I suppose you don't set f to 0 or something after those lines). But with NumPy arrays, basic slices such as f[:,1] are views into the original data rather than copies, so they add almost no memory.
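A small sketch of the view behaviour, using a tiny zero-filled array as a stand-in for the loaded data (the shape is arbitrary). np.shares_memory and the owndata flag show that a column slice reuses the parent's buffer, while .copy() allocates a new one.

```python
import numpy as np

# Stand-in for the loaded data: 5 rows of (id, x, y, z).
f = np.zeros((5, 4))

x = f[:, 1]              # basic slice: a view, no data copied
x_copy = f[:, 1].copy()  # explicit copy: allocates a new buffer

print(np.shares_memory(f, x))       # True
print(np.shares_memory(f, x_copy))  # False
print(x.flags.owndata)              # False: x borrows f's buffer

f[0, 1] = 42.0    # a write through f is visible through the view...
print(x[0])       # 42.0
print(x_copy[0])  # 0.0 (...but not through the copy)
```

One consequence: because x, y, z and id are views, the whole of f stays in memory as long as any of them is still referenced, so `del f` on its own frees nothing.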

Finally, I would still appreciate it if you could recommend the most
efficient way to read/write files in Python. Are NumPy's np.loadtxt and
np.savetxt the best?
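One common approach, sketched here and not benchmarked on this data: parse the text once with np.loadtxt, store the result in NumPy's binary .npy format with np.save, and on later runs reload it with np.load, which skips text parsing entirely. Passing mmap_mode='r' memory-maps the file so data is read from disk on demand. The helper names and the small sample file are hypothetical; the column order (id, x, y, z) follows the snippet in the question.

```python
import os
import tempfile

import numpy as np

def convert_text_to_npy(txt_path, npy_path):
    """Parse the whitespace-delimited text file once (slow) and save it as binary .npy."""
    data = np.loadtxt(txt_path)  # float64 by default
    np.save(npy_path, data)
    return data.shape

def load_columns(npy_path):
    """Reload the binary file; mmap_mode='r' maps it instead of reading it all up front."""
    f = np.load(npy_path, mmap_mode="r")
    return f[:, 0], f[:, 1], f[:, 2], f[:, 3]  # id, x, y, z (views into the map)

# Self-contained demo with a hypothetical 3-row file.
tmpdir = tempfile.mkdtemp()
txt_path = os.path.join(tmpdir, "sample.txt")
npy_path = os.path.join(tmpdir, "sample.npy")
with open(txt_path, "w") as fh:
    fh.write("1 0.5 1.5 2.5\n2 0.6 1.6 2.6\n3 0.7 1.7 2.7\n")

print(convert_text_to_npy(txt_path, npy_path))  # (3, 4)
ids, x, y, z = load_columns(npy_path)
```

The slow parse is then paid only once. For the parse itself, pandas.read_csv(path, sep=r"\s+", header=None) is often faster than np.loadtxt, though that has not been measured here.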

Why not post a couple of sample lines from the file? (We don't need the other 99,999,998, assuming they all have the same format.) Then we can see if there's anything obviously inefficient about it.
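With a 7 GB file, opening it in an editor just to copy two lines may be impractical. A minimal sketch (the helper head_lines is hypothetical) that reads only the first few lines, relying on the fact that iterating a text file object yields one line at a time:

```python
import itertools
import os
import tempfile

def head_lines(path, n=2):
    """Return the first n lines of a text file without reading the rest of it."""
    with open(path) as fh:
        return list(itertools.islice(fh, n))

# Demo on a small hypothetical file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as fh:
    fh.write("1 0.5 1.5 2.5\n2 0.6 1.6 2.6\n3 0.7 1.7 2.7\n")

for line in head_lines(path):
    print(line, end="")
```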

--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
