Re: best way to read a huge ascii file.

BartC Tue, 29 Nov 2016 09:33:24 -0800

On 29/11/2016 14:17, Heli wrote:

Hi all,


Let me update my question, I have an ascii file(7G) which has around 100M 
lines.  I read this file using :

f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)

x=f[:,1]
y=f[:,2]
z=f[:,3]
id=f[:,0]

I will need the x,y,z and id arrays later for interpolations. The problem is 
reading the file takes around 80 min while the interpolation only takes 15 mins.

I tried to get the memory increment used by each line of the script using 
python memory_profiler module.

The following line, which reads the entire 7.4 GB file, increments the memory 
usage by 3206.898 MiB (3.36 GB). First question: why does it not increment 
the memory usage by 7.4 GB?


Is there enough total RAM capacity for another 4.2GB?

But if the file is text, and being read into binary data in memory, it will be different. Usually binary data takes less space. I assume the loader doesn't load the entire text file first, do the conversions to binary, then unload the file, as that would then require 10.6GB during that process!
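
As a rough check on the "binary takes less space" point, here is a back-of-the-envelope sketch. It assumes the figures quoted in the question (about 100M lines, four columns: id, x, y, z) and that np.loadtxt is left at its default float64 dtype; the exact row count is not known.

```python
import numpy as np

# Assumptions taken from the question: ~100 million rows, 4 columns (id, x, y, z).
n_rows = 100_000_000
n_cols = 4
bytes_per_value = np.dtype(np.float64).itemsize  # 8 bytes per float64

# Size of the resulting in-memory array, independent of the text file size.
expected_bytes = n_rows * n_cols * bytes_per_value
expected_mib = expected_bytes / 2**20
print(f"expected array size: {expected_mib:.0f} MiB")

# Working backwards from the reported 3206.898 MiB increment.
reported_bytes = 3206.898 * 2**20
implied_rows = reported_bytes / (n_cols * bytes_per_value)
print(f"implied row count: {implied_rows / 1e6:.1f} million")
```

Under those assumptions the reported increment matches an array of roughly 105 million rows of four float64 values, so the 3.36 GB figure looks like the size of the converted data rather than of the text.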

f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)

The following 4 lines do not increment the memory at all.
x=f[:,1]
y=f[:,2]
z=f[:,3]
id=f[:,0]

That's surprising because if those are slices, they would normally create a copy (I suppose you don't set f to 0 or something after those lines). But if numpy data is involved, I seem to remember that slices are actually views into the data.
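
A small sketch of that difference, using a tiny made-up array in place of the real data: basic slicing of a numpy array gives a view onto the same buffer, while slicing a plain Python list gives a new list.

```python
import numpy as np

# Tiny stand-in for the loaded data: 5 rows, 4 columns (id, x, y, z).
f = np.zeros((5, 4))

# Basic slicing of a numpy array returns a view, not a copy.
x = f[:, 1]
is_view = x.base is f                 # the slice refers back to f
shares = np.shares_memory(x, f)       # both use the same underlying buffer
owns_data = x.flags["OWNDATA"]        # the slice does not own its memory

# Writing through the view changes the original array.
x[0] = 99.0
seen_in_f = f[0, 1]

# A plain Python list slice, by contrast, is a (shallow) copy.
lst = [1, 2, 3]
s = lst[:]
s[0] = 0

print(is_view, shares, owns_data, seen_in_f, lst[0])
```

This would explain why the four slicing lines add no measurable memory: no element data is duplicated.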

Finally I still would appreciate if you could recommend me what is the most 
optimized way to read/write to files in python? are numpy np.loadtxt and 
np.savetxt the best?

Why not post a sample couple of lines from the file? (We don't need the other 99,999,998, assuming they all have the same format.) Then we can see if there's anything obviously inefficient about it.
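
One way to grab such a sample without loading the whole 7G file is sketched below. The helper name head_lines and the demo file contents are made up for illustration; the real file's format is not known here.

```python
import itertools
import os
import tempfile


def head_lines(path, n=2):
    """Return the first n lines of a text file without loading the whole file."""
    with open(path, "r") as fh:
        # islice stops after n lines, so the rest of the file is never iterated.
        return [line.rstrip("\n") for line in itertools.islice(fh, n)]


# Demo on a small temporary file with invented contents (id x y z per line).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 0.0 0.5 1.0\n2 0.1 0.6 1.1\n3 0.2 0.7 1.2\n")
    demo_path = tmp.name

sample = head_lines(demo_path, 2)
print(sample)
os.remove(demo_path)
```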


--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
