Re: best way to read a huge ascii file.

2016-11-30 Thread Rolando Espinoza
Hi, Yes, working with binary formats is the way to go when you have large data. But for further reference, Dask[1] fits perfectly for your use case, see below how I process a 7Gb text file under 17 seconds (in a laptop: mbp + quad-core + ssd). # Create roughly ~7Gb worth text data. In [40]: impo

Re: best way to read a huge ascii file.

2016-11-30 Thread Chris Angelico
On Thu, Dec 1, 2016 at 3:26 AM, BartC wrote: > On 30/11/2016 16:16, Heli wrote: >> >> Hi all, >> >> Writing my ASCII file once to either of pickle or npy or hdf data types >> and then working afterwards on the result binary file reduced the read time >> from 80(min) to 2 seconds. > > > 240,000% faster?

Re: best way to read a huge ascii file.

2016-11-30 Thread BartC
On 30/11/2016 16:16, Heli wrote: Hi all, Writing my ASCII file once to either of pickle or npy or hdf data types and then working afterwards on the result binary file reduced the read time from 80(min) to 2 seconds. 240,000% faster? Something doesn't sound quite right! How big is the file
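BartC's percentage is, incidentally, exactly right: 80 minutes is 4800 seconds, and 4800/2 is a 2400-fold speedup, i.e. 240,000%:

```python
before_s = 80 * 60             # 80 minutes of ASCII parsing = 4800 seconds
after_s = 2                    # binary load time Heli reports
speedup = before_s / after_s   # 2400.0, a 2400x improvement
percent = speedup * 100        # 240000.0, i.e. 240,000%
```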

Re: best way to read a huge ascii file.

2016-11-30 Thread Heli
Hi all, Writing my ASCII file once to either of pickle or npy or hdf data types and then working afterwards on the result binary file reduced the read time from 80(min) to 2 seconds. Thanks everyone for your help. -- https://mail.python.org/mailman/listinfo/python-list
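The convert-once pattern Heli describes can be sketched with numpy's `.npy` format (file names here are placeholders):

```python
import numpy as np

# One-off conversion: pay the slow ASCII parse a single time ...
data = np.array([[0.0, 1.0, 2.0, 3.0],
                 [1.0, 4.0, 5.0, 6.0]])  # stand-in for np.loadtxt("myfile.txt")
np.save("points.npy", data)              # ... then store the parsed array as binary

# Every later run: load the binary file directly, with no text parsing.
# mmap_mode="r" maps the file into memory instead of copying it into RAM.
f = np.load("points.npy", mmap_mode="r")
x = f[:, 1]
```

The same idea applies to pickle or HDF5; the point is that parsing text happens once, and every subsequent load is effectively a raw memory copy.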

Re: best way to read a huge ascii file.

2016-11-29 Thread Steve D'Aprano
On Wed, 30 Nov 2016 01:17 am, Heli wrote: > The following line which reads the entire 7.4 GB file increments the > memory usage by 3206.898 MiB (3.36 GB). First question is Why it does not > increment the memory usage by 7.4 GB? > > f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)
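The figure Heli measured is roughly what a float64 array of that shape should cost: each parsed value occupies 8 bytes in memory, while its text form (digits, sign, decimal point, separators) averages about 18 bytes in the 7.4 GB file. A back-of-the-envelope check, taking Heli's "around 100M" lines at face value:

```python
rows, cols = 100_000_000, 4           # ~100M lines, columns id, x, y, z
itemsize = 8                          # np.float64 is 8 bytes per value
array_bytes = rows * cols * itemsize  # 3_200_000_000 bytes
array_gib = array_bytes / 2**30       # ~2.98 GiB, in line with the ~3.36 GB observed
```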

Re: best way to read a huge ascii file.

2016-11-29 Thread BartC
On 29/11/2016 14:17, Heli wrote: Hi all, Let me update my question, I have an ascii file(7G) which has around 100M lines. I read this file using : f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0) x=f[:,1] y=f[:,2] z=f[:,3] id=f[:,0] I will need the x,y,z and id arrays later

Re: best way to read a huge ascii file.

2016-11-29 Thread marco . nawijn
On Tuesday, November 29, 2016 at 3:18:29 PM UTC+1, Heli wrote: > Hi all, > > Let me update my question, I have an ascii file(7G) which has around 100M > lines. I read this file using : > > f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0) > > x=f[:,1] > y=f[:,2] > z=f[:,3]

Re: best way to read a huge ascii file.

2016-11-29 Thread Jussi Piitulainen
Heli writes: > Hi all, > > Let me update my question, I have an ascii file(7G) which has around > 100M lines. I read this file using : > > f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0) > > x=f[:,1] > y=f[:,2] > z=f[:,3] > id=f[:,0] > > I will need the x,y,z and id arrays

Re: best way to read a huge ascii file.

2016-11-29 Thread Heli
Hi all, Let me update my question, I have an ascii file (7G) which has around 100M lines. I read this file using:

f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)
x=f[:,1]
y=f[:,2]
z=f[:,3]
id=f[:,0]

I will need the x,y,z and id arrays later for interpolations. The prob
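One common speedup short of a binary format, offered here as a sketch rather than as what any responder actually proposed (the previews are truncated): pandas' C parser reads delimited text much faster than np.loadtxt, which loops in Python:

```python
import numpy as np
import pandas as pd

# Small stand-in for the real file; layout assumed to be "id x y z" per line.
np.savetxt("points.txt",
           np.column_stack([np.arange(5.0),
                            np.arange(5.0) + 1.0,
                            np.arange(5.0) + 2.0,
                            np.arange(5.0) + 3.0]))

# sep=r"\s+" keeps pandas on its fast C engine for whitespace-split columns.
f = pd.read_csv("points.txt", sep=r"\s+", header=None).to_numpy()
ids, x, y, z = f[:, 0], f[:, 1], f[:, 2], f[:, 3]
```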

Re: best way to read a huge ascii file.

2016-11-25 Thread Steve D'Aprano
On Sat, 26 Nov 2016 02:17 am, Heli wrote: > Hi, > > I have a huge ascii file(40G) and I have around 100M lines. I read this > file using : > > f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0) [...] > I will need the x,y,z and id arrays later for interpolations. The problem > is

Re: best way to read a huge ascii file.

2016-11-25 Thread BartC
On 25/11/2016 15:17, Heli wrote: I have a huge ascii file(40G) and I have around 100M lines. I read this file using : f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0) x=f[:,1] y=f[:,2] z=f[:,3] id=f[:,0] I will need the x,y,z and id arrays later for interpolations. The p

Re: best way to read a huge ascii file.

2016-11-25 Thread Marko Rauhamaa
Heli : > I have a huge ascii file(40G) and I have around 100M lines. I read this > file using : > > [...] > > The problem is reading the file takes around 80 min while the > interpolation only takes 15 mins. > > I was wondering if there is a more optimized way to read the file that > would reduce

best way to read a huge ascii file.

2016-11-25 Thread Heli
Hi, I have a huge ascii file (40G) and I have around 100M lines. I read this file using:

f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)
x=f[:,1]
y=f[:,2]
z=f[:,3]
id=f[:,0]

I will need the x,y,z and id arrays later for interpolations. The problem is reading the file t