On 24-Oct-2012 02:06, Oscar Benjamin wrote:
On 23 October 2012 15:31, Virgil Stokes <v...@it.uu.se> wrote:
I am working with some rather large data files (>100GB) that contain time
series data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII
format. I perform various types of processing on these data (e.g. moving
median, moving average, and Kalman-filter, Kalman-smoother) in a sequential
manner and only a small number of these data need be stored in RAM when
being processed. When performing Kalman-filtering (forward in time pass, k =
0,1,...,N) I need to save to an external file several variables (e.g. 11*32
bytes) for each (t_k, y(t_k)). These are inputs to the Kalman-smoother
(backward in time pass, k = N,N-1,...,0). Thus, I will need to input these
variables saved to an external file from the forward pass, in reverse order
--- from last written to first written.
Finally, to my question --- What is a fast way to write these variables to
an external file and then read them in backwards?
You mentioned elsewhere that you are using numpy. I'll assume that the
data you want to read/write are numpy arrays.
Numpy arrays can be written very efficiently in binary form using
tofile/fromfile:
>>> import numpy
>>> a = numpy.array([1, 2, 5], numpy.int64)
>>> a
array([1, 2, 5])
>>> with open('data.bin', 'wb') as f:
...     a.tofile(f)
...
You can then reload the array with:
>>> with open('data.bin', 'rb') as f:
...     a2 = numpy.fromfile(f, numpy.int64)
...
>>> a2
array([1, 2, 5])
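The forward pass saves several variables per time step. One way to keep them together is a structured dtype, which makes each step one fixed-size record that tofile/fromfile write and read in a single call. This is a minimal sketch: the field names, shapes and the file name filter_out.bin are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical per-step record: field names and shapes are illustrative,
# not taken from the original post.
record = np.dtype([
    ("t", np.float64),          # time stamp t_k
    ("x", np.float64, (4,)),    # filtered state estimate
    ("P", np.float64, (4, 4)),  # state covariance
])

n_steps = 5
out = np.zeros(n_steps, dtype=record)
out["t"] = np.arange(n_steps, dtype=np.float64)
out["x"] = np.arange(n_steps * 4, dtype=np.float64).reshape(n_steps, 4)
out["P"] = np.eye(4)  # broadcasts the same matrix into every record

with open("filter_out.bin", "wb") as f:
    out.tofile(f)  # raw bytes, no header: the dtype must be known on read

with open("filter_out.bin", "rb") as f:
    back = np.fromfile(f, dtype=record)

print(record.itemsize)   # 8 + 32 + 128 = 168 bytes per record
print(back["t"][::-1])   # time stamps, last written first
```

Because every record has the same size (record.itemsize), record k always starts at byte k * record.itemsize. That fixed offset is what makes reading the file backwards straightforward.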
Numpy arrays can be reversed before writing or after reading using:
>>> a2
array([1, 2, 5])
>>> a2[::-1]
array([5, 2, 1])
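Reversing an array loaded with fromfile requires the whole file in RAM, which is not an option at >100GB. A possible alternative, not part of the original reply, is numpy.memmap: slicing a memory-mapped array with [::-1] gives a reversed view without reading the file up front. The sketch below assumes a 64-bit system, since a 32-bit address space cannot map a file that large. It reuses the data.bin name from above.

```python
import numpy as np

# Write a small test file forwards (in the real case this would be the
# output of the Kalman-filter pass).
np.arange(10, dtype=np.int64).tofile("data.bin")

# Map the file instead of reading it; no data is loaded at this point.
mm = np.memmap("data.bin", dtype=np.int64, mode="r")

# Reversing a memmap gives a view with a negative stride, so the file is
# not copied: pages are read from disk only when elements are accessed.
rev = mm[::-1]

chunk = 4
blocks = []
for start in range(0, len(rev), chunk):
    block = np.array(rev[start:start + chunk])  # copy only this chunk into RAM
    blocks.append(block)
    print(block)
```

This prints [9 8 7 6], then [5 4 3 2], then [1 0]. The blocks list is kept here only to make the result easy to inspect; in real use each chunk would be processed and discarded.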
Assuming you wrote the file forwards you can make an iterator to yield
the file in chunks backwards like so (untested):
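import numpy

def read_backwards(f, dtype, chunksize=1024 ** 2):
    """Yield the contents of binary file f as arrays, last element first.

    chunksize is the number of elements (not bytes) read per chunk.
    """
    dtype = numpy.dtype(dtype)
    nbytes = chunksize * dtype.itemsize
    f.seek(0, 2)            # jump to the end of the file
    fpos = f.tell()         # file size in bytes
    while fpos > nbytes:
        fpos -= nbytes      # step back one chunk *before* reading it
        f.seek(fpos, 0)
        yield numpy.fromfile(f, dtype, chunksize)[::-1]
    f.seek(0, 0)            # the remaining partial chunk is at the start
    yield numpy.fromfile(f, dtype, fpos // dtype.itemsize)[::-1]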
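A simpler, if probably slower, variant of the same idea seeks to each record individually. This works because every time step writes the same number of bytes. The sketch below is not from the original thread; the file name forward.bin, n_vars = 11 and the dummy values are assumptions. It writes one record per step during a forward pass and reads the records back last-first.

```python
import numpy as np

dtype = np.dtype(np.float64)
n_vars = 11   # variables saved per time step (hypothetical count)
n_steps = 6

# Forward pass: append one fixed-size record per time step.
with open("forward.bin", "wb") as f:
    for k in range(n_steps):
        saved = np.full(n_vars, float(k))  # stand-in for the filter's variables
        saved.tofile(f)

# Backward pass: seek directly to record k and read just that record.
record_bytes = n_vars * dtype.itemsize
order = []
with open("forward.bin", "rb") as f:
    for k in range(n_steps - 1, -1, -1):
        f.seek(k * record_bytes)
        saved = np.fromfile(f, dtype, n_vars)
        order.append(float(saved[0]))
print(order)
```

This issues one read per time step. For a file of >100GB, reading large chunks as read_backwards does is likely to be considerably faster, though that is an expectation rather than a measurement.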
Oscar
Ok Oscar,
Thanks for the tip and I will look into this more.
--
http://mail.python.org/mailman/listinfo/python-list