On Tue, Oct 16, 2012 at 11:35 AM, Pradipto Banerjee
<pradipto.baner...@adainvestments.com> wrote:
> I am working with a series of large files with sizes 4 to 10GB and may
> need to read these files repeatedly. What data format (i.e. pickle, json,
> csv, etc.) is considered the fastest for reading via python?
Pickle /ought/ to be the fastest, since it's binary (unless you use the
oldest protocol version) and native to Python. Be sure to specify
HIGHEST_PROTOCOL and to use cPickle rather than the pure-Python pickle
module:

http://docs.python.org/2/library/pickle.html#module-cPickle
http://docs.python.org/2/library/pickle.html#pickle.HIGHEST_PROTOCOL

You might also consider SQLite (or some other database) if you will be
doing queries over the data that would be amenable to SQL or similar:

http://docs.python.org/2/library/sqlite3.html

Cheers,
Chris

P.S. The verbose disclaimer at the end of your emails is kinda annoying...
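P.P.S. A minimal sketch of the pickle round-trip described above. The data
and file path here are made up for illustration; the try/except import lets
the same code pick up cPickle on Python 2 and fall back to pickle on
Python 3 (where the C accelerator is used automatically):

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: C implementation, much faster
except ImportError:
    import pickle             # Python 3: already C-accelerated

# Hypothetical sample data standing in for your large files.
data = {"symbol": "ABC", "prices": list(range(1000))}

path = os.path.join(tempfile.gettempdir(), "data.pkl")

# HIGHEST_PROTOCOL selects the newest (binary) protocol, which is the
# fastest to write and to read back.
with open(path, "wb") as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    loaded = pickle.load(f)

assert loaded == data
```

The default protocol is 0 (ASCII) on Python 2, which is much slower to
parse; passing HIGHEST_PROTOCOL explicitly avoids that trap.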
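P.P.P.S. And a sketch of the SQLite route, in case SQL-style queries fit
your access pattern. The table and rows are invented for the example; for
real data you would pass a file path instead of ":memory:":

```python
import sqlite3

# ":memory:" keeps the demo self-contained; use a filename for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("ABC", 1.0), ("ABC", 2.0), ("XYZ", 3.0)],
)
conn.commit()

# Query only the slice you need instead of re-reading the whole file.
rows = conn.execute(
    "SELECT symbol, AVG(price) FROM prices GROUP BY symbol ORDER BY symbol"
).fetchall()
print(rows)  # [('ABC', 1.5), ('XYZ', 3.0)]
conn.close()
```

With an index on the columns you filter by, repeated partial reads of a
multi-GB dataset can beat re-parsing any flat file format.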