On Wednesday, January 3, 2018 at 1:43:40 AM UTC+5:30, Paul Moore wrote:
> On 2 January 2018 at 17:24, Rustom Mody wrote:
> > Someone who works in Hadoop asked me:
> >
> > If our data is in terabytes, can we do statistical (i.e. numpy, pandas, etc.)
> > analysis on it?
> >
> > I said: No (I don't think so, at least!), i.e. I expect numpy (pandas, etc.)
> > not to work if the data does not fit in memory.
> >
> > Sure, *Python* can handle (streams of) terabyte data, I guess, but
> > *numpy* cannot.
> >
> > Is there a more sophisticated answer?
> >
> > ["Terabyte" is just a figure of speech for "too large for main memory".]
>
> You might want to look at Dask (https://pypi.python.org/pypi/dask,
> docs at http://dask.pydata.org/en/latest/).
Thanks! Looks like exactly what I was asking about.
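
For anyone finding this thread in the archives: here is a minimal sketch of the
out-of-core workflow Dask offers. The file glob and the column names ('group',
'value') are made up for illustration, but dd.read_csv, groupby, and .compute()
are genuine parts of the dask.dataframe API.

    # Minimal sketch: analyzing data too large for RAM with Dask.
    # The CSV pattern and column names are hypothetical -- substitute
    # your own.
    import dask.dataframe as dd

    # Lazily treat many CSV files as one logical dataframe; nothing
    # is loaded into memory at this point.
    df = dd.read_csv("data/part-*.csv")

    # Build a pandas-style computation graph...
    means = df.groupby("group")["value"].mean()

    # ...and only now stream the data through it, chunk by chunk.
    print(means.compute())

The key idea is that operations are lazy: Dask records a task graph with the
familiar pandas syntax, and .compute() executes it over the data in chunks, so
only a fraction of the dataset needs to be resident in memory at any one time.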