On 2 January 2018 at 17:24, Rustom Mody <rustompm...@gmail.com> wrote:
> Someone who works in hadoop asked me:
>
> If our data is in terabytes, can we do statistical (i.e. numpy, pandas, etc.)
> analysis on it?
>
> I said: No (I don't think so, at least!), i.e. I expect numpy (pandas etc.)
> not to work if the data does not fit in memory.
>
> Well, sure, *python* can handle (streams of) terabyte data, I guess;
> *numpy* cannot.
>
> Is there a more sophisticated answer?
>
> ["Terabyte" is just a figure of speech for "too large for main memory"]
You might want to look at Dask (https://pypi.python.org/pypi/dask, docs at http://dask.pydata.org/en/latest/). I've not used it myself, but I believe it's designed for very much the sort of use case you describe.

Paul
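P.S. For concreteness, here's a minimal sketch of what such an analysis might look like with dask.dataframe. I haven't run this against terabyte-scale data myself, and the file pattern and column names below are invented purely for illustration:

    import dask.dataframe as dd

    # Read a directory of CSV files lazily; nothing is loaded into
    # memory yet. The path and column names are hypothetical.
    df = dd.read_csv("data/records-*.csv")

    # Build up pandas-style operations; Dask records them as a task
    # graph rather than executing them immediately.
    summary = df.groupby("category")["value"].mean()

    # .compute() runs the graph, processing the data in chunks that
    # each fit in memory, and returns an ordinary pandas object.
    print(summary.compute())

The appeal, as I understand it, is that the API mirrors pandas closely, so analyses written this way look familiar while the actual work is done out-of-core (and can be spread over a cluster if needed).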