Well, actually the cells are treated as strings, not as integers or floats.
One way to overcome this is to get the number of rows, split them into 4 or 5 arrays, and process those separately. However, I was looking for a better solution. I have read that large Excel files are on the order of a million rows; mine is about 100K. Currently, the task manager shows about 4 GB of RAM usage while working with numpy.

Regards,
Mahmood

--------------------------------------------
On Wed, 5/10/17, Peter Otten <__pete...@web.de> wrote:

 Subject: Re: Out of memory while reading excel file
 To: python-list@python.org
 Date: Wednesday, May 10, 2017, 3:48 PM

Mahmood Naderan via Python-list wrote:

> Thanks for your reply. The openpyxl part (reading the workbook) works
> fine. I printed some debug information and found that when it reaches the
> np.array, after some 10 seconds, the memory usage goes high.
>
> So, I think numpy is unable to manage the memory.

Hm, I think numpy is designed to manage huge arrays if you have enough RAM.

Anyway: are all values of the same type? Then the numpy array may be kept
much smaller than in the general case (I think).

You can also avoid the intermediate list of lists:

wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
for y, row in enumerate(ws.rows):
    a[y] = [cell.value for cell in row]

--
https://mail.python.org/mailman/listinfo/python-list
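
A self-contained sketch along the lines of Peter's suggestion, assuming the cell values are numeric strings: the imports, the explicit float() conversion, and the treatment of empty cells are additions to his snippet, while the file and sheet names are taken from it.

from openpyxl import load_workbook
import numpy

wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

# One float64 per cell instead of one Python string object per cell.
a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)

for y, row in enumerate(ws.rows):
    for x, cell in enumerate(row):
        # Cells arrive as strings; convert explicitly and leave
        # empty cells at 0.0 (an assumption about the data).
        if cell.value not in (None, ''):
            a[y, x] = float(cell.value)

wb.close()  # read-only workbooks hold the file open until closed

Stored this way, 100,000 rows of, say, ten numeric columns is roughly 8 MB, so splitting the sheet into 4 or 5 pieces should not be necessary; the gigabytes shown in the task manager most likely come from keeping every cell as a separate Python string object.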