Well, actually the cells are treated as strings, not as integers or floats.
One way to overcome this is to get the number of rows, split them into 4 or 5 arrays, and process those separately. However, I was looking for a better solution. I have read that large Excel files are on the order of a million rows; mine is about 100K. Currently, the task manager shows about 4 GB of RAM usage while working with numpy.

Regards,
Mahmood

--------------------------------------------
On Wed, 5/10/17, Peter Otten <__pete...@web.de> wrote:

 Subject: Re: Out of memory while reading excel file
 To: python-list@python.org
 Date: Wednesday, May 10, 2017, 3:48 PM

Mahmood Naderan via Python-list wrote:

> Thanks for your reply. The openpyxl part (reading the workbook) works
> fine. I printed some debug information and found that when it reaches the
> np.array, after some 10 seconds, the memory usage goes high.
>
> So, I think numpy is unable to manage the memory.

Hm, I think numpy is designed to manage huge arrays if you have enough RAM.

Anyway: are all values of the same type? Then the numpy array may be kept
much smaller than in the general case (I think).

You can also avoid the intermediate list of lists:

wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
for y, row in enumerate(ws.rows):
    a[y] = [cell.value for cell in row]

--
https://mail.python.org/mailman/listinfo/python-list
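
A self-contained sketch along the lines of Peter's suggestion, assuming the cell values are numeric strings: the imports, the explicit float() conversion, and the treatment of empty cells are additions to his snippet, while the file and sheet names are taken from it.

from openpyxl import load_workbook
import numpy

wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

# One float64 per cell instead of one Python string object per cell.
a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)

for y, row in enumerate(ws.rows):
    for x, cell in enumerate(row):
        # Cells arrive as strings; convert explicitly and leave
        # empty cells at 0.0 (an assumption about the data).
        if cell.value not in (None, ''):
            a[y, x] = float(cell.value)

wb.close()  # read-only workbooks hold the file open until closed

Stored this way, 100,000 rows of, say, ten numeric columns is roughly 8 MB, so splitting the sheet into 4 or 5 pieces should not be necessary; the gigabytes shown in the task manager most likely come from keeping every cell as a separate Python string object.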