Re: Speed of csvReader

data pulverizer via Digitalmars-d-learn Thu, 21 Jan 2016 11:30:47 -0800

On Thursday, 21 January 2016 at 19:08:38 UTC, data pulverizer wrote:

On Thursday, 21 January 2016 at 18:46:03 UTC, Justin Whear wrote:
On Thu, 21 Jan 2016 18:37:08 +0000, data pulverizer wrote:
It's interesting that the output first array is not the same as the input
byLine reuses a buffer (for speed) and the subsequent split operation just returns slices into that buffer. So when byLine progresses to the next line, the strings (slices) returned previously now point into a buffer with different contents. You should either use byLineCopy or .idup to create copies of the relevant strings. If your use-case allows for streaming and doesn't require having all the data present at once, you could continue to use byLine and just be careful not to refer to previous rows.
Thanks. It now works with byLineCopy()
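
The buffer-reuse pitfall described above can be mimicked in Python. This is only an analogy, assuming Python 3; the generator name by_line_reusing_buffer is made up for illustration and is not how D's byLine is implemented. It hands out views into one shared buffer, so views kept from earlier lines end up showing the latest line unless they are copied first:

```python
def by_line_reusing_buffer(lines, size=64):
    """Yield each line as a memoryview into one shared, reused buffer.

    Hypothetical stand-in for a reader that recycles its line buffer;
    every line must fit within `size` bytes.
    """
    buf = bytearray(size)
    for line in lines:
        n = len(line)
        buf[:n] = line               # overwrite the shared buffer in place
        yield memoryview(buf)[:n]    # a view into buf, not a copy


lines = [b"a|b|c", b"d|e|f"]

# Keeping the views: both entries alias the same buffer.
aliased = [view for view in by_line_reusing_buffer(lines)]

# Copying each view before the reader advances (the role byLineCopy or .idup play in D).
copied = [bytes(view) for view in by_line_reusing_buffer(lines)]

print([bytes(v) for v in aliased])   # both entries show the last line
print(copied)                        # each entry keeps its own line
```

Copying costs an allocation per line, which is why a reader that reuses its buffer can be faster when rows are processed one at a time and never retained.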

Time (s): 1.128


Currently the timing is similar to that of pandas in Python:

# Script (Python 2.7.6)
import pandas as pd
import time

col_types = {'col1': str, 'col2': str, 'col3': str, 'col4': str,
             'col5': str, 'col6': str, 'col7': str, 'col8': str,
             'col9': str, 'col10': str, 'col11': str, 'col12': str,
             'col13': str, 'col14': str, 'col15': str, 'col16': str,
             'col17': str, 'col18': str, 'col19': str, 'col20': str,
             'col21': str, 'col22': str}

begin = time.time()

x = pd.read_csv('Acquisition_2009Q2.txt', sep='|', dtype=col_types)

end = time.time()

print end - begin

$ python file_read.py
1.19544792175
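
A single wall-clock measurement like this can vary from run to run, for example depending on whether the file is already in the OS cache. A minimal sketch, assuming Python 3 (the script above is Python 2) and a hypothetical helper named best_of, that repeats a call and keeps the fastest run:

```python
import time


def best_of(func, repeats=5):
    """Call func() `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs without pandas or the data file.
# For the benchmark above one would pass something like:
#   lambda: pd.read_csv('Acquisition_2009Q2.txt', sep='|', dtype=col_types)
print(best_of(lambda: sum(range(100000))))
```

Taking the minimum rather than the mean is a design choice: it reports the run least disturbed by other activity on the machine.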
