Wes McKinney created ARROW-3909:
-----------------------------------

             Summary: [Python] Table.from_pandas call that seemingly should zero copy does not
                 Key: ARROW-3909
                 URL: https://issues.apache.org/jira/browse/ARROW-3909
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.12.0

While doing some performance testing, I noticed that a {{Table.from_pandas}} call that ought to be zero-copy / free was taking about 50 ms:

{code}
import pandas as pd
import pyarrow as pa
import numpy as np

K = 1000
N = 50000000

df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
table = pa.Table.from_pandas(df)
{code}

I see

{code}
In [14]: timeit table = pa.Table.from_pandas(df)
51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{code}

I haven't determined what's going on (is it counting nulls?), and initial attempts to get a Flamegraph produced a bunch of "unknown" entries.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)